Sign-in errors for some users
Incident Report for Fountain
Postmortem

On the morning of December 6th, 2021, we deployed a new version of Fountain that included a major upgrade of our main application framework (Ruby on Rails). We immediately began receiving reports that some customers were unable to access our hiring dashboards. Our engineers started investigating.

At Fountain, we understand that we are privileged to be such a large part of your workflows and hiring process. We take great pride in the critical role that we play, and we understand the impact that any issue with Fountain has on your work.

By midday we had traced the access issues to a cookie that was corrupted during our Rails upgrade. We had anticipated the change in cookie format and took steps to address it during our normal deployment. Unfortunately, one scenario we failed to cover left some of our active users stuck between the two formats. Manually removing their cookie was the only resolution. In the future, when we know a deployment has the potential to invalidate user cookies, we will address this case before deployment to ensure a seamless transition.
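
For readers who want a concrete picture of this failure mode, here is a minimal sketch in plain Ruby. It is an illustration only, not Fountain's code and not the Rails cookie API: the two cookie formats, the helper names (decode_new, decode_legacy, read_session), and the returned hash are all assumptions. The idea is to try the new format, fall back to the old one, and treat a cookie that fits neither as disposable so the user is asked to sign in again instead of seeing an error.

```ruby
require "json"
require "base64"

# Hypothetical formats, for illustration only:
#   new:    Base64(JSON object)
#   legacy: Base64("key=value&key=value")
# Real cookies are also signed or encrypted; that step is omitted here.

# Returns a Hash if the cookie is in the new format, otherwise nil.
def decode_new(raw)
  data = JSON.parse(Base64.strict_decode64(raw))
  data.is_a?(Hash) ? data : nil
rescue StandardError
  nil
end

# Returns a Hash if the cookie is in the legacy format, otherwise nil.
def decode_legacy(raw)
  text = Base64.strict_decode64(raw).force_encoding("UTF-8")
  return nil unless text.valid_encoding?

  pairs = text.split("&").map { |pair| pair.split("=", 2) }
  return nil if pairs.empty? || pairs.any? { |pair| pair.length != 2 }

  pairs.to_h
rescue ArgumentError
  nil
end

# Tries both formats. A cookie that matches neither is flagged for
# removal so the caller can delete it and redirect to sign-in.
def read_session(raw)
  session = decode_new(raw) || decode_legacy(raw)
  return { session: session, clear_cookie: false } if session

  { session: nil, clear_cookie: true }
end
```

With a fallback like this, a cookie caught between the two formats leads to a fresh sign-in instead of requiring the user to remove it manually.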

Additionally, after our deployment completed, we noticed a second issue: webhooks stopped sending updates from Fountain to external systems. This was due to a second change in data format with the Rails upgrade, this time in the format of a query string. We immediately issued a fix and began replaying the missed hooks. By 2 am PST on December 7th, all systems were again operational. In investigating this issue, we noticed gaps in test automation and monitoring that we are addressing to ensure this doesn’t happen again.
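
One common way to catch this kind of silent format change is a "golden" test that pins the exact serialized output, so a framework upgrade that alters it fails in CI before deployment. The sketch below is an assumption-laden illustration, not Fountain's test suite: the helper webhook_query, the parameter names, and the event name are hypothetical. URI.encode_www_form is from the Ruby standard library.

```ruby
require "uri"

# Hypothetical helper: builds the query string sent with a webhook.
# Keys are sorted so the output does not depend on insertion order.
def webhook_query(params)
  URI.encode_www_form(params.sort_by { |key, _| key.to_s })
end

# Pinned ("golden") output, recorded once from a known-good release.
EXPECTED_QUERY = "event=applicant.updated&ids=1&ids=2"

# Returns true while the serialized format still matches the pinned one.
def query_format_unchanged?
  webhook_query(ids: [1, 2], event: "applicant.updated") == EXPECTED_QUERY
end
```

A check like query_format_unchanged? would run in the test suite. If an upgrade changed how arrays or special characters are encoded, the comparison would fail and point directly at the format difference.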

Finally, a word about our communication in these situations. Our engineering staff was engaged from the moment that we deployed, and worked to get these issues resolved as efficiently as possible. However, we fell short in our commitment to communicate immediately to our customers, and that is a breach of trust. We are working to improve our communication process for incidents like this to ensure that no one is left in the dark.

We appreciate the faith and trust that you’ve put in our product and our team.  Our teams are working to improve our product and processes going forward.

Posted Dec 09, 2021 - 11:38 PST

Resolved
This incident has been resolved. The webhooks will be replayed in short order.
Posted Dec 06, 2021 - 17:48 PST
Update
Webhooks are working as expected after our fix.
Posted Dec 06, 2021 - 17:24 PST
Update
We are also seeing issues with webhook processing that are related to the current issue. We are pushing out a fix for this and will replay the webhooks as soon as the issue is fixed.
Posted Dec 06, 2021 - 16:38 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 06, 2021 - 12:54 PST
Investigating
We are currently investigating this issue. Clearing the browser cookies and trying again may resolve this issue for some users.
Posted Dec 06, 2021 - 12:07 PST
This incident affected: Core (Dashboard, Webhooks).