Degraded Console Service
Incident Report for JumpCloud
Postmortem

JumpCloud Incident Report

Date: 2022-05-31

Date of Incident: 2022-05-23

Description: RCA for JumpCloud Service Interruption

Summary:

At approximately 20:45 MDT on 2022-05-23, JumpCloud customers were unable to access JumpCloud’s Admin and User consoles. This loss of access also affected JumpCloud’s API and lasted until approximately 21:05 MDT on 2022-05-23.

Root Cause:

The incident was caused by a shared code base component being released to production inadvertently. This branch of code passed testing and was deployed to production earlier in the day. It failed immediately and was rolled back without issue. During the rollback, however, the step that changes the version state to “not approved” failed, so the deploy mechanism still saw this code as “passed” in the test environment. During unrelated maintenance, the deploy mechanism recognized the “passed” value and released this version to production again. This deployment missed the canary gate, which did not roll back effectively, and a manual rollback was required.
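
The failure mode described above can be illustrated with a minimal sketch. It assumes a deploy tool that tracks an approval state per version; every name here (ApprovalState, Version, roll_back, eligible_for_deploy) is hypothetical and does not describe JumpCloud's actual tooling. The idea shown is that a rollback should fail loudly if the state change does not succeed, and that deploy eligibility should not depend on the state flag alone.

```python
from dataclasses import dataclass
from enum import Enum


class ApprovalState(Enum):
    PASSED = "passed"
    NOT_APPROVED = "not approved"


@dataclass
class Version:
    # Hypothetical record of a build tracked by the deploy mechanism.
    name: str
    state: ApprovalState = ApprovalState.PASSED


class RollbackError(RuntimeError):
    """Raised when a version was reverted but is still marked as passed."""


def roll_back(version, revert, mark_not_approved):
    """Revert the deployment, then insist that the state change succeeded.

    `revert` and `mark_not_approved` are caller-supplied callables that
    stand in for whatever the real deploy tooling does (assumption).
    """
    revert(version)
    try:
        mark_not_approved(version)
    except Exception as exc:
        raise RollbackError(
            f"{version.name} reverted but state update failed"
        ) from exc
    if version.state is not ApprovalState.NOT_APPROVED:
        raise RollbackError(
            f"{version.name} reverted but still marked {version.state.value}"
        )


def eligible_for_deploy(version, rolled_back_names):
    """Allow a deploy only if the version passed AND was never rolled back.

    The second check is a backstop in case the state flag is stale.
    """
    return (
        version.state is ApprovalState.PASSED
        and version.name not in rolled_back_names
    )
```

With a check of this kind, a rollback whose state update fails surfaces as an error instead of leaving a "passed" version available to a later, unrelated deployment.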

Corrective Actions / Risk Mitigation:

Production deployments are temporarily paused until the required changes and coverage are in place.

  1. Immediate rollback of the change - DONE
  2. Additional testing for the missed component - Target 2022-06-07
  3. Add alerting around the affected service component(s) - Target 2022-06-20
  4. Improve coverage of our canary process - Target 2022-06-11
Posted May 31, 2022 - 16:11 MDT

Resolved
This incident has been resolved.
Posted May 23, 2022 - 21:51 MDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 23, 2022 - 21:14 MDT
Identified
The issue has been identified and a fix is being implemented.
Posted May 23, 2022 - 21:09 MDT
Investigating
We are currently investigating timeouts and failures to load the Admin and User consoles. We will report back as soon as we have an update. We apologize for any inconvenience this may cause.
Posted May 23, 2022 - 20:56 MDT
This incident affected: Admin Console and User Console.