Date: 2022-07-01
Date of Incident: 2022-06-30
Description: RCA for Console(s) and API service degradation
Summary:
At approximately 12:25 MDT on 2022-06-30, some JumpCloud customers experienced failures loading device information in the Admin Portal. During the same window, some users may have experienced intermittent failures launching SSO applications. The degradation lasted until approximately 12:50 MDT on 2022-06-30, about 25 minutes in total.
Root Cause:
A code deploy to production increased the number of calls that version made to the corresponding services. The increase exceeded the expected rate, causing contention and higher latency in those services. The canary for this deploy passed under lighter production load and was promoted correctly based on the data it returned. However, the deploy ramp was too steep for this version, and the error rate exceeded our threshold before the deploy could be rolled back.
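As an illustration only, the interaction between ramp steepness and rollback can be sketched as a small simulation. Nothing here reflects JumpCloud's actual deploy tooling: the function names, the traffic percentages, the 1% error threshold, and the contention model are all hypothetical values chosen to show how a canary can pass at low load while a steep ramp exposes far more traffic before a rollback triggers.

```python
from typing import Callable, List, Tuple

def ramp_deploy(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[str, List[Tuple[int, float]]]:
    """Walk through traffic percentages in order and roll back at the
    first step whose observed error rate exceeds the threshold."""
    history = []
    for pct in steps:
        rate = error_rate_at(pct)
        history.append((pct, rate))
        if rate > max_error_rate:
            return "rolled_back", history
    return "promoted", history

def contention_model(pct: int) -> float:
    # Hypothetical model: errors are negligible up to 40% of traffic,
    # then grow linearly as downstream services start to contend.
    return 0.001 if pct <= 40 else 0.001 + 0.002 * (pct - 40)

THRESHOLD = 0.01  # hypothetical 1% error budget

# Steep ramp: the 5% canary passes, then all traffic is shifted at once.
steep_status, steep_history = ramp_deploy([5, 100], contention_model, THRESHOLD)

# Gradual ramp: the same problem is detected at a smaller exposure.
gradual_status, gradual_history = ramp_deploy(
    [5, 10, 25, 50, 100], contention_model, THRESHOLD
)

print(steep_status, "at", steep_history[-1][0], "% of traffic")
print(gradual_status, "at", gradual_history[-1][0], "% of traffic")
```

Under these assumed numbers both ramps end in a rollback, but the steep ramp reaches full traffic before the threshold is crossed, while the gradual ramp stops at half.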
Corrective Actions / Risk Mitigation: