Intermittent degradation with SSO and loading device information
Incident Report for JumpCloud
Postmortem

JumpCloud Incident Report

Date: 2022-07-01

Date of Incident: 2022-06-30

Description: RCA for Console(s) and API service degradation

Summary:

At approximately 12:25 MDT on 2022-06-30, some JumpCloud customers experienced failures loading device information in the Admin Portal. During the same period, some users may have experienced intermittent failures launching SSO applications. The degradation lasted until approximately 12:50 MDT on 2022-06-30.

Root Cause:

A code deploy to production caused an increase in calls from the new version to the services it depends on. This increase exceeded the expected rate, causing contention and increased latency in those services. The canary for this deploy passed while under a lesser load in production and was correctly promoted based on the data it returned. However, the deploy ramp was too steep for this version, and the error rate exceeded our threshold before the deploy could be rolled back.
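The report does not describe JumpCloud's deployment tooling. As a rough illustration of why ramp steepness matters, the minimal sketch below assumes a staged rollout in which each step routes a larger fraction of production traffic to the new version and rolls back as soon as the observed error rate crosses a threshold. The function `ramp_canary`, the `errors_at` load model, and the 1% threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RampResult:
    promoted: bool                   # True if every step stayed under the threshold
    last_good_weight: float          # largest traffic fraction that passed
    rolled_back_at: Optional[float]  # traffic fraction at which the threshold was breached


def ramp_canary(
    steps: List[float],
    observe_error_rate: Callable[[float], float],
    max_error_rate: float,
) -> RampResult:
    """Shift traffic to a new version in stages, rolling back on the first breach.

    steps: increasing fractions of production traffic (0 < w <= 1) routed to
    the new version. observe_error_rate: hypothetical hook returning the error
    rate measured while that fraction of traffic runs the new version.
    """
    last_good = 0.0
    for weight in steps:
        if observe_error_rate(weight) > max_error_rate:
            # Breach: route all traffic back to the previous version.
            return RampResult(False, last_good, weight)
        last_good = weight
    return RampResult(True, last_good, None)


def errors_at(weight: float) -> float:
    """Hypothetical load model: dependent services saturate once more than
    25% of traffic runs the new version, after which errors rise linearly."""
    return 0.0 if weight <= 0.25 else (weight - 0.25) * 0.2


steep = ramp_canary([0.05, 1.0], errors_at, max_error_rate=0.01)
gradual = ramp_canary([0.05, 0.1, 0.2, 0.4, 0.7, 1.0], errors_at, max_error_rate=0.01)
print(steep)    # breach detected only after all traffic had moved to the new version
print(gradual)  # breach detected at 40% of traffic
```

Both ramps pass the small 5% canary step, as the canary did in this incident. With the steep ramp, the first failing step already exposes all traffic; with the gradual ramp, the breach is caught at 40%. The sketch treats error-rate observation as instantaneous, whereas in a real rollout errors keep accumulating until the rollback takes effect, which makes a steep ramp even more costly.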

Corrective Actions / Risk Mitigation:

  • Roll back failed release - DONE
  • Evaluate and modify canary promotion rate - Target 07/2022
  • Increase system protections on internal limits - Target 08/2022
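The report does not say what form the planned protections on internal limits will take. One common protection is a rate limiter in front of an internal service, so that a version that suddenly issues more calls than expected is throttled instead of saturating the service. The minimal sketch below uses a token bucket; the class name, its parameters, and the choice of algorithm are assumptions for illustration only.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: permits bursts of up to `capacity`
    calls and a sustained rate of `rate_per_sec` calls per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should shed or queue the request instead of calling


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
bucket = TokenBucket(rate_per_sec=10, capacity=5, clock=clock)
first_burst = [bucket.allow() for _ in range(8)]   # 5 allowed, then 3 rejected
clock.t = 0.5                                      # 0.5 s later, 5 tokens have refilled
second_burst = [bucket.allow() for _ in range(8)]  # again 5 allowed, 3 rejected
```

A token bucket absorbs short bursts while capping the sustained call rate, so a sudden jump in calls from a new release turns into rejected requests at the caller rather than contention and latency across the shared service.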
Posted Jul 01, 2022 - 13:24 MDT

Resolved
This incident has been resolved.
Posted Jun 30, 2022 - 13:05 MDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 30, 2022 - 13:00 MDT
Identified
We've identified the issue and are implementing a fix.
Posted Jun 30, 2022 - 12:53 MDT
Investigating
We are currently experiencing issues with SSO and device information in the Admin Portal. We will report back as soon as we have an update. We apologize for any inconvenience this may cause.
Posted Jun 30, 2022 - 12:41 MDT
This incident affected: Admin Console.