Increased error rates with some calls to API endpoints
Incident Report for JumpCloud
Postmortem

JumpCloud Incident Report

Date: 2022-06-27

Date of Incident: 2022-06-22

Description: RCA for intermittently increased error rates on some API endpoints

Summary:

At approximately 17:15 MDT on 2022-06-22, some JumpCloud customers experienced reduced success rates on calls to some API endpoints. The degradation lasted until approximately 20:00 MDT on 2022-06-22.

Root Cause:

An uncommon race condition caused a failure in our secrets engine, which in turn prevented some services from acquiring the credentials they needed to start successfully. Because of a gap in our health checks, these services were not automatically removed from service.
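The report does not describe how JumpCloud's health checks are implemented. As an illustration only, the health-check gap can be sketched as a readiness check that fails whenever a service has not obtained its credentials, so that a load balancer or orchestrator would stop routing traffic to it. The sketch assumes a Python WSGI service; `CredentialState`, `make_health_app`, and the `/live` and `/ready` paths are hypothetical names, not JumpCloud's.

```python
from __future__ import annotations

import threading
from http import HTTPStatus


class CredentialState:
    """Thread-safe record of whether startup credentials were acquired.

    Hypothetical: stands in for whatever a service keeps after asking
    the secrets engine for credentials at startup.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._credentials: dict[str, str] | None = None
        self._error: str | None = None

    def set_credentials(self, creds: dict[str, str]) -> None:
        with self._lock:
            self._credentials = creds
            self._error = None

    def set_error(self, message: str) -> None:
        with self._lock:
            self._credentials = None
            self._error = message

    def ready(self) -> tuple[bool, str]:
        with self._lock:
            if self._credentials:
                return True, "credentials loaded"
            return False, self._error or "credentials not yet acquired"


def make_health_app(state: CredentialState):
    """Return a hypothetical WSGI app exposing liveness and readiness checks."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == "/live":
            # Liveness: the process is running, whether or not it has credentials.
            status, body = HTTPStatus.OK, "alive"
        elif path == "/ready":
            # Readiness: report OK only once the secrets engine has supplied
            # credentials, so the instance can be taken out of rotation otherwise.
            ok, detail = state.ready()
            status = HTTPStatus.OK if ok else HTTPStatus.SERVICE_UNAVAILABLE
            body = detail
        else:
            status, body = HTTPStatus.NOT_FOUND, "not found"
        payload = body.encode("utf-8")
        start_response(
            f"{status.value} {status.phrase}",
            [("Content-Type", "text/plain; charset=utf-8"),
             ("Content-Length", str(len(payload)))],
        )
        return [payload]

    return app
```

Separating liveness from readiness lets an orchestrator keep a process running (for example, while it retries credential acquisition) and still remove it from service; whether this matches JumpCloud's actual setup is an assumption.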

Corrective Actions / Risk Mitigation:

  1. Remove failed systems from service - DONE
  2. Add additional health checks for these services - DONE
  3. Increase logging around our secrets engine - Target 07/2022
Posted Jun 27, 2022 - 20:51 MDT

Resolved
This incident has been resolved.
Posted Jun 23, 2022 - 00:26 MDT
Update
We've made some changes to return success rates to desired levels. We are also continuing to test additional changes to increase stability.
Posted Jun 22, 2022 - 23:08 MDT
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 22, 2022 - 21:52 MDT
Investigating
We are currently seeing an increase in error rates to some API endpoints. We are actively investigating and will report back as soon as we have any new information.
Posted Jun 22, 2022 - 21:02 MDT
This incident affected: General Access API.