Increased Error Rates Affecting Multiple Platform Services

Incident Report for JumpCloud

Postmortem

Date: Nov 7, 2025

Date of Incident: Nov 4, 2025

Description: RCA for Auth Database Degradation

Summary:

On November 4, 2025, a number of customers experienced intermittent failures, timeouts, and increased latency when attempting to authenticate to multiple JumpCloud services, including the consoles, LDAP, RADIUS, and SAML, or to use Multi-Factor Authentication.

Root Cause:

The incident was triggered by an issue in the deployment process involving a database schema change and a subsequent application code release.

During this deployment, a planned database change unintentionally removed several database indexes required by the existing application code.

The sequence of failure was as follows:

  1. Deployment Order Error: The database schema change (which removed necessary indexes) was applied to the production database before the new application code (which did not require those indexes) was deployed.
  2. Performance Collapse: The existing, high-volume authentication code (used for functions like TOTP and push authentication) was forced to run against the now-inefficient database structure. Queries that normally took milliseconds suddenly took several seconds.
  3. Connection Exhaustion: These slow queries held database connections open for extended periods, quickly overwhelming the database server's available connection pool.
  4. Full Outage: With the connection pool exhausted and the database server driven to 100% CPU utilization by the slow queries, the main authentication API could not communicate with the database, producing the intermittent timeouts and failures experienced by our customers.
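The connection-exhaustion step follows from Little's law: the average number of connections held at once equals the query arrival rate times the time each query holds a connection. The rough sketch below uses hypothetical query rates, latencies, and pool size, none of which come from this report. It shows how a jump from milliseconds to seconds per query multiplies the number of connections needed:

```python
def connections_in_use(queries_per_second: float, seconds_per_query: float) -> float:
    """Little's law: average concurrency L = arrival rate (lambda) * time in system (W)."""
    return queries_per_second * seconds_per_query

POOL_SIZE = 200  # hypothetical connection limit for the auth database

# (scenario, hypothetical queries per second, hypothetical seconds per query)
scenarios = [
    ("production, indexed", 2_000.0, 0.005),
    ("production, unindexed", 2_000.0, 3.0),
    ("staging, unindexed", 50.0, 3.0),
]

for name, rate, latency in scenarios:
    needed = connections_in_use(rate, latency)
    verdict = "fits" if needed <= POOL_SIZE else "exhausts pool"
    print(f"{name:22} {needed:8.1f} connections -> {verdict}")
```

Under these made-up numbers, the same unindexed query that overwhelms a 200-connection pool at production volume stays within the pool at a lower staging rate. This is one way a slow query can pass load testing and still fail in production.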

Why Testing Did Not Catch This:

The issue was not identified during testing in our Development or Staging environments because of insufficient load simulation. The resource consumption and connection exhaustion issues manifested only under the extreme pressure of peak production traffic, and the simulated load profiles in our lower environments were not sufficient to expose this specific failure mode.

Corrective Actions / Risk Mitigation:

  1. Mandatory schema change review - All database schema changes must now undergo an additional level of review to explicitly assess index dependencies and impact.
  2. New deployment phasing - We are implementing new tools and checks to enforce that application code dependent on a schema change is deployed before a database change is executed.
  3. Enhanced alerting - We are implementing new monitors and alerts specifically for the Auth-API's database connection pool health and CPU utilization.
  4. Enhanced load testing - We are revisiting the load profiles used in our staging environments, looking for opportunities to more accurately simulate peak production traffic.
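One way to enforce the phasing in item 2 is a pre-migration gate that refuses to drop any index the currently deployed code still uses. The sketch below is hypothetical. It assumes each release publishes the set of indexes it queries through. The names `Migration`, `check_migration`, and the index names are illustrative and are not JumpCloud's tooling:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Migration:
    name: str
    dropped_indexes: frozenset[str]  # indexes this schema change would remove

def check_migration(migration: Migration, indexes_used_by_live_code: set[str]) -> list[str]:
    """Return the dropped indexes that the currently deployed code still needs.

    An empty list means the migration may run. Anything else means the application
    release that stops using those indexes must be deployed first.
    """
    return sorted(migration.dropped_indexes & indexes_used_by_live_code)

# Hypothetical manifest published by the running auth service.
live = {"idx_mfa_user", "idx_push_device"}
migration = Migration("cleanup_unused_indexes", frozenset({"idx_mfa_user", "idx_legacy_audit"}))

blocking = check_migration(migration, live)
if blocking:
    print(f"refusing to run {migration.name}; still used by deployed code: {blocking}")
```

The gate only works if the "indexes used" manifest is accurate. In practice, that data might come from declared dependencies in each release or from the database's own index-usage statistics.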
Posted Nov 07, 2025 - 10:48 MST

Resolved

Services have been fully restored and this incident has been resolved. We will provide a formal postmortem as a follow up.
Posted Nov 04, 2025 - 07:19 MST

Monitoring

We have implemented a fix and users should now be able to access the User and Admin Console, MFA, LDAP, RADIUS, and SSO without issue. We will continue to monitor the results of the fix.
Posted Nov 04, 2025 - 06:17 MST

Update

We continue to see intermittent issues with accessing the JumpCloud User and Admin Portal, MFA, LDAP, RADIUS, and SSO. During this time, access attempts to LDAP and RADIUS are impacted only if a user is authenticating with MFA.

Our team is working on implementing a fix and will provide another update as quickly as possible.
Posted Nov 04, 2025 - 05:59 MST

Identified

We have identified an issue that is causing intermittent login issues with the JumpCloud User Portal and Admin Portal. We have also identified issues with JumpCloud MFA, LDAP, RADIUS, and authentication with SSO. We are working on implementing a fix and will provide another update as soon as possible.
Posted Nov 04, 2025 - 05:20 MST

Update

We are continuing to investigate this issue.
Posted Nov 04, 2025 - 04:47 MST

Update

We are continuing to investigate this issue.
Posted Nov 04, 2025 - 04:41 MST

Investigating

We are seeing issues with SSO authentication. We are currently investigating and will provide an update within 1 hour.
Posted Nov 04, 2025 - 04:40 MST
This incident affected: LDAP, RADIUS, User Console, TOTP / MFA / JumpCloud Protect, SSO, and Admin Console.