Active Directory Integration Connection Failures

Incident Report for JumpCloud

Postmortem

Date: Apr 23, 2025

Date of Incident: Apr 18, 2025

Description: RCA for ADI Outage

Summary:

On April 18th at 1:27 PM MT, JumpCloud experienced an issue affecting our Active Directory Integration (ADI) agents. Some customers using ADI sync and Delegated Auth experienced connectivity issues, resulting in failed directory synchronizations and authentication attempts. New ADI installs, and ADI agents attempting to restart or reconnect, would have failed during the incident window.

Root Cause:

The incident was caused by the expiration of a certificate used for mutual TLS (mTLS) authentication between ADI agents and our infrastructure. This certificate is essential for securing the communication channel between customer environments and JumpCloud services.

Why wasn’t this caught before the certificate expired?

In mid-2024 we migrated our global proxy infrastructure from EC2 into Kubernetes. During this transition, the monitoring for the proxies that handle mTLS was mistakenly deleted and not recreated on the new infrastructure.
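
For illustration only, a certificate expiry check of the kind described above might look like the following minimal sketch. It is not JumpCloud's actual monitoring. The function names, the 30-day warning threshold, and the example timestamp are all hypothetical, and the report does not say which certificate in the mTLS chain expired.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical threshold: warn when fewer than 30 days remain.
WARN_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the text format found in ssl.SSLSocket.getpeercert(),
    for example "Jan 15 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiry_status(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> str:
    """Classify a certificate as "ok", "warning", or "expired"."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining < warn_days:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up expiry timestamp, used only to exercise the functions.
    example = "Jan 15 00:00:00 2030 GMT"
    print(expiry_status(example, datetime.now(timezone.utc)))
```

A check like this only helps if it runs on a schedule against the infrastructure currently serving traffic, which is the piece that was lost in the migration.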

Why did it take so long to recover?

Recovery took longer than it should have for two reasons. First, because of the missing monitoring described above, the certificate expiration went unnoticed by our operations teams. Second, our teams did not see any instability or outage signals in our internal telemetry, which led them to believe the problem was specific to customer environments. The telemetry showed nothing because existing connections were unaffected; only new connections failed.
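
Because established connections stayed healthy, a failure like this is only visible to a check that opens a fresh connection. The sketch below shows one way such a probe could be written, assuming a standard TLS endpoint. The function name, parameters, and any host, port, or certificate paths passed to it are hypothetical and are not taken from JumpCloud's systems.

```python
import socket
import ssl
from typing import Optional, Tuple


def probe_new_connection(
    host: str,
    port: int,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
    timeout: float = 5.0,
) -> Tuple[bool, str]:
    """Attempt a brand-new TLS handshake and report whether it succeeded."""
    context = ssl.create_default_context()
    if client_cert:
        # Present a client certificate, as an mTLS agent would.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "handshake ok"
    except ssl.SSLError as exc:
        # Includes certificate verification failures such as an expired cert.
        return False, f"tls failure: {exc}"
    except OSError as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return False, f"network failure: {exc}"
```

Running a probe like this on a schedule and alerting on the "tls failure" result would surface a handshake problem even while long-lived connections continue to work.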

Corrective Actions / Risk Mitigation:

  1. Enhancing our certificate monitoring systems to provide earlier notifications of upcoming expirations - DONE
  2. Improving our internal escalation procedures to reduce time-to-resolution - DONE
  3. Implementing additional automated testing to detect potential authentication issues - PLANNING
  4. Creating more comprehensive documentation for critical operational procedures - WIP
Posted Apr 23, 2025 - 16:16 MDT

Resolved

This issue has been resolved and service has now returned to normal.
Posted Apr 19, 2025 - 20:51 MDT

Update

We are currently in the process of implementing a fix for this issue.
Posted Apr 19, 2025 - 20:12 MDT

Identified

We have identified an issue affecting the Active Directory Integration Sync and Import services, including Active Directory Delegated Authentication.
As a result, users, user updates, and password changes are not syncing between JumpCloud and Active Directory, and delegated authentication to Active Directory is also affected.
In the meantime, if your users are unable to authenticate because you have delegated authentication enabled for your users, please review the following article for steps on disabling delegated authentication.

https://jumpcloud.com/support/adi-use-ad-delegated-authentication#disabling-delegated-authentication-for-users
Posted Apr 19, 2025 - 18:10 MDT

Investigating

The Active Directory Integration Service is encountering errors, and we are investigating the issue.
Posted Apr 19, 2025 - 17:42 MDT
This incident affected: Active Directory Integration.