JumpCloud Agent / MDM Enrollments
Incident Report for JumpCloud
Postmortem

Date: Jan 17, 2025

Date of Incident: Jan 16, 2025

Description: RCA for Apple MDM Un-enrollment

To start, we’d like to acknowledge and apologize for the impact this incident had. We pride ourselves on operating with excellence and have increased efforts to minimize impact when things go wrong. We missed the mark here.

Summary:

On January 16th at 11:38 AM MT, JumpCloud deployed a new version of Apple MDM. At 11:47 AM, JumpCloud detected macOS devices un-enrolling from Apple MDM and an investigation began. At 12:03 PM, a formal incident was declared and our incident management team came online to coordinate multiple teams in recovery efforts. At 12:14 PM, a feature flag was disabled, stopping the queuing of any further un-enrollments (more on this later). At 12:15 PM, the new code was completely rolled back, and at 12:32 PM JumpCloud deleted the command queue containing any further un-enrollment directives. From this point, the teams worked to find the quickest way for customers to safely re-enroll systems, and prepared a communications plan to make affected customers aware of the issue.

Root Cause:

First, some context on how JumpCloud uses feature flags. Our practice is to deliver code to production through small changes. This reduces the risks inherent in large changes and allows us to quickly identify which specific change could be responsible for erroneous behavior. Feature flags are essentially if-statements in the code that determine which path to follow and execute. When a flag is “on”, the new code is executed, and when the flag is “off”, the code is skipped. Our teams use these often with our deployments, providing the ability to turn features on or off based on certain attributes (like organization id) without modifying the source code. This is where things went wrong.
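
To make the idea concrete, here is a minimal sketch of a feature flag as an if-statement keyed on organization id. The `FeatureFlag` class, the `handle_checkin` function, and the flag and organization names are hypothetical and used only for illustration; this is not JumpCloud's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag: on globally, or on only for listed organizations."""
    name: str
    enabled: bool = False
    enabled_org_ids: set[str] = field(default_factory=set)

    def is_on(self, org_id: str) -> bool:
        # The flag is on if enabled globally or targeted at this organization.
        return self.enabled or org_id in self.enabled_org_ids


def handle_checkin(flag: FeatureFlag, org_id: str) -> str:
    # The flag check is an ordinary if-statement that selects the code path.
    if flag.is_on(org_id):
        return "new-path"
    return "existing-path"


# Assumed example values: the flag is targeted at a single organization.
flag = FeatureFlag("ddm-support", enabled_org_ids={"org-a"})
print(handle_checkin(flag, "org-a"))
print(handle_checkin(flag, "org-b"))
```

Because the flag state lives outside the code, the same deployed build can behave differently depending on how the flag is set, which is why the flag's state at release time matters.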

The change deployed was intended to introduce Apple Declarative Device Management (DDM) support, which lets devices apply configurations independently based on certain criteria. For an existing device we match on a number of factors, one of which is the unique device identifier (UDID) generated at enrollment. This is a security measure to guard against impersonation or a fake device. Unfortunately, with this new API the UDID did not match for some macOS devices, causing device un-enrollment. This was not caught in our pre-production environments because the intended state for the feature flag controlling this change was off. In the pre-production environments our testing passed because this code path was not active.

Why was the feature flag on?  The teams use a rollout status with each flag, along with other identifiers for that code block.  What we missed was a validation step ensuring the feature flag was in the expected state before releasing the code to production.
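
As an illustration of the kind of validation step described above, the following sketch compares the intended state of each flag with its live state and refuses to proceed on any mismatch. The function names, the dictionary layout, and the flag name are assumptions made for this example, not JumpCloud's actual release tooling.

```python
def find_flag_mismatches(expected: dict[str, bool], actual: dict[str, bool]) -> list[str]:
    """Return names of flags whose live state differs from the intended state."""
    mismatches = []
    for name, want in sorted(expected.items()):
        # A flag missing from the live environment also counts as a mismatch.
        if name not in actual or actual[name] != want:
            mismatches.append(name)
    return mismatches


def validate_before_deploy(expected: dict[str, bool], actual: dict[str, bool]) -> None:
    """Raise instead of deploying when any flag is not in its intended state."""
    mismatches = find_flag_mismatches(expected, actual)
    if mismatches:
        raise RuntimeError(f"refusing to deploy; unexpected flag state: {mismatches}")


# Assumed example: the flag is intended to be off but is on in production.
try:
    validate_before_deploy({"ddm-support": False}, {"ddm-support": True})
except RuntimeError as err:
    print(err)
```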

What are you doing to ensure this doesn’t happen again? With every incident, we perform a thorough post-incident review to address gaps in many areas, including process and testing. We take these very seriously and discuss all incidents on a bi-weekly cadence with the entire engineering organization. These reports then get rolled up to our executive staff. This incident clearly exposed a gap in validation, and we already have changes in flight to address it. We’ve also stopped any future deployments until this gap (and any others we find in this investigation) is closed and approved by our SRE team.

Corrective Actions / Risk Mitigation:

  1. Flip the feature flag to off for this code block, and revert the code - DONE
  2. Stop production deploys for this code base  - DONE
  3. Review and harden the validation steps for feature flags - IN PROGRESS
  4. Modify our automation to test code with feature flags in both positions - IN PROGRESS
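
Action 4 can be sketched as a small harness that runs the same check once with the flag off and once with it on. Everything here is hypothetical: `device_stays_enrolled` is a toy stand-in for the system under test (with made-up identifiers), not JumpCloud's matching logic.

```python
def run_in_both_flag_positions(check) -> dict[bool, str]:
    """Run check(flag_on) with the flag off and on; collect any failures."""
    failures = {}
    for flag_on in (False, True):
        try:
            check(flag_on)
        except AssertionError as exc:
            failures[flag_on] = str(exc)
    return failures


def device_stays_enrolled(flag_on: bool) -> bool:
    # Toy model only: the flagged path reports an identifier that does not
    # match the stored one, so the device would be treated as unknown.
    stored_udid = "UDID-123"
    reported_udid = "UDID-456" if flag_on else "UDID-123"
    return stored_udid == reported_udid


def check_enrollment(flag_on: bool) -> None:
    assert device_stays_enrolled(flag_on), f"device un-enrolled with flag_on={flag_on}"


failures = run_in_both_flag_positions(check_enrollment)
print(failures)
```

In this toy model, a test run with the flag only in the off position passes, while running both positions surfaces the failure on the flagged path before release.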
Posted 25 days ago. Jan 17, 2025 - 15:13 MST

Resolved
This incident has been resolved. To enroll devices again, customers can follow the steps below:

ADE-enrolled devices
If you have devices configured for ADE, there are two options to restart the ADE enrollment:
Administrator Privileges Required* see: https://jumpcloud.com/support/set-admin-sudo-permissions
Set up a command to execute `sudo profiles renew -type enrollment`
Or
Have your users execute `sudo profiles renew -type enrollment` in a terminal window
In both scenarios, the user will be prompted to complete the re-enrollment of the device. The user must have administrator privileges on the device at that time.

Non-ADE enrolled devices
For devices that are not configured for ADE enrollment, we advise applying the JumpCloud MDM Enrollment policy (https://jumpcloud.com/support/create-a-mac-mdm-enrollment-policy). Note that the user still needs administrator privileges to complete the MDM enrollment. Admins may temporarily elevate standard users to admin users by following the steps laid out in this article: https://jumpcloud.com/support/set-admin-sudo-permissions.

Additional Support Articles for Remediation
https://jumpcloud.com/support/re-enroll-apple-devices-into-mdm-and-preserve-device-record
https://jumpcloud.com/support/set-admin-sudo-permissions
https://jumpcloud.com/support/create-a-mac-mdm-enrollment-policy
Posted 26 days ago. Jan 16, 2025 - 15:12 MST
Monitoring
We've identified the issue and implemented a fix. Users who were un-enrolled from Apple MDM may need to re-enroll their devices. More information on re-enrolling Apple devices can be found below:
https://jumpcloud.com/support/re-enroll-apple-devices-into-mdm-and-preserve-device-record
Posted 26 days ago. Jan 16, 2025 - 12:31 MST
Investigating
We are currently investigating an issue with agent installs and Apple MDM enrollments. We will update this issue as we know more.
Posted 26 days ago. Jan 16, 2025 - 12:16 MST
This incident affected: Agent and MDM.