Agent Propagation Degradation
Incident Report for JumpCloud
Postmortem

JumpCloud Incident Report

Date: 2022-04-07

Date of Incident: 2022-03-31

Description: RCA for JumpCloud Agent Dispatch Delay

Summary:

On 2022-03-31, some JumpCloud customers experienced delays in Agent dispatch times. These delays started around 08:00 UTC and lasted until approximately 15:56 UTC. Similar delays in Agent dispatch also occurred in smaller windows on 2022-03-19 and 2022-03-30.

Root Cause:

As part of our efforts to improve the overall efficiency of Agent functionality and move towards IoT messaging, we ran into two issues when deploying these changes. First, the code that publishes MQTT messages was missing a timeout parameter for long operations, which slowed some of those operations down. Second, we suspect, and are still investigating, possible throttling of these messages by our Messaging Service Provider (1). We have rolled back all changes while we work through the corrective actions.

Corrective Actions / Risk Mitigation:

  • Add additional alerting and tracing around this service - DONE
  • Introduce the proper timeout configuration for this code base - DONE
  • Finalize investigation with Messaging Service Provider - Target 2022-04

1 We are still investigating quota or rate-limiting issues imposed on this service by our service provider. We will update this report as we uncover those results.

Posted Apr 07, 2022 - 17:06 MDT

Resolved
We have resolved the current issue with delays in dispatching messages to the JumpCloud Devices Agent. We are still actively investigating the root cause of this and prior issues with propagating these messages. We will provide a complete RCA in the coming days, once our investigation is complete. We apologize for the frustration around these events; completing this investigation as quickly as possible is our top priority.
Posted Mar 31, 2022 - 09:18 MDT
Update
We are currently processing new messages to devices and are starting to process the message backlog as well. We are still investigating the root cause and will update when we have more information.
Posted Mar 31, 2022 - 08:29 MDT
Update
We are continuing to investigate this issue.
Posted Mar 31, 2022 - 08:15 MDT
Update
We are continuing to investigate this issue.
Posted Mar 31, 2022 - 07:42 MDT
Update
We are continuing to investigate this issue.
Posted Mar 31, 2022 - 07:13 MDT
Update
We are continuing to investigate this issue.
Posted Mar 31, 2022 - 06:44 MDT
Update
We are continuing to investigate this issue.
Posted Mar 31, 2022 - 06:12 MDT
Investigating
We are currently investigating an issue with delays in syncing users with the JumpCloud Devices Agent. We will report back as soon as we have an update, or in 30 minutes. We apologize for any disruption this may cause.
Posted Mar 31, 2022 - 05:39 MDT
This incident affected: Agent.