Date: 2024-08-23
Date of Incident: 2024-08-20
Description: RCA for JumpCloud Push Notifications
Summary:
On August 20th, at 3:46 AM Mountain Time, the APNs certificate attached to the AWS SNS Platform Application for Apple expired. As a result, Apple devices could no longer receive or acknowledge push notifications, and new device registrations began failing. Already registered devices could still authenticate using the TOTP code in the JumpCloud Protect app. The on-call team was paged, and the on-call engineer began investigating. An internal incident was called at 4:31 AM, and responders from several teams began triaging MFA failures. The team identified errors coming from SNS and escalated for additional support at 4:53 AM. At 5:25 AM, the team attempted to rotate the certificate manually in SNS, but access restrictions prevented this, and the change had to be made through IaC configuration instead. Discovering this limitation lengthened the overall recovery time. By 5:59 AM, the Notification Service had picked up the new APNs certificate, and a rolling restart was initiated. The APNs certificate was updated in the AWS SNS Platform Application by 6:10 AM, and push notifications began working again.
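The timestamps in the summary can be converted into elapsed times to show how long each phase of the response took. The sketch below is illustrative only: the event labels are paraphrased from the summary, the helper name `minutes_since_start` is hypothetical, and the last two times are upper bounds because the summary reports them as "by 5:59 AM" and "by 6:10 AM".

```python
from datetime import datetime

# Times are Mountain Time on 2024-08-20, copied from the summary.
# Labels are paraphrased; the last two times are "by" times (upper bounds).
TIMELINE = [
    ("APNs certificate expired", "03:46"),
    ("Internal incident called", "04:31"),
    ("Escalated for additional support", "04:53"),
    ("Manual rotation attempted in SNS", "05:25"),
    ("Notification Service had the new certificate", "05:59"),
    ("Certificate updated in SNS, push restored", "06:10"),
]


def minutes_since_start(timeline):
    """Return (event, minutes elapsed since the first event) pairs."""
    def parse(clock):
        return datetime.strptime(clock, "%H:%M")

    start = parse(timeline[0][1])
    return [
        (event, int((parse(clock) - start).total_seconds() // 60))
        for event, clock in timeline
    ]


for event, elapsed in minutes_since_start(TIMELINE):
    print(f"+{elapsed:3d} min  {event}")
```

Read this way, the incident was called 45 minutes after expiry, the manual rotation attempt came at 99 minutes, and push notifications were restored no later than 144 minutes (2 hours 24 minutes) after the certificate expired.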
Root Cause:
The root cause of the incident was the expiration of the APNs certificate attached to the AWS SNS Platform Application for Apple. With the certificate expired, Apple devices could not receive or acknowledge push notifications, and new device registrations failed.
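Because the failure was a certificate passing its expiry time, the condition can be expressed as a simple date comparison. The following is a minimal sketch, not the check the team uses or has committed to: the function name `expiry_status` and the 30-day and 7-day thresholds are assumptions chosen for illustration, and the expiry time is taken from the summary on the assumption that "Mountain Time" in August means UTC-6.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Mountain Time in August is Mountain Daylight Time (UTC-6).
MOUNTAIN_DAYLIGHT = timezone(timedelta(hours=-6))

# Expiry time reported in the summary for the APNs certificate.
APNS_CERT_NOT_AFTER = datetime(2024, 8, 20, 3, 46, tzinfo=MOUNTAIN_DAYLIGHT)


def expiry_status(not_after, now=None, warn_days=30, critical_days=7):
    """Classify a certificate by time remaining until its expiry.

    The thresholds are hypothetical defaults, not values from the incident.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=critical_days):
        return "critical"
    if remaining <= timedelta(days=warn_days):
        return "warning"
    return "ok"


# Example: how the certificate would have been classified on earlier dates.
for day in (datetime(2024, 6, 1), datetime(2024, 8, 1), datetime(2024, 8, 15)):
    checked_at = day.replace(tzinfo=MOUNTAIN_DAYLIGHT)
    print(checked_at.date(), expiry_status(APNS_CERT_NOT_AFTER, now=checked_at))
```

In practice the `not_after` value would be read from the certificate itself or from wherever the certificate is stored; how that is done depends on tooling this report does not describe.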
Corrective and Preventative Actions:
Immediate Corrective Actions:
Preventative Actions: