Date: May 5, 2025
Date of Incident: Apr 30, 2025
Description: RCA for User Portal / SSO Failed Authentication
Summary:
On April 30th at 12:35 UTC our monitors detected a significant increase in errors across our User Console API endpoints. The increased errors manifested as failed authentication for new attempts to the User Portal, SSO and other services. Existing connections to applications continued uninterrupted. At 12:41 UTC a formal incident was declared and multiple teams were paged. The issue was resolved at 12:58 UTC
Root Cause:
A state transition anomaly occurred during the credential rotation process for a database within our authentication infrastructure, resulting in connection failures by dependent services to that layer. Consequently, end-users experienced authentication errors when attempting to establish new sessions with specific applications.
The authentication service team was able to identify a failure with secondary credentials during the rotation and quickly failed back to valid secrets.
Corrective Actions / Risk Mitigation: