Resolved: RCA - Azure Monitor missed alerts
Incident Report for BeX
Resolved
RCA - Azure Monitor missed alerts
Incident

Summary of Impact: Between 08:45 and 10:25 UTC on 21 Jan 2021, you were identified as a customer using Azure Monitor who may have experienced issues with missed alerts or missed alerts notifications. All alerts types (except for Activity Log Alerts) that fired between this impact window may not have appeared in the Azure Portal, and all alert types (excluding Metric Alerts and Activity Log Alerts) may not have triggered any Action Groups.

Root Cause: A regression was introduced for an alerts ingestion pipeline as a result of the invalid configuration of a previous deployment. As a result, all alerts requests sent to the Alerts Management Platform received unauthorized exceptions and were dropped. For Activity Log Alerts, long lasting retry mechanisms were in place, so the alerts were eventually processed but with a delay.

Mitigation: The configuration was rolled back to a previous version.

Next Steps: We apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future.

In this case, this includes (but is not limited to):

  • Creating a process at the Azure Metrics Platform to detect Alerts Management Platform failures faster.
  • Improve and harden safe deployment mechanisms to prevent regressions on deployments.

Provide Feedback: Please help us improve the Azure customer communications experience by taking our survey: https://aka.ms/AzurePIRSurvey.



Azure Monitor
Global

Posted Feb 04, 2021 - 06:11 CET
This incident affected: Bex.