Microsoft Global Outage: Microsoft says 'underlying cause' fixed amid global outage

A significant outage in Microsoft's cloud services on Friday, July 19, caused widespread disruptions across various sectors worldwide. The incident affected airlines, financial services, media groups, and healthcare, highlighting the heavy reliance on cloud infrastructure for critical operations.

Global Impact: Affected Regions and Sectors

The outage was not limited to a specific geographic area. Microsoft Windows users in India, Australia, Germany, the United States, the UK, and other regions reported seeing the Blue Screen of Death (BSOD) on their laptops. The error forced systems to restart or shut down automatically, significantly interrupting business operations.

The outage hit Azure's Central US region especially hard, crippling essential systems for numerous airlines. In the United States, American Airlines, Frontier Airlines, Allegiant, and Sun Country faced significant operational challenges. In India, IndiGo and other airlines also reported disruptions.

Causes Behind the Outage

Microsoft identified the root cause of the outage and managed to recover most services. However, many customers continued to face issues, particularly with Microsoft 365 services like Teams.

  • Configuration Change

Microsoft attributed the outage primarily to a "configuration change" in a portion of its Azure backend workloads. The change disrupted the interaction between storage and compute resources, leaving several Microsoft 365 apps, including Teams, unusable.

  • CrowdStrike's Role

Reports also suggested that a recent update from CrowdStrike, a cybersecurity company, contributed to the problem. This update reportedly caused the "blue screen of death" on Windows devices. The Sydney Morning Herald noted that the issue stemmed from a fault in CrowdStrike’s "Falcon sensor," a security tool installed on many business computers.

CrowdStrike acknowledged the problem, stating, “Our Engineers are actively working to resolve this issue and there is no need to open a support ticket.” The company assured users that updates would be provided once the issue was resolved.

What is CrowdStrike?

CrowdStrike is a prominent cybersecurity company offering security solutions to businesses and individual users. Its Falcon platform uses a single sensor and a unified threat interface to provide real-time protection against breaches across endpoints, workloads, and identities. The recent update appears to have caused the Falcon sensor to malfunction on Windows systems.

CrowdStrike has confirmed awareness of the issue and assured that their engineers are working diligently to address it. Users will be kept informed as progress is made.

Microsoft confirmed that the Azure outage was resolved early Friday, but the incident has highlighted the vulnerabilities associated with heavy reliance on cloud services. The disruption has affected a wide array of sectors, including airlines, banks, supermarkets, media outlets, and various other businesses.

What is Blue Screen of Death?

The Blue Screen of Death (BSOD) is a critical error screen that Windows displays when the system encounters a fault so severe that it cannot continue operating safely. The error typically forces an unexpected restart of the computer, with a risk of losing unsaved data. On recent versions of Windows, the BSOD message generally reads: “Your PC ran into a problem and needs to restart. We're just collecting some error info, and then we'll restart for you.”

While the Blue Screen of Death is specific to Windows, comparable critical errors, usually called kernel panics, can occur on other operating systems such as macOS and Linux.

Detailed List of Affected Services and Sectors

  • Global Airlines

  1. US Airlines: American Airlines, Delta Air Lines, United Airlines, Frontier Airlines, Allegiant, Sun Country Airlines

  2. Indian Airlines: Air India, IndiGo, SpiceJet, Akasa

  3. Other International Airlines: Ryanair, Air France, Cathay Pacific, Eurowings, KLM, Scandinavian Airlines, Vueling

  • Airports Affected

  1. India: New Delhi, Chennai, Bengaluru, Mumbai, Hyderabad, Jaipur

  2. International: Berlin, Prague, Amsterdam, Madrid, Barcelona, London, Edinburgh, Brussels, Sydney, Hong Kong, Lisbon, Singapore

  • Cloud Services

Microsoft's Azure services were particularly affected. Amazon Web Services (AWS) reported that its own services were not impacted, but said it was investigating issues related to 'Windows EC2 and workspaces.'

  • Banking and Retail

The outage led to payment system failures, resulting in long queues in retail outlets, especially in Australia. Financial and banking institutions also faced accessibility issues.

  • Individual Users

Millions of Windows users globally experienced the Blue Screen of Death, causing system shutdowns and restarts. This resulted in significant disruptions and data loss across various sectors, particularly IT.

  • Social Media and Online Services

According to Downdetector, some users reported problems with apps and services including Instagram, Amazon, and Gmail, as well as the State Bank of India, ICICI Bank, Bank of India, and HDFC Bank.

  • Microsoft Services

The impacted Microsoft services included:

  1. Power BI: Users in read-only mode

  2. Microsoft Fabric: Users in read-only mode

  3. Microsoft Teams: Functions such as presence, group chats, and user registration unavailable

  4. Microsoft 365 Admin Center: Intermittent access issues for admins

  5. Microsoft Purview: Delay in event processing

  6. Viva Engage: Access issues for users

Microsoft's Response

Microsoft acknowledged the issues with its Azure services and Microsoft 365 apps. The company stated that services were gradually improving as mitigation actions continued. Microsoft worked on rerouting impacted traffic to alternate systems to expedite recovery.
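Microsoft has not described how its rerouting works in detail. Purely as an illustration of the general idea, the following minimal Python sketch shows priority-based failover: traffic goes to the first system that reports healthy. The `Backend` class, the `pick_backend` function, and the region names are all hypothetical and are not drawn from any Microsoft system.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical system that can serve traffic."""
    name: str
    healthy: bool


def pick_backend(backends):
    """Return the first healthy backend in priority order, or None."""
    for backend in backends:
        if backend.healthy:
            return backend
    return None


# Hypothetical names, listed in priority order (assumption for illustration).
backends = [
    Backend("primary-region", healthy=False),   # simulated impacted system
    Backend("alternate-region", healthy=True),  # simulated alternate system
]

chosen = pick_backend(backends)
print(chosen.name if chosen else "no healthy backend")
```

In this simplified model, recovery speed depends on how quickly the health flag reflects reality and on whether the alternate system has enough capacity, which is one reason services tend to come back gradually rather than all at once.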

What’s Next?

Following the outage, Microsoft reported a positive trend in service availability as a result of redirecting traffic, with services slowly coming back online. Frontier Airlines, for instance, announced that it had lifted its ground stop and that operations were gradually resuming as systems returned to normal.

This outage underscores the vulnerability of critical infrastructure reliant on cloud services and the cascading effects such disruptions can have on global business operations.

With inputs from agencies

© Copyright 2024. All Rights Reserved Powered by Vygr Media.