
The AT&T Outage: How It Happened and How It Could Have Been Prevented




This content is brought to you by Evolven. Evolven Change Analytics is a unique AIOps solution that tracks and analyzes all actual changes carried out in the enterprise cloud environment. Evolven helps leading enterprises cut the number of incidents, slash troubleshooting time, and eliminate unauthorized changes.

The AT&T Outage:

On February 22nd, 2024, AT&T customers across the country experienced widespread outages that affected more than 70,000 users and interrupted voice, data, and 911 services. The outage began around 5 AM EST, and services were not fully restored until about 2 PM EST. AT&T has been vague about the root cause, but it appears to have been a software update to network infrastructure that was rolled out across its national backbone network. The company's explanation was that the outages resulted from the application and execution of an incorrect process used while working to expand its network.

Changes Are Risky Without Oversight:

This type of outage underscores the risks involved whenever network changes are implemented, even by a mature international technology company like AT&T. Without proper safeguards in place, something as routine as a network upgrade can quickly snowball into massive disruptions impacting millions of customers. This is where solutions like Evolven can help strengthen change management processes and prevent outages.

Evolven provides a Configuration Risk Intelligence platform built from the ground up to provide visibility, automation, and control across on-premises and cloud environments. A key component is Evolven's change and configuration monitoring, which allows organizations to establish a pre-deployment model of their infrastructure, including servers and network devices, and then continuously monitor for any drift or deviation from that model that might create issues later.
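To make the idea of baseline-versus-drift monitoring concrete, here is a minimal sketch in Python. It is not Evolven's implementation or API; the function names (flatten, detect_drift) and the sample router settings are hypothetical, chosen only to illustrate comparing a recorded baseline against the current state.

```python
from __future__ import annotations

from typing import Any


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested settings into dotted keys, e.g. {"bgp": {"asn": 1}} -> {"bgp.asn": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def detect_drift(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Report settings that were added, removed, or changed relative to the baseline."""
    base, cur = flatten(baseline), flatten(current)
    return {
        "added": {k: cur[k] for k in cur.keys() - base.keys()},
        "removed": {k: base[k] for k in base.keys() - cur.keys()},
        "changed": {
            k: {"baseline": base[k], "current": cur[k]}
            for k in base.keys() & cur.keys()
            if base[k] != cur[k]
        },
    }


# Hypothetical example data: a recorded baseline and the state after a change.
baseline = {"router": {"mtu": 1500, "bgp": {"asn": 65001, "max_prefixes": 1000}}}
current = {"router": {"mtu": 9000, "bgp": {"asn": 65001}, "qos": {"enabled": True}}}

for kind, items in detect_drift(baseline, current).items():
    print(kind, items)
```

In this sketch, any non-empty "added", "removed", or "changed" entry is a deviation from the pre-deployment model that someone should review before the change goes further.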

If Evolven had been deployed prior to AT&T's network upgrade, it would have detected inconsistencies between the planned changes and the network's baseline and flagged them as a risk. This would have alerted AT&T to potential issues and enabled the company to simulate and test the changes in a sandbox environment before rolling them out more widely. If disruptive impacts had been identified, the release could have been halted and the problems remediated proactively.
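The test-first, halt-on-failure workflow described above can be sketched as a staged rollout. This is an illustrative Python sketch only, not how Evolven or AT&T operate: the stage names, the RolloutResult type, and the apply_change and is_healthy callbacks are all hypothetical stand-ins for real deployment and health-check tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    completed: List[str] = field(default_factory=list)  # stages that passed their check
    halted_at: Optional[str] = None  # stage whose check failed, if any


def staged_rollout(
    stages: List[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> RolloutResult:
    """Apply a change one stage at a time and stop at the first failed health check."""
    result = RolloutResult()
    for stage in stages:
        apply_change(stage)
        if not is_healthy(stage):
            # Halt here so later (wider) stages never receive the change.
            result.halted_at = stage
            break
        result.completed.append(stage)
    return result


# Hypothetical usage: the change passes in the sandbox but fails in the canary region.
applied: List[str] = []
result = staged_rollout(
    stages=["sandbox", "canary-region", "national-backbone"],
    apply_change=applied.append,
    is_healthy=lambda stage: stage != "canary-region",
)
print(result)
```

The design choice worth noting is the ordering: the narrowest environment comes first, so a failed check stops the change before it reaches the widest one.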

While mistakes inevitably happen, especially in today's complex, ever-changing IT environments, solutions such as Evolven's have been developed to minimize the blast radius when they do. Evolven gives organizations the guardrails and control needed to confidently accelerate and manage change, even in complex multi-cloud environments. For service providers and enterprises managing mission-critical infrastructure, such as health and financial services, solutions like Evolven are critical for avoiding outages that impact customers, reputation, and the bottom line.

AI Insights for Infrastructure Changes and Upgrades:

By leveraging AI and machine learning, Evolven allows teams to implement network upgrades smoothly and avoid costly downtime. As networks grow more complex, automating and enforcing IT controls will only become more important. AT&T's outage underscores the need for change and configuration monitoring to safeguard reliability. With the right solutions in place, service disruptions can be avoided even as networks scale and evolve.

To find out how Evolven can help your organization avoid what AT&T just experienced, request a demo.

About the Author
Jim Wachhaus, Director of Product Marketing, has worked in technical roles on cybersecurity products for over two decades and is passionate about the discipline of cyber system defense.