What Triggered the Biggest Internet Outages of 2021? Lessons from 10 Major Incidents
This review of ten major 2021 internet outages, spanning Chinese platforms such as Bilibili and Futu and global services such as Facebook, Roblox, and AWS, examines their root causes, the role of infrastructure design, and the operational lessons needed to improve system resilience.
In 2021, despite expectations that modern internet services could achieve "never‑down" reliability, a series of high‑profile outages demonstrated that system failures remain common and often stem from human error, infrastructure design flaws, or external disruptions.
Outages in China: Transparency as a Skill
Bilibili crash leaves young users sleepless
On July 13, a server failure at Bilibili prevented users from logging in. Many moved to other platforms, and the outage trended on social media. Bilibili's brief statement that "some server rooms failed" offered little insight into the cause.
Futu Securities service interruption and a 2,000‑word technical apology
On October 9, Futu's trading app went down after a power outage in a telecom operator's data center caused network failures across multiple data centers. Founder Li Hua later published a detailed 2,000-word post explaining the redundancy design, the trade-off between performance and fault tolerance, and how a power issue at a single IDC became a single point of failure.
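The Futu post centers on redundancy that still shared a hidden dependency. A minimal sketch of how such a shared dependency could be detected is shown here. The topology, site names, and dependency labels are hypothetical and are not taken from Futu's post; the idea is simply to intersect the dependency sets of all supposedly independent sites.

```python
from typing import Dict, Set


def shared_dependencies(site_deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the dependencies that every site relies on.

    Anything in the result is a candidate single point of failure:
    if it goes down, all sites go down together.
    """
    if not site_deps:
        return set()
    return set.intersection(*site_deps.values())


# Hypothetical topology: three "redundant" sites that all draw on
# the same upstream power feed (an assumption for illustration).
sites = {
    "site-a": {"idc-1-power", "link-1"},
    "site-b": {"idc-1-power", "link-2"},
    "site-c": {"idc-1-power", "link-3"},
}

print(shared_dependencies(sites))
```

In this made-up inventory the only common element is the power feed, so the three sites are not independent with respect to power even though each has its own network link.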
Xi'an "One‑Code‑Pass" collapses twice in half a month
Heavy pandemic-related traffic overwhelmed Xi'an's health-code platform in December 2021 and again on January 4, 2022. Both times the service became unavailable, prompting authorities to call for capacity expansion.
International Outages: Small Bugs, Big Trouble
Facebook's worst outage ever, wiping tens of billions of dollars in market value
On October 4, a routine network‑capacity test inadvertently cut all backbone connections, leaving over 3 billion users offline for nearly seven hours and causing a massive market‑cap loss.
Roblox suffers a 73‑hour outage due to a Consul bug
Roblox's self-hosted data centers use Consul for service discovery. Enabling Consul's streaming feature introduced a bug that degraded performance and took the platform down. About 54 hours passed before the feature was disabled, and the outage lasted 73 hours in total.
Salesforce engineer’s shortcut triggers a global outage
On May 11, a script applying a DNS configuration change timed out, and the faulty change propagated across data centers, disrupting service for millions of users for about five hours.
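One common way to limit how far a bad change can spread is a staged rollout that stops at the first unhealthy stage. The sketch below is illustrative only and is not Salesforce's tooling; `staged_rollout`, `apply_change`, `is_healthy`, and the data-center names are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Apply a change stage by stage, halting at the first unhealthy stage.

    Returns the targets that were touched and whether the rollout completed.
    """
    touched: List[str] = []
    for stage in stages:
        for target in stage:
            apply_change(target)
            touched.append(target)
        # Verify the whole stage before moving on to the next one.
        if not all(is_healthy(target) for target in stage):
            return touched, False
    return touched, True


# Simulation: a bad change that breaks every target it is applied to.
changed: List[str] = []


def apply_change(target: str) -> None:
    changed.append(target)


def is_healthy(target: str) -> bool:
    return target not in changed


stages = [["dc-canary"], ["dc-1", "dc-2"], ["dc-3", "dc-4"]]
touched, completed = staged_rollout(stages, apply_change, is_healthy)
print(touched, completed)
```

In this simulation the rollout halts after the canary stage, so only one of the five hypothetical data centers ever receives the bad change.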
Cloud‑Provider Failures: Massive Blast Radii
OVH data‑center fire disables 3.6 million websites
A fire at OVH's Strasbourg campus destroyed the SBG2 data center and damaged a neighboring one, taking down sites across 464,000 domains, including government and cryptocurrency services.
Fastly configuration change causes a global CDN outage
On June 8, a service configuration change triggered a software bug that caused 503 errors worldwide, affecting major sites such as Amazon, Twitter, and the New York Times for about an hour.
Google Cloud outage due to GCLB configuration bug
On November 16, an incorrectly configured external load balancer caused a two‑hour outage that impacted services like YouTube, Gmail, and many enterprise customers.
AWS experiences three separate outages in December
The three incidents had different causes: network overload, an automated traffic-shifting error, and a data-center power issue. Together they disrupted Netflix, Slack, Coinbase, and many other platforms.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
