The Times Australia

What caused the unprecedented Facebook outage? The few clues point to a problem from within

  • Written by David Tuffley, Senior Lecturer in Applied Ethics & CyberSecurity, Griffith University

Suddenly and inexplicably, Facebook, Instagram, WhatsApp, Messenger and Oculus services were gone. And it was no local disturbance. In a blog post, Downdetector.com[1], a major monitoring service for online outages, called it[2] the largest global outage it had ever recorded — with 10.6 million reports from around the world.

The outage had an especially massive knock-on effect[3] on individuals and businesses around the world that rely on WhatsApp[4] to communicate with friends, family, colleagues and customers.

It took Facebook nearly six hours to get services back online, albeit slowly at first. Ironically, the outage was so pervasive Facebook had to resort to using Twitter, its rival platform, to get updates out into the world.

The internet, together with its outwardly visible face (the World Wide Web), is a remarkably fault-tolerant machine. It was designed to be resilient, and the web has never gone down completely. As such, global outages like this one are quite rare[5].

But they do happen. To Google’s embarrassment, several of its services including Gmail, YouTube, Hangouts, Google Calendar and Google Maps went offline[6] for about an hour in December last year.

And in June this year, Fastly, a cloud-computing company whose clients include the Guardian, the New York Times, Reddit and The Conversation, went offline too.

Read more: Fastly global internet outage: why did so many sites go down — and what is a CDN, anyway?[7]

What caused it?

While Facebook’s management was apologetic, they gave no hint as to what caused the outage.

With hacking issues becoming all too common in today’s cyber-security threat environment, the question arises whether Facebook’s outage might have been the result of a successful hack. But this seems unlikely.

According to a report from The Verge[8] referencing Facebook’s Chief Technology Officer and Vice President of Infrastructure, the problem probably lay in Facebook’s internal infrastructure.

Facebook engineers were sent to one of the company’s data centres in California to work on the problem, which implies they were unable to log in remotely to the data centre.

Experts have said[9] the outage could only have come from inside the company. It’s likely Facebook engineers inadvertently made changes to how the network is set up, creating a cascading set of problems.

Such events have happened before, albeit not with such a catastrophic effect.

However, given the highly confidential way Facebook operates its network, it’s not possible to know exactly what happened with the network configuration. We will probably never be told.

A Domain Name System problem

Supporting the network configuration explanation is the fact that the error messages that appeared when people tried to contact facebook.com and whatsapp.com indicated it was a DNS problem. So the websites still existed, but couldn’t be reached.

DNS stands for Domain Name System[10] and is described as the “phonebook of the internet”. It translates the domain names people read into the numeric internet addresses (IP addresses) that computers use.

When you enter a domain name such as “facebook.com” or “whatsapp.com” into your browser, a DNS server is consulted and the corresponding encoded internet address[11], the IP address, is returned.
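The lookup step described above can be sketched in a few lines of Python. This is only an illustration, not a description of how Facebook's systems work: the function name resolve is hypothetical, and the code simply asks whatever resolver the local machine is configured to use. When no address comes back for a name, the site may still exist but cannot be reached by that name, which is the kind of failure the error messages pointed to.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IP addresses the local resolver reports for hostname.

    Hypothetical helper for illustration; an empty list means the
    name could not be resolved at the moment of the call.
    """
    try:
        results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed: no usable answer for this name right now.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as text.
    return sorted({info[4][0] for info in results})
```

Calling resolve("facebook.com") on a working connection would normally return one or more addresses. The exact values vary by location and time, so none are shown here.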

Read more: 'What is my IP address?' Explaining one of the world's most Googled questions[12]

When everything is working as it should, the user is then connected to the requested domain. On the strength of evidence gleaned from expert sources close to Facebook, it seems most unlikely the outage was caused by an external attack.

According to Statista, the country with the largest number of Facebook users is India, followed by the US, Indonesia, Brazil and Mexico (based on data from July, 2021). Simon / Pixabay

A whistleblower speaks up

The Facebook outage occurred only hours after the US-based 60 Minutes program aired an incendiary interview[13] with former Facebook employee and whistleblower, 37-year-old Harvard graduate Frances Haugen.

In a complaint to federal law enforcement, and in the interview, Haugen alleges[14] Facebook’s Instagram app is harming teenage girls, and that Facebook’s own research indicates the company “amplifies hate, misinformation and political unrest, but the company hides what it knows”.

To support the allegations, Haugen shared more than 10,000 pages of internal documentation with the US Securities and Exchange Commission — all pretty damning stuff. She said[15]:

“The thing I saw at Facebook over and over again was there were conflicts of interest between what was good for the public and what was good for Facebook, and Facebook over and over again chose to optimise for its own interests, like making more money.”

Given the timing of the interview and Facebook’s global outage, it’s natural to wonder whether the two events are connected. However, in the absence of any definitive evidence to support this theory, no causal link between the two events has been established.

But considering the seriousness of Haugen’s allegations, and the weight of objective evidence in the form of thousands of insider documents, it’s clear further investigation is warranted.

Facebook has around 2.89 billion monthly active users and a market capitalisation[16] of US$1.21 trillion. By any standard, it’s a big and powerful company with a great deal of influence. Now is the time to shine a light on its ethics, or lack thereof.

Hopefully there won’t be any more outages to slow down this process.

References

  1. ^ Downdetector.com (downdetector.com)
  2. ^ called it (www.theguardian.com)
  3. ^ massive knock-on effect (www.nytimes.com)
  4. ^ rely on WhatsApp (www.theguardian.com)
  5. ^ quite rare (qz.com)
  6. ^ went offline (www.nytimes.com)
  7. ^ Fastly global internet outage: why did so many sites go down — and what is a CDN, anyway? (theconversation.com)
  8. ^ The Verge (www.theverge.com)
  9. ^ said (www.theverge.com)
  10. ^ Domain Name Server (www.cloudflare.com)
  11. ^ encoded internet address (www.investopedia.com)
  12. ^ 'What is my IP address?' Explaining one of the world's most Googled questions (theconversation.com)
  13. ^ incendiary interview (www.youtube.com)
  14. ^ alleges (www.theguardian.com)
  15. ^ said (www.usatoday.com)
  16. ^ market capitalisation (www.gobankingrates.com)

Read more https://theconversation.com/what-caused-the-unprecedented-facebook-outage-the-few-clues-point-to-a-problem-from-within-169249
