Facebook suffered what is being called one of the most significant outages ever last night, with users unable to use its services for hours on end. The crippling outage at Facebook, WhatsApp, Instagram, Facebook Messenger, and other Facebook services occurred because of a problem in the company’s Domain Name System (DNS), a little-known but crucial component of the internet.
Facebook has said that the outage was due to a configuration change to its routers, and that it has no evidence user data was compromised. “The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to diagnose and resolve the problem quickly,” the company said in a blog post. Facebook noted that configuration changes on the routers that coordinate network traffic between its data centers interrupted that communication, so Facebook’s machines could no longer talk to each other.
Before Facebook’s official statement, web infrastructure and website security company Cloudflare also detailed what caused the issue. In a blog post, Cloudflare said that “Facebook and related properties disappeared from the Internet in a flurry of Border Gateway Protocol (BGP) updates.” The problems began with a routine BGP update that went wrong, wiping out the routing information other networks need to reach Facebook’s DNS servers and, through them, find its sites.
To make sense of technical terms like DNS and BGP, we have tried to break the problem down in simpler words. Let’s start with the basics:
What is DNS, and what went wrong with it?
According to a report by Bloomberg, DNS is like a phone book for the internet. It’s the tool that converts a web domain, like Facebook.com, into the numerical Internet Protocol (IP) address where the site resides. Think of Facebook.com as the name one might look up in the white pages and the IP address as the street address listed next to it.
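To make the phone-book analogy concrete, here is a minimal Python sketch of a lookup. It is a toy, not how a real resolver works: the record table and its addresses are hypothetical (taken from the ranges reserved for documentation, not Facebook’s real addresses), and real resolvers query a hierarchy of nameservers over the network.

```python
# A toy "phone book": domain names mapped to IP addresses.
# Hypothetical entries using documentation addresses (RFC 5737), not real ones.
DNS_RECORDS: dict[str, str] = {
    "facebook.com": "203.0.113.10",
    "instagram.com": "203.0.113.20",
    "whatsapp.com": "203.0.113.30",
}


def resolve(domain: str) -> str:
    """Look up a domain's IP address, the way one looks up a name in the white pages."""
    name = domain.lower().rstrip(".")  # domain names are case-insensitive
    try:
        return DNS_RECORDS[name]
    except KeyError:
        # A real resolver would report an error such as NXDOMAIN or SERVFAIL here.
        raise LookupError(f"cannot resolve {domain}") from None


print(resolve("Facebook.com"))  # 203.0.113.10
```

On a real machine, Python’s standard `socket.getaddrinfo("facebook.com", 443)` hands this job to the operating system’s resolver, which performs the lookup across the network.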
On Monday, a technical problem related to Facebook’s DNS records caused the outages. When a DNS error occurs, a browser cannot turn Facebook.com into the IP address it needs, so it never reaches a user’s profile page. That’s apparently what happened inside Facebook, but at a scale that temporarily crippled the entire Facebook ecosystem.
Not only were Facebook’s primary platforms down, so were its internal applications, including the company’s email system. Users on Twitter and Reddit also indicated that employees at the company’s Menlo Park, California, campus could not access offices and conference rooms that required a security badge. That could happen if the system that grants access also depends on the same domain, Facebook.com.
Now, what is BGP?
The same Bloomberg report also states that the problem at Facebook Inc. appeared to have its origins in the Border Gateway Protocol, or BGP. If DNS is the internet’s phone book, BGP is its postal service. When a user sends data over the internet, BGP determines the best available path for that data to travel.
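BGP’s real route-selection process weighs many attributes, but one well-known factor is the length of the AS path, the list of networks (autonomous systems) an announcement has passed through. The Python sketch below simplifies selection to that single rule. The route data, the network numbers (from the range reserved for documentation), and the `best_route` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str               # block of addresses being announced
    as_path: tuple[int, ...]  # networks the announcement travelled through
    next_hop: str             # neighbour that traffic should be handed to


def best_route(routes: list[Route], prefix: str) -> Optional[Route]:
    """Choose the route with the shortest AS path (a big simplification of BGP)."""
    candidates = [r for r in routes if r.prefix == prefix]
    if not candidates:
        # Nobody announces this prefix, so there is no path to it at all.
        return None
    return min(candidates, key=lambda r: len(r.as_path))


# Hypothetical announcements using documentation AS numbers and addresses.
routes = [
    Route("203.0.113.0/24", (64500, 64510), "198.51.100.1"),
    Route("203.0.113.0/24", (64501, 64502, 64510), "198.51.100.2"),
]

print(best_route(routes, "203.0.113.0/24").next_hop)  # 198.51.100.1
print(best_route(routes, "192.0.2.0/24"))             # None
```

The `None` case is the one that matters for this story: when every route to a block of addresses is withdrawn, the rest of the internet simply has no path to it.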
According to a tweet by Cloudflare Inc.’s chief technology officer, John Graham-Cumming, public records show that a large number of changes were made to Facebook’s BGP routes minutes before its platforms stopped loading.
While the BGP snafu may explain why Facebook’s DNS failed, the company hasn’t yet commented on why the BGP routes were withdrawn early on October 4.
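The link between the two problems can be sketched in a few lines of Python. In this hypothetical toy, which uses documentation addresses and made-up names, a network announces one route to its web servers and one to its DNS servers. Once the route to the DNS servers is withdrawn, name lookups fail, even though the web servers themselves are still announced in the toy.

```python
import ipaddress

# Hypothetical prefixes this network announces via BGP (documentation ranges).
announced = {
    ipaddress.ip_network("203.0.113.0/24"),   # web servers
    ipaddress.ip_network("198.51.100.0/24"),  # authoritative DNS servers
}

# Hypothetical nameserver addresses and the records they serve.
NAMESERVERS = ["198.51.100.53", "198.51.100.54"]
RECORDS = {"facebook.com": "203.0.113.10"}


def reachable(ip: str) -> bool:
    """An address is reachable only if some announced prefix covers it."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in announced)


def resolve(domain: str) -> str:
    """Ask the nameservers for a record; fail if none of them can be reached."""
    if not any(reachable(ns) for ns in NAMESERVERS):
        raise LookupError(f"SERVFAIL: no nameserver for {domain} is reachable")
    return RECORDS[domain]


print(resolve("facebook.com"))  # 203.0.113.10

# A bad configuration change withdraws the route to the DNS servers' prefix.
announced.discard(ipaddress.ip_network("198.51.100.0/24"))

try:
    resolve("facebook.com")
except LookupError as err:
    print(err)  # SERVFAIL: no nameserver for facebook.com is reachable
```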
Are Facebook services back up?
Yes. Most of Facebook’s services, including WhatsApp, Instagram, and Facebook, are back up and running. As of writing, we can use all three services without problems. Facebook also says that its systems are fully restored, and WhatsApp’s Twitter account has said that the instant messaging platform is working at 100 percent.