I got 502 problems, and Cloudflare sure is one: Outage interrupts your El Reg-reading pleasure for almost half an hour

Jul 2, 2019

for_telcos_blog

UPDATED Cloudflare, the outfit noted for the slogan "helping build a better internet", had another wobble today as "network performance issues" rendered websites around the globe inaccessible.

The US tech biz updated its status page at 1352 UTC to indicate that it was aware of issues, but things began tottering quite a bit earlier. Since Cloudflare handles services used by a good portion of the world's websites, such as El Reg, including content delivery, DNS and DDoS protection, when it sneezes, a chunk of the internet has to go and have a bit of a lie down. That means netizens were unable to access many top sites globally.

A stumble last week was attributed to the antics of Verizon by CTO John Graham-Cumming. As for today's shenanigans? We contacted the company, but they've yet to give us an explanation.

While Cloudflare implemented a fix by 1415 UTC and declared things resolved by 1457 UTC, a good portion of internet users noticed things had gone very south for many, many sites.

The company's CEO took to Twitter to proffer an explanation for why things had fallen over, fingering a colossal spike in CPU usage as the cause while gently nudging the more wild conspiracy theories away from the whole DDoS thing.

However, the outage was a salutary reminder of the fragility of the internet as even Firefox fans found their beloved browser unable to resolve URLs.

Ever keen to share in the ups and downs of life, even Cloudflare's site also reported the dread 502 error.

As with the last incident, users who endured the less-than-an-hour of disconnection would do well to remember that the internet is a brittle thing. And Cloudflare would do well to remember that its customers will be pondering if maybe they depend on its services just a little too much.

Updated to add at 1702 BST

Following publication of this article, Cloudflare released a blog post stating the "CPU spike was caused by a bad software deploy that was rolled back. Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels."

Naturally it then added....

"We are incredibly sorry that this incident occurred. Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again." ®

About the author

Related Articles

OX App Suite Software Subscription v8 – Delivering a better...

For over 15 years Open-Xchange has led the market in hosted email solutions, right from when we released our first real SaaS...

Jan Tran Nov 29, 2023

Move your email from a cost center to a profit center

More than half of the world’s population – 4.2 billion people – now uses email, with this number predicted to increase to...

Errol Vanderhorst Jul 25, 2023

PowerDNS brings encrypted DNS capabilities onto routers for the...

Helps protect confidentiality and integrity of traffic in the first mile CPE (customer premise equipment) manufacturers,...

Chris Holder Jul 5, 2023

DNSdist as a router-ready solution

As you might have read, with the release of DNSdist 1.8, PowerDNS brings DNS encryption with DNS over TLS (DoT) and DNS over...

Bob Brandt Apr 12, 2023