
Why Did Cloudflare’s Global Outage Happen on Nov. 18, 2025? Inside the Bot Management Bug

On the evening of November 18, 2025 (Beijing time), Cloudflare suffered a worldwide outage that took down services such as ChatGPT, X, Spotify, and major gaming platforms. Cloudflare’s detailed post‑mortem traced it to a ClickHouse permission change that made a Bot Management configuration file grow too large, crashing the proxy software on its edge nodes.

dbaplus Community

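Event Timeline

All times below are UTC+8 (Beijing time). The official post‑mortem lists the same events eight hours earlier, in UTC.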

19:05 – Engineers deployed a ClickHouse access‑control change.

19:28 – Change took effect; the fault began.

19:32‑21:05 – Incident response teams investigated.

21:05 – First mitigation applied (bypasses for Workers KV and Cloudflare Access), reducing impact, but the core issue persisted.

21:37 – Root cause identified.

22:24 – Generation of new Bot Management configuration files stopped; edge nodes rolled back to the previous stable files.

22:30 – Core services restored.

01:06 (next day) – All systems fully recovered.

Root Cause

The outage was triggered by an internal permission change in Cloudflare’s ClickHouse cluster. After the change, a metadata query that did not filter by database name returned columns from the underlying r0 database as well as from the default database, roughly doubling the rows it returned. The Bot Management feature file built from that result grew past the hard‑coded size limit of the traffic‑routing software that runs on every edge node, which crashed that software and produced HTTP 5xx errors.
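To make the doubling concrete, here is a small Python simulation. It does not talk to ClickHouse: it models the `system.columns` metadata table as a list of `(database, table, name, type)` tuples, and it assumes that the metadata query filtered on the table name but not on the database. The table name `http_requests_features` and the five feature columns are hypothetical. Once `r0` becomes visible, every column shows up once per database, so the unfiltered result doubles, while an explicit database predicate returns the original rows.

```python
# Simulation only: models ClickHouse's system.columns as Python tuples.
# Hypothetical: the table name and the five feature column names below.
TABLE = "http_requests_features"
FEATURES = [f"feature_{i}" for i in range(5)]


def visible_columns(databases):
    """Rows of (database, table, name, type) a user can see in system.columns."""
    return [(db, TABLE, name, "Float32") for db in databases for name in FEATURES]


def unfiltered_query(rows):
    # Like: SELECT name, type FROM system.columns
    #       WHERE table = 'http_requests_features' ORDER BY name
    return sorted((name, typ) for db, table, name, typ in rows if table == TABLE)


def filtered_query(rows, database="default"):
    # Same query plus: AND database = 'default'
    return sorted((name, typ) for db, table, name, typ in rows
                  if table == TABLE and db == database)


before = visible_columns(["default"])       # only default is visible
after = visible_columns(["default", "r0"])  # r0 metadata is now visible as well

print(len(unfiltered_query(before)))  # 5
print(len(unfiltered_query(after)))   # 10: every feature appears twice
print(len(filtered_query(after)))     # 5
```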

Technical Details

Cloudflare’s Bot Management module uses a machine‑learning model that scores each request. The model consumes a feature file that is regenerated every few minutes from a ClickHouse query. After the permission change, the query emitted many duplicate feature rows, inflating the file size roughly two‑fold.
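The producer side of that pipeline can be sketched as follows. This is a hypothetical Python model, not Cloudflare’s code: the JSON file format, the function names, and the deliberately tiny `MAX_FEATURES` limit are assumptions chosen for illustration. It shows that duplicated rows double the serialized file, and how a generator could deduplicate and refuse to publish an oversized file instead of shipping it to every edge node.

```python
import json

# Hypothetical limit, deliberately tiny so the example stays small.
MAX_FEATURES = 8


def build_feature_file(rows):
    """Serialize (name, type) rows into a feature file; JSON is an assumed format."""
    return json.dumps([{"name": n, "type": t} for n, t in rows]).encode()


def build_checked_feature_file(rows, max_features=MAX_FEATURES):
    """Producer-side guard: drop duplicate rows, then refuse to publish an oversized file."""
    unique = sorted(set(rows))
    if len(unique) > max_features:
        raise ValueError(f"{len(unique)} features exceeds the limit of {max_features}")
    return build_feature_file(unique)


good_rows = [(f"f{i}", "Float32") for i in range(5)]
bad_rows = sorted(good_rows * 2)  # each row duplicated, as after the permission change

print(len(build_feature_file(good_rows)), len(build_feature_file(bad_rows)))  # 175 350
print(build_checked_feature_file(bad_rows) == build_feature_file(good_rows))  # True
```

Whether the guard should silently deduplicate or fail the build outright is a policy choice; the point of the sketch is that internally generated configuration gets validated before it propagates.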

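The edge proxy checks the feature file against a preset limit before loading it, because the Bot Management module preallocates memory for a fixed maximum number of features. When the oversized file exceeded that limit, the module returned an error that the proxy did not handle gracefully, so the process aborted and the edge returned “Internal Server Error” responses. Because the same proxy also sits in front of services such as Workers KV and Access, those services began returning 5xx errors as well.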
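The difference between crashing and degrading gracefully on a bad file fits in a few lines. This hypothetical Python model (not Cloudflare’s proxy code) uses an assumed JSON feature list and a made-up capacity of 8 features. A module that lets the limit error escape is modeled as a flag that turns every request into a 500. The alternative rejects the oversized file and keeps serving with the last file that loaded successfully.

```python
import json

MAX_FEATURES = 8  # hypothetical preallocated capacity


class FeatureLimitError(Exception):
    """Raised when a feature file holds more features than the module can hold."""


def parse_features(blob, max_features=MAX_FEATURES):
    features = json.loads(blob)
    if len(features) > max_features:
        raise FeatureLimitError(f"{len(features)} features > capacity {max_features}")
    return features


class EdgeProxy:
    """Toy proxy whose bot module reloads its feature file on every update."""

    def __init__(self, initial_blob, keep_last_good):
        self.features = parse_features(initial_blob)
        self.keep_last_good = keep_last_good
        self.failed = False

    def update(self, blob):
        try:
            self.features = parse_features(blob)
            self.failed = False
        except FeatureLimitError:
            if not self.keep_last_good:
                # Models an unhandled error (e.g. a panic) taking the module down.
                self.failed = True
            # Otherwise the bad file is ignored and the previous features stay loaded.

    def handle_request(self):
        return 500 if self.failed else 200


good = json.dumps([f"f{i}" for i in range(5)])
oversized = json.dumps([f"f{i}" for i in range(5)] * 2)  # 10 features

strict = EdgeProxy(good, keep_last_good=False)
tolerant = EdgeProxy(good, keep_last_good=True)
for proxy in (strict, tolerant):
    proxy.update(oversized)

print(strict.handle_request(), tolerant.handle_request())  # 500 200
```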

Impact

The failure disrupted a large share of global web traffic. Major platforms, including ChatGPT, Claude, Perplexity, X (Twitter), Spotify, Discord, Grindr, League of Legends, Minecraft servers, and many others, returned 500 errors, hung on verification pages, or became completely unavailable.

Remediation and Follow‑up

Cloudflare stopped generating new Bot Management configuration files and rolled edge nodes back to the last stable version. Core traffic recovered at 22:30, about three hours after impact began, and all services were fully restored at 01:06 the next day, roughly five and a half hours after impact began. The incident is documented in the official post‑mortem at https://blog.cloudflare.com/18-november-2025-outage/.
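The order of the two remediation steps matters, because the feature file is regenerated every few minutes: a rollback on its own would likely be overwritten by the next bad file. The hypothetical `ConfigChannel` below (an illustration, not Cloudflare’s distribution system) models a versioned channel with a switch that stops new files from propagating and a rollback that republishes a known-good version.

```python
class ConfigChannel:
    """Hypothetical versioned channel that pushes generated config files to edge nodes."""

    def __init__(self):
        self.versions = []   # (version, blob) pairs in publish order
        self.paused = False  # when True, newly generated files are not propagated

    def publish(self, blob):
        if self.paused:
            return None  # generation/propagation stopped: drop the new file
        version = len(self.versions) + 1
        self.versions.append((version, blob))
        return version

    def rollback_to(self, version):
        """Republish an earlier known-good file as the newest version (allowed while paused)."""
        blob = dict(self.versions)[version]
        self.versions.append((len(self.versions) + 1, blob))

    def current(self):
        return self.versions[-1]


channel = ConfigChannel()
channel.publish(b"good")       # version 1
channel.publish(b"oversized")  # version 2: the bad file reaches the edge
channel.paused = True          # step 1: stop generating/propagating new files
channel.publish(b"oversized")  # the next regenerated bad file is dropped
channel.rollback_to(1)         # step 2: push the last known-good file again
print(channel.current())       # (3, b'good')
```

Republishing the old file as a new version, rather than deleting version 2, keeps the history append-only, so it stays clear what every node was told to load and when.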

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
