
How OpenAI Scaled PostgreSQL to Support 800 Million Users and Millions of QPS

OpenAI’s engineering team expanded a single‑primary PostgreSQL cluster with nearly 50 read‑only replicas, migrated write‑heavy workloads to Azure Cosmos DB, and applied extensive optimizations to reliably serve the global traffic of ChatGPT and the OpenAI API for 800 million users at multi‑million queries per second.



Background and Growth

OpenAI’s PostgreSQL workload grew more than ten‑fold, with the user count rising from 500 million to 800 million. When the primary instance approached its write‑throughput ceiling, the team off‑loaded shardable, write‑heavy workloads to Azure Cosmos DB for PostgreSQL (built on Citus) instead of building a custom sharding layer.

Architecture Overview

The production stack consists of a single Azure Database for PostgreSQL Flexible Server primary instance and roughly 50 read replicas distributed across multiple regions. This topology handles read‑heavy traffic while keeping write latency low.

Figure: OpenAI scaling PostgreSQL architecture diagram
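
OpenAI has not published its routing code, but a minimal sketch of read/write splitting over such a topology, sending writes to the single primary and fanning reads out across replicas, might look like the following (the hostnames, the conversations table, and the use of psycopg2 are placeholder assumptions, not details from the talk):

import random
import psycopg2

# Placeholder endpoints: one primary plus regional read replicas.
PRIMARY_DSN = "host=pg-primary.internal dbname=app user=app"
REPLICA_DSNS = [
    "host=pg-replica-eastus.internal dbname=app user=app",
    "host=pg-replica-westeurope.internal dbname=app user=app",
]

def get_conn(readonly: bool):
    # Reads go to a randomly chosen replica; writes go to the single primary.
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    conn = psycopg2.connect(dsn)
    conn.set_session(readonly=readonly)
    return conn

# Read traffic fans out across the replicas, keeping the primary free for writes.
with get_conn(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM conversations WHERE user_id = %s", (42,))
    print(cur.fetchone())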

Key Challenges and Solutions

Write‑side bottleneck: A single primary cannot scale writes horizontally. OpenAI reduced primary load by migrating shardable write workloads to Cosmos DB, applying lazy writes, rate‑limiting backfills, and fixing duplicate‑write bugs.
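
As an illustration of the write‑side mitigations, the sketch below paces a backfill into small batches and makes the insert idempotent so a retried batch cannot produce duplicate rows; the table name, batch size, and write budget are assumptions rather than OpenAI’s actual values:

import time
import psycopg2

BATCH_SIZE = 100              # assumed batch size
MAX_BATCHES_PER_SECOND = 1    # assumed write budget for background work

def backfill(rows):
    # rows: list of (user_id, payload) tuples to backfill.
    conn = psycopg2.connect("host=pg-primary.internal dbname=app user=app")
    try:
        with conn.cursor() as cur:
            for i in range(0, len(rows), BATCH_SIZE):
                batch = rows[i:i + BATCH_SIZE]
                # ON CONFLICT makes the write idempotent, so retries cannot
                # duplicate rows.
                cur.executemany(
                    "INSERT INTO user_settings (user_id, payload) "
                    "VALUES (%s, %s) ON CONFLICT (user_id) DO NOTHING",
                    batch,
                )
                conn.commit()
                # Crude pacing so the backfill never crowds out online writes.
                time.sleep(1.0 / MAX_BATCHES_PER_SECOND)
    finally:
        conn.close()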

Expensive queries: High‑cost joins and ORM‑generated SQL caused CPU spikes. The team rewrote queries, avoided large multi‑table joins, split logic into the application layer, and set idle_in_transaction_session_timeout to prevent idle transactions from blocking autovacuum.
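
A hedged example of those two guardrails, setting idle_in_transaction_session_timeout at the session level and replacing one large join with two smaller lookups resolved in the application, might look like this (the 30‑second value and the schema are illustrative):

import psycopg2

conn = psycopg2.connect("host=pg-primary.internal dbname=app user=app")
with conn, conn.cursor() as cur:
    # Kill sessions that sit idle inside a transaction so they cannot hold
    # back autovacuum; the 30-second value is a placeholder.
    cur.execute("SET idle_in_transaction_session_timeout = '30s'")

    # Instead of one large multi-table join, fetch the small driving set first
    # and resolve the second lookup in the application layer.
    cur.execute("SELECT id FROM conversations WHERE user_id = %s LIMIT 50", (42,))
    conv_ids = [row[0] for row in cur.fetchall()]
    cur.execute(
        "SELECT conversation_id, body FROM messages "
        "WHERE conversation_id = ANY(%s)",
        (conv_ids,),
    )
    messages = cur.fetchall()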

Single point of failure: The primary runs in high‑availability mode with a hot standby. Failover procedures promote the standby quickly, while read replicas are deployed across multiple zones to survive individual replica failures.
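
PostgreSQL exposes promotion directly, so assuming the standby is reachable and the caller has sufficient privileges, the promotion step itself can be a single call to pg_promote() (PostgreSQL 12+); the hostname below is a placeholder, and real failover tooling also handles fencing and client redirection:

import psycopg2

# Connect to the hot standby (placeholder hostname) and promote it.
# pg_promote(wait, wait_seconds) returns true once promotion has completed.
conn = psycopg2.connect("host=pg-standby.internal dbname=postgres user=admin")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("SELECT pg_promote(true, 60)")
    print("promoted:", cur.fetchone()[0])
conn.close()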

Workload isolation: Resource‑intensive requests were moved to dedicated instances, separating high‑priority online traffic from low‑priority batch jobs.

Connection limits: Azure PostgreSQL caps connections at 5,000. PgBouncer was deployed in statement‑ and transaction‑pooling modes, reducing active connections from thousands to a handful and cutting connection latency from ~50 ms to ~5 ms.
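
From the application’s point of view, pooling is mostly transparent: the client connects to PgBouncer instead of PostgreSQL. The sketch below assumes PgBouncer’s default port 6432 and a transaction‑pooling configuration, under which session state (SET commands, temporary tables, session‑level prepared statements) cannot be relied on across transactions; hostnames and credentials are placeholders:

import psycopg2

# Connect to PgBouncer (default port 6432) rather than directly to PostgreSQL.
conn = psycopg2.connect(host="pgbouncer.internal", port=6432,
                        dbname="app", user="app")
with conn, conn.cursor() as cur:
    cur.execute("SELECT 1")  # each transaction may land on a different backend
    print(cur.fetchone())
conn.close()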

Cache miss storms: A cache‑lock mechanism ensures only one request fetches a missing key from PostgreSQL, preventing “thundering‑herd” read amplification.
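
The cache‑lock idea is essentially “single flight”: on a miss, one request performs the PostgreSQL read while concurrent requests wait and reuse its result. A minimal per‑process sketch follows; the production system coordinates this across a distributed cache, which this toy version does not attempt:

import threading

_cache = {}
_locks = {}
_locks_guard = threading.Lock()

def get(key, load_from_postgres):
    # Fast path: the value is already cached.
    if key in _cache:
        return _cache[key]
    # One lock per key, created under a short global guard.
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        # Only the first request runs the PostgreSQL query; waiters find the
        # value already cached once they acquire the lock.
        if key not in _cache:
            _cache[key] = load_from_postgres(key)
    return _cache[key]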

Replica scaling: To avoid overwhelming the primary with WAL streams, OpenAI is testing cascading replication, where intermediate replicas forward WAL to downstream replicas, allowing the replica count to grow beyond the current limit.
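
With cascading replication, an intermediate replica acts as a WAL sender for its downstream standbys, so its own pg_stat_replication view lists them. A small monitoring sketch along those lines (the hostname and role are placeholders, and the cascading topology is still being tested):

import psycopg2

# On a cascading setup, the intermediate replica is itself a WAL sender, so
# pg_stat_replication on that replica shows the standbys attached to it.
conn = psycopg2.connect(
    "host=pg-replica-tier1.internal dbname=postgres user=monitor"
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT application_name, state, replay_lag FROM pg_stat_replication")
    for name, state, lag in cur.fetchall():
        print(f"downstream standby {name}: state={state}, replay_lag={lag}")
conn.close()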

Rate limiting: Multi‑layer rate limiting (application, connection‑pool, proxy, and query level) prevents traffic spikes from exhausting CPU, I/O, or connections, and blocks specific query digests during write spikes.
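
As a sketch of just the application‑level layer, a token bucket in front of query issuance is enough to convey the idea; the rates below are placeholders, and the real stack also enforces limits at the pooler, proxy, and per‑query‑digest levels:

import time

class TokenBucket:
    # Toy token bucket: rate_per_sec tokens refill per second, up to burst.
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate_per_sec=100, burst=200)  # placeholder limits
if limiter.allow():
    pass  # safe to issue the query
else:
    pass  # shed or queue the request instead of hitting PostgreSQL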

Schema changes: Only lightweight schema modifications that do not trigger full‑table rewrites are allowed, with a 5‑second timeout for DDL operations.
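
In practice that rule can be enforced per session: take a short lock_timeout so DDL that cannot acquire its lock within about 5 seconds fails fast instead of queuing behind long transactions, and stick to changes that avoid a full‑table rewrite, such as adding a nullable column. A hedged example (table and column names are illustrative):

import psycopg2

conn = psycopg2.connect("host=pg-primary.internal dbname=app user=app")
with conn, conn.cursor() as cur:
    # Fail fast if the DDL cannot take its lock within ~5 seconds, rather than
    # queueing behind long-running transactions and blocking other traffic.
    cur.execute("SET lock_timeout = '5s'")
    # Adding a nullable column does not rewrite the table, so it stays cheap.
    cur.execute("ALTER TABLE conversations ADD COLUMN archived_at timestamptz")
conn.close()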

Results and Outlook

The system now delivers sub‑100 ms p99 latency with five‑nines availability. Over the past year only one SEV‑0 incident occurred during a viral release that temporarily increased write traffic ten‑fold. OpenAI plans to continue migrating write‑heavy workloads to sharded systems and to roll out cascading replication for further read‑replica expansion.

Expert Commentary (Lao Feng)

Seven years ago, Lao Feng managed a one‑primary, 32‑replica PostgreSQL cluster handling 2.5 million QPS. He notes that single‑machine write limits typically sit at 100‑200 MB/s of WAL throughput, or roughly 1‑2 million writes per second, and that PostgreSQL’s MVCC leads to write and read amplification under heavy write loads.

He also discusses the trade‑offs of sharding versus staying monolithic, the importance of isolating noisy neighbors, and the continued relevance of connection pooling with PgBouncer or newer tools like pgdog.

References

https://www.pgevents.ca/events/pgconfdev2025/schedule/session/433-scaling-postgres-to-the-next-level-at-openai/

https://openai.com/index/scaling-postgresql/

https://learn.microsoft.com/en-us/azure/postgresql/

https://www.cs.cmu.edu/~pavlo/blog/2023/04/the-part-of-postgresql-we-hate-the-most.html

https://en.wikipedia.org/wiki/PostgreSQL

https://www.postgresql.org/docs/current/warm-standby.html#CASCADING-REPLICATION

https://www.crunchydata.com/blog/when-does-alter-table-require-a-rewrite

https://newsletter.pragmaticengineer.com/p/chatgpt-images

Tags: High Availability, PostgreSQL, Scaling, Azure, Read Replicas