Databases · 8 min read

Can One PostgreSQL Instance Power 800 Million Users? OpenAI’s Trade‑offs Unpacked

OpenAI runs a single primary PostgreSQL instance with roughly 50 read replicas to serve about 800 million ChatGPT users. The team relies on PgBouncer connection pooling, cache‑lock mechanisms, query and schema optimizations, workload isolation, multi‑layer rate limiting, and cascade replication, while acknowledging the architecture's limits and the complexity of full sharding.


Architecture Boundaries

OpenAI’s blog clarifies that only core read‑write workloads reside in PostgreSQL; additional tables required by new features are placed in sharded systems such as Azure Cosmos DB. Write‑intensive, easily sharded workloads have already been migrated, and new tables are prohibited in the current PostgreSQL deployment, highlighting the limits of the single‑master design.

Engineering Optimizations

Connection‑pool tuning: PgBouncer is deployed as a proxy, cutting average connection latency from 50 ms to 5 ms. Azure PostgreSQL caps connections at 5,000; PgBouncer’s statement‑ or transaction‑pooling mode efficiently reuses connections and reduces active connection counts. All components are co‑located in the same region to minimise network overhead, and idle‑timeout settings must be carefully configured to avoid connection exhaustion.
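A minimal pgbouncer.ini sketch of this setup; the host, pool sizes, and timeout values below are illustrative assumptions, not OpenAI's published configuration:

```ini
; Illustrative PgBouncer configuration (values are assumptions)
[databases]
chatgpt = host=10.0.0.1 port=5432 dbname=chatgpt

[pgbouncer]
pool_mode = transaction        ; reuse a server connection per transaction
max_client_conn = 20000        ; many app clients share few server conns
default_pool_size = 100        ; server connections per user/database pair
server_idle_timeout = 60      ; close idle server connections (seconds)
client_idle_timeout = 300     ; drop idle clients to avoid exhaustion
```

Transaction pooling keeps the number of real server connections far below the 5,000 cap while thousands of application clients stay connected to the proxy.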

Cache‑lock mechanism: When multiple requests miss the cache simultaneously, only one acquires a lock to fetch data from PostgreSQL while others wait for the cache to be refreshed, preventing a cache‑miss storm from overwhelming the database.
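A minimal single‑process sketch of that pattern, using an in‑process dict as the cache and a per‑key lock; `fetch_from_db` and the data structures are illustrative (a production version would use a distributed lock over a shared cache):

```python
import threading

cache = {}                      # stand-in for a shared cache (e.g. Redis)
locks = {}                      # one lock per cache key
locks_guard = threading.Lock()  # protects the locks dict itself

def get_with_lock(key, fetch_from_db):
    """Return a cached value; on a miss, let only ONE caller hit the DB."""
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Double-check: another thread may have refreshed the cache
        # while we were waiting for the lock.
        if key not in cache:
            cache[key] = fetch_from_db(key)
    return cache[key]
```

Even if hundreds of requests miss simultaneously, the database sees exactly one fetch per key; the rest are served from the refreshed cache.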

Query‑optimization lessons: A costly 12‑table join once caused a SEV‑0 incident; the team now avoids complex joins, moving join logic to the application layer when necessary. They also audit ORM‑generated SQL and set idle_in_transaction_session_timeout to terminate sessions that sit idle inside a transaction and block autovacuum.
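The timeout can be set per database; the sketch below uses illustrative values and a hypothetical database name (the source does not state OpenAI's actual numbers), and pairs it with statement_timeout, a commonly combined guardrail:

```sql
-- Illustrative values; '30s'/'60s' and the database name are assumptions.
ALTER DATABASE chatgpt SET idle_in_transaction_session_timeout = '30s';
ALTER DATABASE chatgpt SET statement_timeout = '60s';
```

An open transaction that merely sits idle still pins the xmin horizon, so autovacuum cannot reclaim dead tuples until it is terminated.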

Workload isolation: To mitigate “noisy neighbor” effects, workloads are split into dedicated instances with low‑ and high‑priority tiers, ensuring resource‑intensive low‑priority jobs do not degrade high‑priority request performance.
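A minimal sketch of priority‑based routing; the DSNs, tier names, and workload classification below are hypothetical, not OpenAI's setup:

```python
# Hypothetical DSNs for dedicated low- and high-priority instances.
DSN_BY_TIER = {
    "high": "postgresql://pg-high.internal/chatgpt",  # latency-sensitive requests
    "low": "postgresql://pg-low.internal/chatgpt",    # batch and analytics jobs
}

# Illustrative classification of workload names into tiers.
LOW_PRIORITY = {"batch", "analytics", "backfill"}

def dsn_for(workload: str) -> str:
    """Route a workload to its dedicated instance so resource-hungry
    low-priority jobs never share a box with user-facing traffic."""
    tier = "low" if workload in LOW_PRIORITY else "high"
    return DSN_BY_TIER[tier]
```

Routing at the application layer keeps the isolation boundary physical: a runaway analytics query can saturate only its own instance.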

Multi‑layer rate limiting: Rate limits are applied at the application, connection‑pool, proxy, and query layers. Short retry intervals are avoided to prevent retry storms, and the ORM layer is enhanced to enforce query‑specific throttling.
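As a sketch of these two ideas, here is a token‑bucket throttle (the kind of query‑specific limit an ORM layer could enforce) plus full‑jitter backoff so rejected clients do not retry in lockstep; the class, function, and parameter values are illustrative, not OpenAI's implementation:

```python
import random
import time

class TokenBucket:
    """Minimal token-bucket limiter; rate/capacity values are illustrative."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def jittered_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, spreading retries out in time
    so a mass rejection does not turn into a synchronized retry storm."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```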

Schema‑change restrictions: Even minor schema changes (e.g., column type modifications) can trigger full‑table rewrites. Therefore, only lightweight alterations that avoid full rewrites are allowed, with a strict 5‑second timeout and concurrent index creation/deletion. New tables must be hosted in sharded systems like Azure Cosmos DB.
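A hedged SQL sketch of such a lightweight change; the table and index names are examples, and only the 5‑second timeout comes from the source:

```sql
-- Abort quickly if the lock cannot be acquired (5s per the source).
SET lock_timeout = '5s';

-- Build the index without blocking reads/writes; note that
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_user_id
    ON messages (user_id);

-- Dropping concurrently likewise avoids long exclusive locks.
DROP INDEX CONCURRENTLY IF EXISTS idx_messages_old;
```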

Cascade replication: In collaboration with Azure PostgreSQL, OpenAI is testing cascade replication, where intermediate replicas forward WAL logs to downstream replicas. This can support over 100 replicas without overloading the primary, but adds operational complexity, especially around failover management.
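In stock PostgreSQL (12+), a downstream standby joins a cascade simply by pointing primary_conninfo at an intermediate replica instead of the primary; the hostname below is illustrative, and Azure's managed implementation may differ:

```conf
# postgresql.conf on a downstream standby (plus an empty standby.signal file).
# The intermediate replica must accept replication connections itself.
primary_conninfo = 'host=replica-tier1.internal port=5432 user=replicator'
hot_standby = on
```

The trade‑off the article notes follows directly: if the intermediate replica fails or is promoted, every standby downstream of it must be repointed, which complicates failover.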

Performance Data vs. Real‑World Experience

OpenAI reports that the system handles read‑heavy workloads at millions of queries per second, with client‑side P99 latency in the low double‑digit milliseconds, and achieves five‑nines availability in production. Over the past 12 months there was only one SEV‑0 PostgreSQL outage, which occurred during the viral spread of ChatGPT ImageGen, when 100 million new users joined in a week and write traffic spiked more than tenfold, overwhelming the system.

Despite these metrics, some user‑facing issues remain, such as long sidebar load times in ChatGPT and iOS clients that occasionally need an app restart before responses appear.

Technical Debt and Pragmatic Choices

The OpenAI case shows that a heavily tuned relational database can still serve massive scale, but the decision to stay with a single‑master PostgreSQL is also a pragmatic compromise. Sharding existing workloads is described as “very complex and time‑consuming, requiring changes to hundreds of application endpoints and potentially months or years of effort.” Because the workload is primarily read‑heavy, the current architecture still has growth headroom.

OpenAI continues to optimise the single‑master setup rather than refactor the entire application layer, while keeping the possibility of future PostgreSQL sharding open. The authors advise developers not to follow any architecture blindly; every design is a trade‑off.

Tags: PostgreSQL, OpenAI, database scaling, Cascade Replication, PgBouncer, Azure Cosmos DB
Written by AI Engineering, a publication focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
