
How OpenAI Really Scaled PostgreSQL for Hundreds of Millions of Users

The article debunks OpenAI's sensational claim of handling 800 million ChatGPT users with a single PostgreSQL instance, revealing a pragmatic hybrid architecture that combines many read replicas, Azure CosmosDB for write‑heavy workloads, and top‑tier hardware, while highlighting cost and complexity considerations.


OpenAI’s Real Scaling Strategy

The original OpenAI blog headline suggested that a single PostgreSQL deployment could serve 800 million ChatGPT users, but the reality is a set of pragmatic trade‑offs rather than a magical database optimization.

What They Actually Did

OpenAI added 50 read‑only replicas to distribute query load, offloaded the high‑frequency write workload of conversation history to Azure CosmosDB, and kept core business logic on a few ultra‑large PostgreSQL primary nodes.
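The write-path split can be sketched roughly as follows. This is a minimal illustration of the idea, not OpenAI's actual code: the class and store names are hypothetical, and in practice each store would wrap a real PostgreSQL driver and the Azure CosmosDB SDK.

```python
# Hypothetical sketch of the hybrid write path: append-heavy, join-free
# conversation history goes to a document store (CosmosDB in OpenAI's case),
# while relational business data stays on the PostgreSQL primary.
class HybridWriteRouter:
    def __init__(self, pg_primary, document_store):
        self.pg_primary = pg_primary          # core business data (ACID, joins)
        self.document_store = document_store  # high-volume conversation history

    def save(self, record_type, record):
        # Conversation turns are write-intensive, have a simple data model,
        # and tolerate eventual consistency, so they bypass PostgreSQL.
        if record_type == "conversation_turn":
            return self.document_store.append(record)
        # Accounts, billing, entitlements, etc. remain on the primary.
        return self.pg_primary.insert(record_type, record)
```

The payoff of this split is that the highest-volume write stream never touches the relational primary at all, which is what lets a handful of large PostgreSQL nodes remain sufficient.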

Money Enables the Hardware

Running on Azure’s top‑tier M‑series VMs, OpenAI’s setup packs 896 vCPUs and 32 TB of RAM at a monthly cost of about $175,000, allowing most hot data to reside in memory and making vertical scaling feasible.

Key Takeaways for Most Companies

Such a configuration is unattainable for most organizations; however, it demonstrates that PostgreSQL’s performance ceiling is higher than many assume. Often the bottleneck is not the database itself but the willingness to invest in expensive hardware.

The Ironic Admission

OpenAI admitted that sharding would require rewriting hundreds of code endpoints, a task that could take months or years, underscoring the current limits of AI‑assisted large‑scale system refactoring.

Practical Technical Nuggets

When altering schemas under high load, OpenAI uses a strict lock_timeout to prevent long‑running blocks and, if necessary, forcibly terminates conflicting transactions during DDL execution.
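A sketch of that pattern looks like the following. The retry loop, timeout value, and helper name are illustrative, not OpenAI's actual tooling; the key PostgreSQL behavior is real: with `lock_timeout` set, an ALTER that cannot acquire its lock fails fast instead of queueing behind long transactions and blocking every query that arrives after it.

```python
import time

def run_ddl_with_timeout(execute, ddl, lock_timeout_ms=2000, retries=5, backoff_s=1.0):
    """Run a DDL statement under a strict lock_timeout, retrying on timeout.

    `execute` is any callable that sends a SQL string to PostgreSQL
    (e.g. a psycopg cursor's execute method). Returns the attempt number
    on which the DDL succeeded.
    """
    for attempt in range(1, retries + 1):
        try:
            # Scope the timeout to this session so other sessions are unaffected.
            execute(f"SET lock_timeout = '{lock_timeout_ms}ms'")
            execute(ddl)
            return attempt
        except Exception as exc:
            # PostgreSQL aborts with "canceling statement due to lock timeout"
            # (SQLSTATE 55P03) when the lock wait exceeds lock_timeout.
            if "lock timeout" not in str(exc) or attempt == retries:
                raise
            time.sleep(backoff_s)  # back off, then try to grab the lock again

# With a real driver this would be called roughly as:
#   run_ddl_with_timeout(cur.execute, "ALTER TABLE users ADD COLUMN plan text")
```

The alternative mentioned in the article, forcibly terminating conflicting transactions, would replace the backoff with a `pg_terminate_backend` call against the blockers; that is more aggressive and usually a last resort.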

Read‑write separation with many replicas is a low‑cost, low‑risk way to scale read‑heavy workloads, avoiding the complexity of sharding or moving to a distributed database.
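At its core, read-write separation is just a routing decision. A minimal sketch, with placeholder connection names standing in for real pooled connections:

```python
import itertools

class ReadWriteSplitter:
    """Round-robin read queries across replicas; send writes to the primary.

    A sketch of classic read/write separation. Note that reads which must
    see their own just-committed writes should also go to the primary,
    since replicas lag behind by replication delay.
    """
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Anything that modifies data (or runs DDL) must hit the primary;
        # plain SELECTs can be served by any replica.
        head = sql.lstrip().split(None, 1)[0].upper()
        if head == "SELECT":
            return next(self._replica_cycle)
        return self.primary
```

Real deployments typically push this decision into a pooler or proxy rather than application code, but the trade-off is the same: each added replica absorbs read traffic without any change to the write path.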

They migrated conversation history to CosmosDB because the workload is write‑intensive, the data model is simple, complex joins are unnecessary, and eventual consistency is acceptable.

Complexity of a Hybrid Architecture

While effective, this approach introduces operational overhead: data is spread across multiple systems, multiple tech stacks must be maintained, and cross‑system consistency becomes a real challenge.

Bottom‑Line Advice

In 2026, a single powerful PostgreSQL instance on high‑end hardware can comfortably support a unicorn‑scale application. Before pursuing microservices, sharding, or extensive partitioning, consider whether you can simply invest in better hardware and keep the architecture straightforward.

Database Architecture · PostgreSQL · Scaling · Read Replicas · lock_timeout · Azure CosmosDB
Written by

Radish, Keep Going!

Personal sharing
