Uber’s PostgreSQL‑to‑MySQL Switch: Solving Index Bloat & Write Amplification
Uber migrated its core database from PostgreSQL to MySQL because of index bloat, write amplification, high replication overhead, and limited MVCC support on replicas. This article examines each issue, the improvements in newer PostgreSQL releases, and how MySQL’s architecture addresses these challenges.
Background
When Uber first decided to replace its core database, it moved from PostgreSQL to MySQL. The decision was driven by several scalability problems in PostgreSQL, including index bloat, write amplification, high replication overhead, risk of data corruption, insufficient MVCC support on replica nodes, and a complex upgrade process.
Index Bloat in PostgreSQL
Index bloat occurs when a table’s active row count stays constant but its indexes keep growing, degrading query performance and wasting disk space. In PostgreSQL each row has a hidden ctid (current tuple ID) that points to its physical location. Updating a row creates a new version with a new ctid, while the old version remains until vacuum removes it.
For example, on a table named products:
    SELECT ctid, * FROM products WHERE id = 1;
The result includes a ctid such as (126,3), meaning page (block) 126, item 3 within that page. B‑tree indexes store the key value together with the ctid. When a query uses the index, PostgreSQL first finds the ctid and then jumps directly to the physical location, making index look‑ups very fast.
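Because PostgreSQL never updates rows in place, an UPDATE generally inserts a new entry into each index for the new row version, while the old entries stay until vacuum removes them. This is the root cause of index bloat. The exception is a heap‑only tuple (HOT) update, which skips the index inserts when no indexed column changes and the new version fits on the same page.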
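The mechanism above can be illustrated with a small toy model. The Python sketch below is not how PostgreSQL is implemented: ToyTable and its methods are hypothetical names, the heap‑only tuple (HOT) optimization is ignored, a ctid is reduced to a simple counter, and vacuum is a single pass. It only shows that updating one row repeatedly leaves the live row count unchanged while index entries pile up until vacuum runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy heap with one secondary index (hypothetical simplification)."""
    heap: dict = field(default_factory=dict)   # ctid -> (row_id, value, is_live)
    index: list = field(default_factory=list)  # (indexed value, ctid) entries
    next_ctid: int = 0                         # stand-in for a (page, item) pair

    def _append_version(self, row_id, value):
        ctid = self.next_ctid
        self.next_ctid += 1
        self.heap[ctid] = (row_id, value, True)
        # Every new row version gets its own index entry.
        self.index.append((value, ctid))
        return ctid

    def insert(self, row_id, value):
        return self._append_version(row_id, value)

    def update(self, row_id, value):
        # Mark the current live version dead instead of overwriting it.
        for ctid, (rid, old_value, live) in self.heap.items():
            if rid == row_id and live:
                self.heap[ctid] = (rid, old_value, False)
                break
        return self._append_version(row_id, value)

    def vacuum(self):
        # Drop dead versions and the index entries that point at them.
        dead = {ctid for ctid, (_, _, live) in self.heap.items() if not live}
        for ctid in dead:
            del self.heap[ctid]
        self.index = [(v, c) for v, c in self.index if c not in dead]

    def live_rows(self):
        return sum(1 for _, _, live in self.heap.values() if live)


table = ToyTable()
table.insert(row_id=1, value="a")
for i in range(5):
    table.update(row_id=1, value=f"v{i}")

before = (table.live_rows(), len(table.index))  # (live rows, index entries)
table.vacuum()
after = (table.live_rows(), len(table.index))
print(before, after)  # (1, 6) (1, 1)
```

In this model one live row carries six index entries before vacuum and one afterwards, which is the gap between logical size and index size that the article calls bloat.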
Write Amplification
Except for heap‑only tuple (HOT) updates, each row update inserts new entries into every index on the table, even if only a single column changes. When many indexes reference a row, the write I/O per update is multiplied accordingly, leading to higher storage costs and reduced performance.
MySQL’s InnoDB stores the primary key in secondary indexes instead of the physical address, so an update only has to modify the secondary indexes whose columns actually changed, mitigating write amplification.
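The difference between the two pointer designs can be sketched as a count of indexes touched per update. This is a simplified model, not either engine's real code: indexes_written and the pointer_kind labels are hypothetical, the column names are made up for illustration, the PostgreSQL‑style case assumes the update does not qualify as a HOT update, and the primary key itself is assumed not to change.

```python
def indexes_written(indexed_columns, changed_columns, pointer_kind):
    """Return the set of secondary indexes that get a new entry for one UPDATE.

    pointer_kind is a hypothetical label used only in this sketch:
      "physical"    - entries point at the row version's location
                      (PostgreSQL-style, assuming a non-HOT update)
      "primary_key" - entries store the primary key (InnoDB-style)
    """
    indexed = set(indexed_columns)
    if pointer_kind == "physical":
        # The new row version lives at a new location, so every index
        # needs an entry pointing at it, whether or not its column changed.
        return indexed
    if pointer_kind == "primary_key":
        # The primary key is unchanged, so only indexes on changed columns move.
        return indexed & set(changed_columns)
    raise ValueError(f"unknown pointer_kind: {pointer_kind!r}")


indexed_columns = ["email", "city", "status", "created_at"]  # illustrative names
changed_columns = ["status"]

physical = indexes_written(indexed_columns, changed_columns, "physical")
by_pk = indexes_written(indexed_columns, changed_columns, "primary_key")
print(len(physical), len(by_pk))  # 4 1
```

With four secondary indexes and one changed column, the physical‑pointer design writes to all four indexes while the primary‑key design writes to one. The trade‑off is that a primary‑key pointer requires a second look‑up through the primary index on reads.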
Replication Overhead
PostgreSQL uses its write‑ahead log (WAL) for physical replication: every on‑disk change, including each index change, is written to the WAL and then streamed to replicas. This generates large amounts of network traffic, especially under heavy update workloads.
MySQL’s row‑based replication copies only logical changes, which is more compact. PostgreSQL 10 introduced logical replication, reducing the overhead, but the older physical replication model still caused significant bandwidth consumption for Uber.
MVCC Support on Replicas
On the PostgreSQL versions Uber ran, a long‑running query on a replica could conflict with WAL replay. The replica either held back replay while the query still needed the affected rows, or aborted the query once it had blocked replay for too long. For Uber’s read‑heavy workloads this meant lagging replicas or failed queries.
Hot Standby, available since PostgreSQL 9.0, allows read‑only queries to run on a replica while WAL is replayed, though very long queries can still delay replay or be cancelled. MySQL’s replica implementation already supported true MVCC, avoiding this issue.
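The choice a PostgreSQL standby faces when replay conflicts with a running query can be sketched as a small decision function. This is a toy model under stated assumptions: standby_conflict is a hypothetical name, max_delay_seconds stands in for a setting such as max_standby_streaming_delay, and the real server's accounting of the delay is more involved than a single comparison.

```python
def standby_conflict(query_seconds_left, max_delay_seconds):
    """Toy outcome when WAL replay conflicts with a query on a standby.

    Returns (query_outcome, replay_wait_seconds).
    """
    if query_seconds_left <= max_delay_seconds:
        # Replay waits for the query to finish, so the replica lags meanwhile.
        return ("completed", query_seconds_left)
    # Replay has waited as long as allowed, so the query is cancelled.
    return ("cancelled", max_delay_seconds)


short_query = standby_conflict(query_seconds_left=5, max_delay_seconds=30)
long_query = standby_conflict(query_seconds_left=600, max_delay_seconds=30)
print(short_query, long_query)  # ('completed', 5) ('cancelled', 30)
```

Raising the allowed delay lets longer queries finish at the cost of more replication lag, while lowering it keeps the replica fresh at the cost of more cancelled queries.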
Upgrade Complexity
Upgrading PostgreSQL required stopping the cluster because replicas could not be of a different major version. Uber had to perform a full downtime upgrade, synchronizing all replicas afterwards, which took several hours.
MySQL supports rolling upgrades, allowing replicas to be upgraded first, minimizing downtime. Later PostgreSQL releases added logical replication that enables cross‑version upgrades with near‑zero downtime.
Current State (PostgreSQL 17 as of March 2025)
Some of Uber’s pain points have been addressed: built‑in logical replication (PostgreSQL 10) reduces replication overhead and enables cross‑version upgrades; Hot Standby (available since 9.0) lets replicas serve read queries during replay, with settings to tune how conflicts with long queries are handled; and tools like pg_rewind simplify resynchronizing a diverged node after a failover. However, index bloat and write amplification remain consequences of PostgreSQL’s MVCC and ctid architecture, despite improvements such as REINDEX CONCURRENTLY and smarter autovacuum.
Conclusion
Uber’s migration to MySQL was motivated by a combination of index bloat, write amplification, replication cost, and upgrade difficulty. While newer PostgreSQL versions have mitigated many of these issues, the fundamental design choices still give MySQL an advantage in high‑update scenarios. Organizations must evaluate their specific workloads when choosing between the two databases.
