Why Auto‑Increment Fails in Distributed Databases and How to Design Scalable Indexes
This article explains why using auto‑increment primary keys in a distributed database leads to collisions and performance issues, and presents practical strategies such as ordered UUIDs, embedding shard keys, index tables, and global table replication to achieve efficient, scalable index design.
Primary Key Selection
A primary key must be globally unique across all shards. Auto‑increment cannot guarantee this, because each shard generates values independently and different shards can produce the same value. Example orders table (sharding key is o_custkey, primary key is o_orderkey):
CREATE TABLE `orders` (
`O_ORDERKEY` int NOT NULL auto_increment,
`O_CUSTKEY` int NOT NULL,
`O_ORDERSTATUS` char(1) NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) NOT NULL,
`O_CLERK` char(15) NOT NULL,
`O_SHIPPRIORITY` int NOT NULL,
`O_COMMENT` varchar(79) NOT NULL,
PRIMARY KEY (`O_ORDERKEY`),
KEY (`O_CUSTKEY`)
) ENGINE=InnoDB;
Therefore, avoid auto‑increment in distributed databases: it performs poorly, is less secure because sequential values are easy to guess, and does not suit distributed architectures.
Instead, use a globally unique, ordered key: MySQL‑generated ordered UUIDs, IDs generated by the business layer, or an open‑source algorithm such as Snowflake (which needs care when the system clock moves backwards).
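To make the clock rollback caveat concrete, here is a minimal Snowflake-style generator sketch. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence), the custom epoch, and the class name are assumptions chosen for illustration, not something the article prescribes.

```python
import threading
import time


class SnowflakeLikeGenerator:
    """Illustrative 64-bit ID: ms timestamp | 10-bit worker | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch
    WORKER_BITS = 10
    SEQ_BITS = 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock  # injectable so the rollback case can be exercised
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk a duplicate ID.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & self.MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self._seq
            )
```

Because the timestamp occupies the high bits, IDs from one worker increase over time, which keeps inserts roughly ordered. The worker ID keeps IDs from different nodes distinct, provided each node is assigned a different worker ID.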
Index Design
Sharding keys route queries to a specific shard, but other indexes are still needed. Using the orders table as an example, querying by o_orderkey requires scanning all shards, because o_orderkey is not the sharding key:
SELECT * FROM orders WHERE o_orderkey = 1;
Two possible designs:
Keep a redundant copy of the data that uses o_orderkey as its sharding key.
Store sharding key information alongside the index.
Both rely on a space‑for‑time trade‑off.
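The routing problem these designs address can be sketched as follows. The modulo routing scheme, the shard count, and the function names are assumptions for illustration; real middleware may hash or range-partition instead.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(o_custkey):
    """Modulo routing on the sharding key (an assumed scheme)."""
    return o_custkey % NUM_SHARDS


def shards_to_query(predicate):
    """Return the shards a router must visit for an equality predicate."""
    if "o_custkey" in predicate:
        # The sharding key is present: exactly one shard can hold the rows.
        return [shard_for(predicate["o_custkey"])]
    # No sharding key in the predicate: the query is sent to every shard.
    return list(range(NUM_SHARDS))
```

A lookup by o_custkey touches one shard, while a lookup by o_orderkey alone fans out to all of them, and that cost grows with the shard count.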
An improved approach is to create an index table containing only o_orderkey and o_custkey:
CREATE TABLE idx_orderkey_custkey (
o_orderkey INT,
o_custkey INT,
PRIMARY KEY (o_orderkey)
);
The query can then be split into two steps, each routed by a sharding key, so at most two shards are accessed regardless of the total shard count:
SELECT * FROM orders WHERE o_orderkey = 1
-- step 1
SELECT o_custkey FROM idx_orderkey_custkey WHERE o_orderkey = 1
-- step 2
SELECT * FROM orders WHERE o_custkey = ? AND o_orderkey = 1;
A better design embeds the sharding key inside the primary key string, e.g., o_orderkey = CONCAT(o_orderkey, '-', o_custkey). The router can then extract o_custkey from the key itself, so a single‑shard query suffices:
SELECT * FROM orders WHERE o_orderkey = '1000-1';
This reduces storage overhead compared to a full index table while keeping inserts ordered.
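A small sketch of how an embedded sharding key might be built and used for routing. The '-' separator follows the '1000-1' example; the helper names and the modulo routing are assumptions.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def make_order_key(order_seq, o_custkey):
    """Build a key of the form '<order sequence>-<o_custkey>'."""
    return f"{order_seq}-{o_custkey}"


def shard_for_order_key(o_orderkey):
    """Recover o_custkey from the key suffix and route on it."""
    o_custkey = int(o_orderkey.rsplit("-", 1)[1])
    return o_custkey % NUM_SHARDS
```

Placing the ordered part first keeps new keys sorting after older ones for a given sequence width, while the suffix carries the routing information.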
For non‑unique secondary indexes, full‑shard scans may still be required.
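The difference between the index-table lookup and a scan on a non-unique column can be sketched with in-memory shards. This assumes the index table is itself sharded by o_orderkey and that routing is a simple modulo; both are illustrative choices, and the function names are hypothetical.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def shard_of(key):
    return key % NUM_SHARDS


# orders is sharded by o_custkey; the index table is sharded by o_orderkey.
orders_shards = [dict() for _ in range(NUM_SHARDS)]  # o_orderkey -> row
index_shards = [dict() for _ in range(NUM_SHARDS)]   # o_orderkey -> o_custkey


def insert_order(o_orderkey, o_custkey, o_orderstatus):
    row = {"o_orderkey": o_orderkey, "o_custkey": o_custkey,
           "o_orderstatus": o_orderstatus}
    orders_shards[shard_of(o_custkey)][o_orderkey] = row
    index_shards[shard_of(o_orderkey)][o_orderkey] = o_custkey


def find_by_orderkey(o_orderkey):
    """Two routed lookups; returns (row, shards_touched)."""
    # Step 1: the index table yields the sharding key.
    o_custkey = index_shards[shard_of(o_orderkey)].get(o_orderkey)
    if o_custkey is None:
        return None, 1
    # Step 2: the orders table is read on the shard that owns o_custkey.
    return orders_shards[shard_of(o_custkey)].get(o_orderkey), 2


def find_by_status(o_orderstatus):
    """Non-unique column without an index table: every shard is scanned."""
    rows = [row for shard in orders_shards for row in shard.values()
            if row["o_orderstatus"] == o_orderstatus]
    return rows, NUM_SHARDS
```

The unique lookup touches at most two shards however many shards exist, whereas the status query always visits all of them.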
Global Tables
Small tables that lack a sharding key (e.g., the nation table in the TPC-H benchmark) can be replicated on every shard to avoid cross‑shard queries.
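As a rough sketch of the idea, each shard below holds a full copy of a small nation table, so a join against it never leaves the shard. The customer data, the sample nation rows, and the class and function names are hypothetical.

```python
NUM_SHARDS = 4  # hypothetical cluster size
NATION = {0: "ALGERIA", 1: "ARGENTINA"}  # sample rows for illustration


class Shard:
    def __init__(self, nation_rows):
        self.nation = dict(nation_rows)  # full local copy of the global table
        self.customers = {}              # c_custkey -> c_nationkey

    def customer_with_nation(self, c_custkey):
        """Join customer to nation using only data stored on this shard."""
        c_nationkey = self.customers[c_custkey]
        return c_custkey, self.nation[c_nationkey]


shards = [Shard(NATION) for _ in range(NUM_SHARDS)]


def insert_customer(c_custkey, c_nationkey):
    shards[c_custkey % NUM_SHARDS].customers[c_custkey] = c_nationkey


def update_nation(n_nationkey, n_name):
    # A write to a global table has to be applied to every copy.
    for shard in shards:
        shard.nation[n_nationkey] = n_name
```

The trade-off is that writes to the global table are multiplied by the shard count, which is why the approach suits small, rarely changing tables.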
Unique Indexes
Values in unique indexes must also come from a globally unique mechanism (such as UUIDs) so that they remain unique across shards. Even on single‑node MySQL, globally unique designs are recommended, because future scaling may require them.
Summary
Use ordered UUIDs as primary keys in distributed databases.
Design unique indexes with global uniqueness.
If a unique index is not the sharding key, store sharding information to route queries to a single shard.
Replicate small global tables on each shard to avoid cross‑shard queries.