How StarRocks’ Primary Key Model Delivers 3‑5× Faster Real‑Time Queries
This article explains the design and implementation of StarRocks 2.x Primary Key tables, covering real‑time update mechanisms, write and commit workflows, in‑memory primary indexing, compaction, read‑path optimizations, performance benchmarks, and upcoming features such as partial and conditional updates.
Background
Real‑time data analysis is critical for digital operations, but many OLAP systems struggle to keep update latency low and query performance high when data changes continuously. StarRocks introduced the Primary Key model to address these challenges; it delivers 3‑5× faster queries on large‑scale real‑time workloads and is already deployed at more than 110 major customers.
Real‑Time Update Scenarios
Typical use cases include CDC pipelines that sync database binlogs to an analytical platform, and ELT pipelines that load raw data into an analytical processing (AP) system and transform it there with SQL right away. Both workloads require strong real‑time update capabilities.
Update Mechanisms Compared
Copy‑on‑Write: Detects key conflicts and rewrites entire files with the updated data. Reads are as fast as possible, but the write cost is high, which makes it unsuitable for frequent real‑time updates.
Merge‑on‑Read: Writes new files without checking for conflicts and merges versions at read time. Writes are cheap but reads are expensive; systems such as Hudi and ClickHouse use this approach.
Delta Store: Stores per‑row delta records alongside the original files, and reads merge the deltas on the fly. Balances write and read costs but adds indexing complexity.
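Delete‑and‑Insert: Marks old rows as deleted and writes new rows to fresh files. Each write must look up the primary index to find the old row, which costs some write throughput, but reads no longer need to merge versions, so query performance stays high. StarRocks adopts this approach for its Primary Key model.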
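The difference between Merge‑on‑Read and Delete‑and‑Insert is easiest to see side by side. The following is a minimal in‑memory sketch, not StarRocks code: "files" are Python lists, and the class names are hypothetical. Merge‑on‑Read writes blindly and resolves versions on every read; Delete‑and‑Insert pays for an index lookup on write so that reads only filter out deleted rows.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (primary key, value); toy schema, assumed for illustration


class MergeOnReadTable:
    """Hypothetical toy: writes append blindly, reads merge versions (newest wins)."""

    def __init__(self) -> None:
        self.files: List[List[Row]] = []

    def write(self, rows: List[Row]) -> None:
        self.files.append(list(rows))  # no conflict check at write time

    def read(self) -> Dict[str, int]:
        merged: Dict[str, int] = {}
        for f in self.files:  # later files overwrite earlier ones during the read
            for key, value in f:
                merged[key] = value
        return merged


class DeleteAndInsertTable:
    """Hypothetical toy: writes mark the old row deleted, reads only filter."""

    def __init__(self) -> None:
        self.files: List[List[Row]] = []
        self.deleted: List[Set[int]] = []  # per file: deleted row ids
        self.index: Dict[str, Tuple[int, int]] = {}  # key -> (file_id, row_id)

    def write(self, rows: List[Row]) -> None:
        file_id = len(self.files)
        self.files.append(list(rows))
        self.deleted.append(set())
        for row_id, (key, _) in enumerate(rows):
            old = self.index.get(key)
            if old is not None:  # extra write-time work: lookup + delete mark
                self.deleted[old[0]].add(old[1])
            self.index[key] = (file_id, row_id)

    def read(self) -> Dict[str, int]:
        result: Dict[str, int] = {}
        for file_id, f in enumerate(self.files):
            for row_id, (key, value) in enumerate(f):
                if row_id not in self.deleted[file_id]:  # at most one live row per key
                    result[key] = value
        return result
```

Both tables return the same logical contents; the difference is where the work happens. In the Delete‑and‑Insert table every key has exactly one live row, so a real scan can process files independently and in any order without a merge step.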
StarRocks Primary Key Implementation
The Primary Key table supports upsert and delete operations. Data is ingested through load jobs; each load is an ACID transaction that can span multiple tablets and proceeds in two phases:
Write: Data is partitioned and routed to the target tablets, where it is written into column‑store Rowset files.
Commit: After all rows are flushed, the Frontend (FE) issues a commit; each tablet then updates its primary index, generates a DelVector for deleted rows, and writes a new meta version to RocksDB.
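The two phases can be sketched as a small transaction object: rows are routed to tablets and staged during the write phase, and only the commit makes them visible, on every tablet or on none. This is a minimal sketch under stated assumptions: the bucket count (NUM_TABLETS), the use of zlib.crc32 as the routing hash, and the class LoadTransaction are all hypothetical and do not reflect StarRocks' actual distribution function.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[str, int]
NUM_TABLETS = 4  # hypothetical bucket count


def route(key: str) -> int:
    # Deterministic hash (assumed crc32) so a key always lands on the same tablet.
    return zlib.crc32(key.encode("utf-8")) % NUM_TABLETS


class LoadTransaction:
    """Hypothetical two-phase load: stage per tablet, then publish atomically."""

    def __init__(self, tablets: Dict[int, List[List[Row]]]) -> None:
        self.tablets = tablets  # tablet_id -> committed rowsets
        self.staged: Dict[int, List[Row]] = {}

    def write(self, rows: List[Row]) -> None:
        # Phase 1: route and stage rows; nothing is visible to readers yet.
        for key, value in rows:
            self.staged.setdefault(route(key), []).append((key, value))

    def commit(self) -> None:
        # Phase 2: validate every target first, so a failure publishes nothing.
        for tablet_id in self.staged:
            if tablet_id not in self.tablets:
                raise KeyError(f"unknown tablet {tablet_id}")
        for tablet_id, rowset in self.staged.items():
            self.tablets[tablet_id].append(rowset)  # one new rowset per tablet
        self.staged = {}
```

Validating all targets before publishing anything is what gives the sketch its all‑or‑nothing behavior; a real system additionally needs durable logging so a crash between the two loops cannot leave a partial commit.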
Write Phase Details
Incoming rows accumulate in a MemTable; when full, they are flushed to disk.
Flush performs Sort (by primary key), Merge (keep latest version), and Split (separate upserts from deletes) operations.
A new Rowset containing multiple files is created once all data is flushed.
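The Sort, Merge, and Split steps of a flush can be sketched in a few lines. This is an illustration, not the StarRocks flush code: operations are assumed to be (primary_key, kind, value) tuples in arrival order, and the function name flush is hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Op = Tuple[int, str, Optional[str]]  # (primary_key, "upsert" | "delete", value)


def flush(memtable: List[Op]) -> Tuple[List[Tuple[int, Optional[str]]], List[int]]:
    # Sort by primary key; Python's sort is stable, so arrival order is kept per key.
    ordered = sorted(memtable, key=lambda op: op[0])
    # Merge: keep only the latest operation for each key.
    latest = [list(group)[-1] for _, group in groupby(ordered, key=lambda op: op[0])]
    # Split: upserts go to the data file, deletes to a separate list of keys.
    upserts = [(key, value) for key, kind, value in latest if kind == "upsert"]
    deletes = [key for key, kind, _ in latest if kind == "delete"]
    return upserts, deletes


memtable = [(3, "upsert", "c"), (1, "upsert", "a"), (3, "upsert", "c2"),
            (2, "upsert", "b"), (2, "delete", None)]
print(flush(memtable))  # ([(1, 'a'), (3, 'c2')], [2])
```

The stable sort matters: it guarantees that "latest" means latest in arrival order, so an upsert followed by a delete of the same key correctly collapses to a delete.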
Commit Phase Details
Update the in‑memory primary index and mark overwritten rows as deleted.
Generate a DelVector that records deleted row positions.
Write a new meta version (including the new Rowset list) to RocksDB.
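The three commit steps can be modeled on a single tablet. In this minimal sketch, a Python list of versions stands in for the RocksDB meta store, Python sets stand in for RoaringBitmap DelVectors, and the class Tablet is hypothetical. It assumes the flush has already removed duplicate keys within the new Rowset.

```python
from typing import Dict, List, Set, Tuple


class Tablet:
    """Hypothetical single-tablet model of the commit phase."""

    def __init__(self) -> None:
        self.index: Dict[int, Tuple[int, int]] = {}  # key -> (rowset_id, rowid)
        self.delvec: Dict[int, Set[int]] = {}        # rowset_id -> deleted rowids
        self.rowsets: Dict[int, List[int]] = {}      # rowset_id -> keys by rowid
        self.meta_versions: List[List[int]] = [[]]   # version -> visible rowset ids
        self.next_rowset_id = 0

    def commit(self, upsert_keys: List[int], delete_keys: List[int]) -> int:
        rowset_id = self.next_rowset_id
        self.next_rowset_id += 1
        self.rowsets[rowset_id] = list(upsert_keys)
        self.delvec[rowset_id] = set()
        # Steps 1-2: update the index and record overwritten/deleted rows.
        for key in delete_keys:
            old = self.index.pop(key, None)
            if old is not None:
                self.delvec[old[0]].add(old[1])
        for rowid, key in enumerate(upsert_keys):
            old = self.index.get(key)
            if old is not None:
                self.delvec[old[0]].add(old[1])
            self.index[key] = (rowset_id, rowid)
        # Step 3: publish a new meta version with the extended rowset list.
        self.meta_versions.append(self.meta_versions[-1] + [rowset_id])
        return len(self.meta_versions) - 1


t = Tablet()
t.commit([1, 2, 3], [])
t.commit([2], [3])  # upsert key 2, delete key 3
print(t.index)   # {1: (0, 0), 2: (1, 0)}
print(t.delvec)  # {0: {1, 2}, 1: set()}
```

Old meta versions stay readable in this model, which hints at why versioned metadata is useful: a query pinned to an earlier version keeps a consistent view while new loads commit.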
Tablet Internal Structure
Meta: Versioned metadata stored as protobuf in RocksDB.
Rowset: Column‑store files (in a StarRocks‑specific format) holding the actual data.
Primary Index: In‑memory hash map from encoded primary keys to (rowset_id, rowid). It is built on demand and released when idle to save memory.
DelVector: A RoaringBitmap per Rowset marking its deleted rows, also cached in memory.
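Two properties of the primary index are worth illustrating: composite keys are encoded into a single value so one hash map can hold them, and the map is loaded lazily and can be dropped when idle. The sketch below is an assumption‑laden illustration, not StarRocks' actual key encoding: it uses a sign‑flipped big‑endian int64 for integers and a length prefix for strings, and the names encode_key and PrimaryIndex are hypothetical.

```python
import struct
from typing import Callable, Dict, Optional, Tuple, Union

KeyPart = Union[int, str]
Location = Tuple[int, int]  # (rowset_id, rowid)


def encode_key(parts: Tuple[KeyPart, ...]) -> bytes:
    """Hypothetical encoding of a composite primary key into one byte string."""
    out = bytearray()
    for part in parts:
        if isinstance(part, int):
            # int64 with the sign bit flipped: byte order now matches numeric order.
            out += struct.pack(">Q", (part + (1 << 63)) & ((1 << 64) - 1))
        else:
            data = part.encode("utf-8")
            out += struct.pack(">I", len(data)) + data  # length prefix avoids ambiguity
    return bytes(out)


class PrimaryIndex:
    """Hypothetical lazily built index that can be released to save memory."""

    def __init__(self, loader: Callable[[], Dict[bytes, Location]]) -> None:
        self._loader = loader  # rebuilds the map (e.g. by scanning rowsets)
        self._map: Optional[Dict[bytes, Location]] = None

    def get(self, key: Tuple[KeyPart, ...]) -> Optional[Location]:
        if self._map is None:  # built on demand at first use
            self._map = self._loader()
        return self._map.get(encode_key(key))

    def release(self) -> None:
        self._map = None  # dropped when idle; the next get() rebuilds it
```

A fixed‑width, order‑preserving encoding is not required for a hash map, but it is a common choice because the same bytes can then also be compared directly when keys need to be sorted.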
Compaction
Continuous imports generate many small Rowsets and accumulate deleted rows, which degrades read performance. StarRocks therefore runs background compaction: it selects small or heavily deleted Rowsets, merges their live rows into a new Rowset, updates the primary index, and creates a new DelVector. Conflicts with concurrent imports are detected by comparing each row's original location (its Rowset ID and row ID) with the current primary index entry. If a concurrent import has already updated the row, compaction leaves the index entry unchanged and marks the row's copy in the new Rowset as deleted.
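The conflict check can be sketched as two steps: compaction copies live rows while remembering where each came from, and at commit time it only repoints index entries that still reference those original locations. This is a simplified model under stated assumptions (keys stand in for whole rows, sets stand in for DelVectors, and compact and commit_compaction are hypothetical names); it omits key ordering and the removal of the input Rowsets from the meta.

```python
from typing import Dict, List, Set, Tuple

Loc = Tuple[int, int]  # (rowset_id, rowid)


def compact(rowsets: Dict[int, List[int]], delvec: Dict[int, Set[int]],
            inputs: List[int]) -> Tuple[List[int], List[Loc]]:
    """Copy live rows from the input rowsets, recording each row's origin."""
    keys: List[int] = []
    origins: List[Loc] = []
    for rs in inputs:
        for rowid, key in enumerate(rowsets[rs]):
            if rowid not in delvec[rs]:  # already-deleted rows are dropped
                keys.append(key)
                origins.append((rs, rowid))
    return keys, origins


def commit_compaction(index: Dict[int, Loc], keys: List[int],
                      origins: List[Loc], new_id: int) -> Set[int]:
    """Repoint unchanged keys; return the DelVector of the new rowset."""
    new_delvec: Set[int] = set()
    for new_rowid, (key, origin) in enumerate(zip(keys, origins)):
        if index.get(key) == origin:
            index[key] = (new_id, new_rowid)  # row unchanged: use compacted copy
        else:
            new_delvec.add(new_rowid)  # updated concurrently: hide the stale copy
    return new_delvec


rowsets = {0: [1, 2], 1: [3]}
index = {1: (0, 0), 2: (0, 1), 3: (1, 0)}
keys, origins = compact(rowsets, {0: set(), 1: set()}, [0, 1])
index[2] = (2, 0)  # a concurrent load rewrites key 2 before compaction commits
print(commit_compaction(index, keys, origins, new_id=3))  # {1}
```

Because the check happens at commit time against the live index, compaction never needs to block loads; losing a race only costs a wasted copy of the row.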
Read Path Optimizations
No merge step is needed, because outdated rows are already marked as deleted.
Predicates are pushed down to the low‑level Scan, so zone map, bitmap, and Bloom filter indexes can accelerate filtering.
Multiple Rowsets are scanned in parallel.
Together, these factors yield 3‑5× faster queries than the traditional Unique Key model.
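The read path can be illustrated with a toy scan that combines all three ideas: a zone map (min/max per segment) prunes whole segments, the DelVector filters stale rows without any merge, and Rowsets are scanned concurrently. This is a minimal sketch, not the StarRocks scan operator; the Segment and Rowset classes, the single filtered column amount, and the predicate amount >= lo are assumptions for illustration, and segments are assumed to be non‑empty.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Row = Tuple[int, int]  # (key, amount)


@dataclass
class Segment:
    rows: List[Row]
    first_rowid: int  # rowid of rows[0] within its rowset
    zmin: int = field(init=False)
    zmax: int = field(init=False)

    def __post_init__(self) -> None:
        amounts = [a for _, a in self.rows]
        self.zmin, self.zmax = min(amounts), max(amounts)  # zone map on `amount`


@dataclass
class Rowset:
    segments: List[Segment]
    deleted: Set[int]  # DelVector: deleted rowids in this rowset


def scan_rowset(rs: Rowset, lo: int) -> Tuple[List[Row], int]:
    out: List[Row] = []
    skipped = 0
    for seg in rs.segments:
        if seg.zmax < lo:  # zone map proves no row can match `amount >= lo`
            skipped += 1
            continue
        for i, (key, amount) in enumerate(seg.rows):
            if seg.first_rowid + i in rs.deleted:  # stale version: filtered, not merged
                continue
            if amount >= lo:
                out.append((key, amount))
    return out, skipped


def scan(rowsets: List[Rowset], lo: int) -> Tuple[List[Row], int]:
    # Rowsets are independent, so they can be scanned concurrently.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda rs: scan_rowset(rs, lo), rowsets))
    rows = sorted(r for part, _ in parts for r in part)  # sorted for stable output
    return rows, sum(s for _, s in parts)
```

The key point is that scan_rowset never looks at any other Rowset: since each key has at most one live row, correctness does not depend on cross‑Rowset merging, which is what makes the parallel scan and pushed‑down filters safe.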
Performance Evaluation
In an order‑table benchmark (10 million orders per day, with updates spanning 20 days), Primary Key tables answered queries more than 10× faster than Unique Key tables while imports were running, and about 3× faster after imports were paused.
Future Work
Partial‑Update: Update only a subset of columns in wide tables.
Conditional‑Update: Apply updates only when specified conditions (e.g., on timestamps) are met.
General read‑write transactions to support more complex ELT workloads.
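The intended semantics of the two planned update types can be sketched on a plain dictionary of rows. This is a hypothetical illustration of the behavior described above, not the StarRocks interface: the function names, the dictionary row representation, and the default condition column ts are all assumptions.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def partial_update(table: Dict[int, Row], key: int, changes: Row) -> None:
    # Only the supplied columns change; all other columns keep their stored values.
    table[key] = {**table.get(key, {}), **changes}


def conditional_update(table: Dict[int, Row], key: int, new_row: Row,
                       column: str = "ts") -> bool:
    # Apply the update only if the incoming row is not older than the stored one.
    old = table.get(key)
    if old is not None and new_row[column] < old[column]:
        return False  # out-of-order (stale) update is ignored
    table[key] = new_row
    return True


orders = {1: {"name": "a", "city": "x", "ts": 1}}
partial_update(orders, 1, {"city": "y"})
print(orders[1])  # {'name': 'a', 'city': 'y', 'ts': 1}
print(conditional_update(orders, 1, {"name": "b", "city": "z", "ts": 0}))  # False
```

A timestamp condition like this is useful for CDC streams that may deliver changes out of order: a late‑arriving older event cannot overwrite newer data.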
This article has been distilled and summarized from source material, then republished for learning and reference.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
