Databases 15 min read

How StarRocks Enables Real-Time Updates in Analytical Databases

The article explains why analytical databases struggle with real‑time data changes due to columnar storage, complex indexes and distributed processing, and then details StarRocks' primary‑key model, adaptive update mode, bitmap indexes, row/column partial updates, and practical SQL upsert techniques to achieve low‑latency updates without sacrificing query performance.

StarRocks
StarRocks
StarRocks
How StarRocks Enables Real-Time Updates in Analytical Databases

Why Real‑Time Matters for Analytical Databases

Analytical databases are essential for data‑driven decisions and large‑scale analytics such as data mining and machine learning. As data science and AI evolve, the need for fresh, real‑time results (instead of T+1 latency) has become mainstream.

Challenges of Real‑Time Updates

Storage structure : Most analytical systems use column‑oriented storage, which optimises read performance but incurs high overhead when inserting or updating rows because entire files may need to be rewritten.

Indexes and pre‑aggregation : Complex bitmap indexes and query rewrites improve query speed but increase the cost and complexity of data modifications.

Data distribution and parallel processing : In MPP architectures data is spread across many nodes; updating data often requires cross‑node synchronization, adding latency.

Consequently, even modern analytical databases that add near‑real‑time capabilities still lag behind OLTP systems designed for frequent writes.

StarRocks Improvements

StarRocks introduces a self‑developed Primary Key model with several key enhancements:

Adaptive update mode : Under columnar storage, only the columns that change are rewritten, avoiding full‑row rewrites and dramatically reducing I/O.

Primary‑key bitmap index : A bitmap index built on the primary key enables fast location of rows for both queries and updates; combined with delete vectors it supports efficient upserts.

Index persistence : Inactive primary‑key indexes are flushed to disk, lowering memory usage while keeping update performance comparable to in‑memory indexes.

These changes create a new table format that supports real‑time ingestion (e.g., from Kafka) and OLTP‑style updates.

Primary‑Key Update Mechanics

StarRocks adopts a Delete + Insert strategy: incoming batches are matched via the primary‑key index, the old rows are marked as deleted, and the new rows are written as fresh records. This avoids costly merge operations and allows the filter to be pushed down to the scan layer.

Other classic approaches are also described for context:

Copy on Write : Detect conflicts, rewrite entire files with updated data – optimal read performance but high write cost.

Merge on Read : Write new files without conflict checks and merge versions at read time – low write cost but poor read performance.

Delta Store : Store modifications as delta records and merge them during reads – a trade‑off between write and read efficiency.

Implementation Details

To make Delete + Insert efficient, StarRocks uses:

Roaring bitmap to store per‑column delete markers; the bitmap is kept in RocksDB (or object storage) and cached in memory.

Two primary‑key index implementations:

In‑memory index : Built on‑the‑fly during data load and released when idle, suitable for low‑latency scenarios.

Persisted index : Partially stored on disk using an LSM‑like layered storage, offering near‑in‑memory performance with reduced memory footprint.

StarRocks also supports partial column updates via two modes:

Row mode : Suitable for small‑batch real‑time updates; missing columns are fetched from existing files and merged into a new columnar file. Example illustration:

Column mode : Designed for large‑batch updates; the system reads original column data, merges with updates, and writes new partial column files, avoiding merge steps at query time. Illustration:

Read‑write amplification formulas:

Row mode amplification = 1 / C% where C% is the proportion of columns updated.

Column mode amplification = 1 / R% where R% is the proportion of rows updated.

Practical Upsert and Update Examples

Upsert via CSV : A CSV file with id and new values can be sent via HTTP PUT to update the target table.

2,40
3,32
5,32
7,24

Alternatively, an equivalent SQL statement can be used:

UPDATE employee SET age = age + 1 WHERE id IN (2,3,5,7);

Partial update example (SQL):

UPDATE employee SET age = age + 1 WHERE id IN (2,3,5,7);

Conditional update for out‑of‑order streams : Only the latest record per primary key is retained. Example order table shows how newer timestamps overwrite older ones, while delayed records are ignored.

SQL Update Patterns

Using FROM clause to join tables and update in a single statement:

UPDATE ship_fee
SET fee = fee * 1.1
FROM orders
WHERE orders.shipdate >= '2023/09/15'
  AND ship_fee.company = 'SF'
  AND ship_fee.orderid = orders.orderid;

Using CTE for clearer logic:

WITH increase_fee AS (
    SELECT * FROM orders
    WHERE shipdate >= '2023/09/15'
)
UPDATE ship_fee
SET fee = fee * 1.1
FROM increase_fee
WHERE ship_fee.orderid = orders.orderid
  AND ship_fee.company = 'SF';

These examples demonstrate how StarRocks combines high‑performance columnar storage with primary‑key based upserts to achieve both real‑time data ingestion and fast analytical queries.

SQLStarRocksprimary keyReal-Time UpdateAnalytical DatabasePartial Update
StarRocks
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.