How StarRocks Handles PB‑Scale Real‑Time Analytics with High Availability
This article explains how StarRocks manages petabyte‑level user behavior logs, ads and orders through a shared‑nothing architecture, tablet‑based data distribution, MPP compute, high‑availability metadata, real‑time mini‑batch ingestion, and online schema changes, enabling 24/7 analytical services for diverse internet companies.
Internet companies A, B, and C have different business models (online advertising, mobile app services, and e-commerce), but all need real-time analysis of massive data streams. They require a system that can store petabytes of logs, stay available 24/7, and tolerate hardware failures.
System Architecture
StarRocks adopts a shared-nothing design in which each node has its own storage and compute resources, so data is processed where it is stored and network traffic is minimized. Data is spread across nodes using hash-based tablet partitioning; a 1 PB dataset, for example, is distributed across roughly 100 nodes. The front-end (FE) layer manages metadata and plans queries, while the back-end (BE) nodes store tablets and execute MPP (Massively Parallel Processing) queries.
Each table is divided into partitions and further into tablets; for example, a table bigdata is partitioned by date range and hashed into 100 tablets. Tablets are the smallest management unit and can be dynamically moved across BE nodes for load balancing and fault tolerance.
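The partition-then-tablet mapping can be illustrated with a minimal sketch. The partition names, date ranges, the user_id bucketing key, and the use of CRC32 are all assumptions made for illustration; the article does not state which hash function or key StarRocks uses. Only the count of 100 tablets comes from the example above.

```python
import zlib
from datetime import date

NUM_TABLETS = 100  # bucket count taken from the bigdata example

# Hypothetical date-range partitions: (name, start inclusive, end exclusive)
PARTITIONS = [
    ("p202401", date(2024, 1, 1), date(2024, 2, 1)),
    ("p202402", date(2024, 2, 1), date(2024, 3, 1)),
]


def find_partition(event_date):
    """Return the name of the range partition that covers event_date."""
    for name, start, end in PARTITIONS:
        if start <= event_date < end:
            return name
    raise ValueError(f"no partition covers {event_date}")


def find_tablet(user_id):
    """Hash the bucketing key into one of NUM_TABLETS tablets.

    CRC32 is a stand-in for whatever hash the real engine applies.
    """
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_TABLETS


def route(event_date, user_id):
    """Map a row to its (partition, tablet) pair."""
    return find_partition(event_date), find_tablet(user_id)
```

Because the mapping depends only on the row's own values, every row with the same key lands in the same tablet, and a tablet can be moved to another BE node without changing which rows belong to it.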
High‑Availability Metadata and Data
Metadata (schema, tablet locations, node status) is stored in an embedded BDB-JE database, which replicates it with a Paxos-like protocol (three replicas by default). A write succeeds only after a majority of replicas acknowledges it, which ensures consistency. Tablet data is also kept in three replicas; ingestion writes to all of them, and FE coordinates these tasks, so the BEs do not need to run a separate consensus protocol among themselves. If a node fails, FE schedules clone tasks to restore the missing replicas, keeping the cluster balanced.
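The majority rule and the clone scheduling described above can be sketched as follows. This is a simplified model, not StarRocks code: the function names, the node names, and the "pick the least-loaded live node" policy are assumptions chosen for illustration.

```python
REPLICATION_FACTOR = 3  # default replica count mentioned in the text


def majority(n):
    """Smallest number of acknowledgements that forms a majority of n."""
    return n // 2 + 1


def metadata_write_succeeds(acks, replicas=REPLICATION_FACTOR):
    """A metadata write is durable once a majority has acknowledged it."""
    return acks >= majority(replicas)


def plan_clone_tasks(tablet_replicas, alive_nodes,
                     replication_factor=REPLICATION_FACTOR):
    """Plan (tablet_id, target_node) clones to restore missing replicas.

    tablet_replicas maps a tablet id to the set of nodes holding a replica.
    Assumed policy: clone onto the live node with the fewest replicas.
    """
    # Count surviving replicas per live node to approximate its load.
    load = {node: 0 for node in alive_nodes}
    for nodes in tablet_replicas.values():
        for node in nodes:
            if node in load:
                load[node] += 1

    tasks = []
    for tablet_id in sorted(tablet_replicas):
        live = {n for n in tablet_replicas[tablet_id] if n in load}
        while len(live) < replication_factor:
            candidates = [n for n in load if n not in live]
            if not candidates:
                break  # not enough live nodes to reach the target count
            target = min(candidates, key=lambda n: (load[n], n))
            tasks.append((tablet_id, target))
            live.add(target)
            load[target] += 1
    return tasks
```

For instance, with three replicas a metadata write needs two acknowledgements, and a tablet that lost one replica to a failed node yields exactly one clone task aimed at the emptiest surviving node.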
Real‑Time Data Ingestion
StarRocks supports minute‑ or second‑level ingestion, especially from Kafka, using mini‑batch processing. Mini‑batches improve throughput and provide transactional guarantees via a two‑phase commit and MVCC versioning, allowing concurrent queries to see a consistent snapshot.
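How versioning lets queries see a consistent snapshot while mini-batches load can be shown with a toy model. The class and method names below are hypothetical, and the two phases are reduced to "stage rows invisibly" and "publish a new version"; the real protocol spans multiple nodes and replicas.

```python
class VersionedTable:
    """Toy MVCC table: each committed mini-batch becomes a new version."""

    def __init__(self):
        self.visible_version = 0
        self._committed = {}  # version -> rows published at that version
        self._pending = {}    # txn_id -> rows staged but not yet visible
        self._next_txn = 1

    def begin(self):
        """Open a transaction for one mini-batch."""
        txn_id = self._next_txn
        self._next_txn += 1
        self._pending[txn_id] = []
        return txn_id

    def write(self, txn_id, rows):
        """Phase 1: stage rows; readers cannot see them yet."""
        self._pending[txn_id].extend(rows)

    def commit(self, txn_id):
        """Phase 2: publish the staged rows under a new version number."""
        rows = self._pending.pop(txn_id)
        new_version = self.visible_version + 1
        self._committed[new_version] = rows
        self.visible_version = new_version
        return new_version

    def abort(self, txn_id):
        """Discard a staged batch; no version is created."""
        self._pending.pop(txn_id, None)

    def snapshot(self):
        """A query pins the version that is visible when it starts."""
        return self.visible_version

    def read(self, version):
        """Return all rows committed at or before the pinned version."""
        rows = []
        for v in range(1, version + 1):
            rows.extend(self._committed[v])
        return rows
```

A query that pinned its snapshot before a batch committed keeps reading the older version, so it never observes a half-loaded batch, and an aborted batch leaves no trace.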
MPP Compute Engine
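Operations such as sorting, GROUP BY aggregation, and JOIN are executed in parallel across nodes, taking advantage of the data locality of tablets. In a scatter-gather system such as ClickHouse, a single node gathers and merges the partial results, which can become a bottleneck. StarRocks' MPP engine instead redistributes intermediate data among the nodes, so each stage of a query runs in parallel and cluster resources are better utilized.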
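The idea of a multi-stage parallel aggregation can be sketched with plain Python lists standing in for nodes. This is an illustration of the general MPP pattern (local aggregation, shuffle by key, final merge on the owning node), not the StarRocks planner; the function names and the CRC32-based ownership rule are assumptions.

```python
import zlib
from collections import Counter, defaultdict


def local_aggregate(rows):
    """Stage 1: each node counts the keys in its own local rows."""
    counts = Counter()
    for key in rows:
        counts[key] += 1
    return counts


def shuffle(partials, num_nodes):
    """Stage 2: send each key's partial count to the node that owns the key.

    The owning node sums the partial counts it receives, so no single
    node has to merge the results for every key.
    """
    inboxes = [defaultdict(int) for _ in range(num_nodes)]
    for partial in partials:
        for key, count in partial.items():
            owner = zlib.crc32(key.encode("utf-8")) % num_nodes
            inboxes[owner][key] += count
    return inboxes


def mpp_group_by_count(node_rows):
    """Run a two-stage GROUP BY ... COUNT(*) over per-node row lists."""
    partials = [local_aggregate(rows) for rows in node_rows]
    inboxes = shuffle(partials, len(node_rows))
    result = {}
    for inbox in inboxes:  # each key lives in exactly one inbox
        result.update(inbox)
    return result
```

Called as mpp_group_by_count([["a", "b", "a"], ["b", "c"], ["a"]]), the sketch produces the counts a: 3, b: 2, c: 1, with the final merge work divided among the nodes by key.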
Online Schema Change
Schema changes occur without downtime through a three-stage process (t1, t2, t3): the request is received, newly ingested data is written to both the old and the new table while existing data is converted, and finally the new table atomically replaces the old one. Queries continue to run against the original table until the swap completes.
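The three stages can be modeled in a few lines. The sketch below is a single-process toy under stated assumptions: the class name, the convert callback, and the state names are hypothetical, and the "atomic swap" is just a reference reassignment rather than a distributed metadata update.

```python
class SchemaChangeJob:
    """Toy online schema change: dual-write during the job, then swap."""

    def __init__(self, old_rows, convert):
        self.old = list(old_rows)          # table in the old schema
        self.new = []                      # table being built in the new schema
        self.convert = convert             # maps an old-schema row to the new schema
        self._history = list(old_rows)     # rows that existed when the job started
        self.state = "RUNNING"
        self.serving = self.old            # queries read the old table for now

    def ingest(self, row):
        """While the job runs, write every incoming row to both tables."""
        if self.state == "RUNNING":
            self.old.append(row)
            self.new.append(self.convert(row))
        else:
            # After the swap, callers are assumed to supply new-schema rows.
            self.new.append(row)

    def finish(self):
        """Convert the historical rows, then switch queries to the new table."""
        self.new[:0] = [self.convert(r) for r in self._history]
        self._history = []
        self.serving = self.new
        self.state = "FINISHED"

    def query(self):
        """Return the rows of whichever table is currently being served."""
        return list(self.serving)
```

Until finish() runs, query() keeps returning old-schema rows, including those ingested during the job; afterwards it returns the same rows in the new schema, so no ingested row is lost across the swap.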
Conclusion
The article highlights StarRocks' capabilities for managing massive, real-time analytical workloads with high availability, automatic tablet balancing, MPP processing, robust ingestion, and seamless schema evolution. Only a subset of its features is covered here; additional strengths such as columnar storage, materialized views, compaction, and cloud-native storage-compute separation are left for future discussion.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
