Practical Deployment and Operation Guide for StarRocks OLAP Database
This article presents a comprehensive overview of StarRocks, covering its key features, deployment challenges, backup and synchronization methods, cluster configuration and upgrade procedures, as well as monitoring and alerting solutions, followed by practical lessons learned from real‑world usage.
1. Introduction to StarRocks
Our DBA team manages MySQL, Redis, and Elasticsearch, and needed a database capable of handling complex OLAP queries on billions of rows with sub‑second multi‑dimensional aggregation while keeping transformation costs low.
MySQL lacks cross‑instance queries, leading to complex logic and excessive storage; Redis cannot handle complex conditions and is costly in memory; Elasticsearch requires heavy refactoring.
After evaluating products on performance, scale, architecture, extensibility, development and operation costs, we selected StarRocks for its high OLAP performance, low hardware cost (mixed deployment), low migration effort (SQL compatibility), and low operational overhead (open source, few components).
2. Characteristics of StarRocks
StarRocks is a high‑performance distributed database designed for various analytical scenarios, combining MPP architecture with vectorized execution, intelligent query optimization, materialized views, federated queries, efficient updates, and standard SQL support. It natively supports both streaming and batch data processing, high availability, and easy node scaling.
Key concepts:
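MPP database: A massively parallel processing database splits each query across nodes that work on their share of the data in parallel. Analytical MPP databases are typically column‑oriented, storing each column as a separate object.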
Vectorized engine: Executes the same instruction on a vector of data in parallel, leveraging CPU SIMD to process multiple operands simultaneously.
Distributed system: Solves massive data storage by allowing nodes to scale horizontally based on load.
StarRocks cluster components:
FE (Frontend): Manages metadata, client connections, query planning and scheduling.
BE (Backend): Handles data storage, computation, compaction, and replica management.
Broker: Provides a bridge to external data sources such as HDFS or object storage for import/export.
3. Issues Encountered During Deployment
While following the manual deployment steps, we faced several problems:
3.1 Permission Management: Starting the Broker component as root caused HDFS permission errors, e.g.,
Permission denied: user=root, access=READ, inode="/user/hive/.../part-00005-...c000":hive
The Broker pulls data from Hive using the credentials of the user that launched it, and the root account lacks the necessary Hive permissions. Running the Broker as a regular user resolved the issue; FE and BE have no such restriction.
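A minimal sketch of how such an error could be turned into an actionable hint in an operations script. It assumes the message has the shape quoted above; the function names are hypothetical and not part of StarRocks or HDFS.

```python
import re

# Matches the HDFS "Permission denied" message shape quoted in section 3.1.
DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>[^,]+), '
    r'inode="(?P<inode>[^"]*)":(?P<owner>[^:\s.,]+)'
)


def parse_denied(message):
    """Return user, access, inode and owner from the message, or None."""
    match = DENIED.search(message)
    return match.groupdict() if match else None


def broker_user_hint(message):
    """Suggest a fix when the launching user differs from the file owner."""
    info = parse_denied(message)
    if info is None or info["user"] == info["owner"]:
        return None
    return (
        f"Broker runs as '{info['user']}' but the file is owned by "
        f"'{info['owner']}'; start the Broker as a user with "
        f"{info['access']} access."
    )
```

Feeding the quoted error into broker_user_hint yields a hint that names root as the launching user and hive as the owner.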
3.2 Parameter Optimization: Default configuration values were insufficient for our workload. Some parameters only take effect after a full component restart, even when they appear to have been set at runtime via SET GLOBAL. Notable adjustments include:
FE connection limits: qe_max_connection=10000, max_conn_per_user=2000.
Import tuning (stream load and routine load): streaming_load_rpc_max_alive_time_sec=4800 (default 1200), tablet_writer_open_rpc_timeout_sec=240 (default 60), max_routine_load_task_num_per_be=9 (default 5).
These changes dramatically reduced data import errors.
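The adjustments above can be kept as data and rendered into configuration lines, which makes the deviation from the defaults easy to review. This is only a sketch: the "key = value" output format is an assumption, and which file (FE or BE configuration) each parameter belongs in should be checked against the documentation for your version.

```python
# (name, tuned value, default quoted in the article or None if not stated)
SETTINGS = [
    ("qe_max_connection", 10000, None),
    ("max_conn_per_user", 2000, None),
    ("streaming_load_rpc_max_alive_time_sec", 4800, 1200),
    ("tablet_writer_open_rpc_timeout_sec", 240, 60),
    ("max_routine_load_task_num_per_be", 9, 5),
]


def render_conf(settings):
    """Render 'key = value' lines (assumed config file format)."""
    return "\n".join(f"{name} = {value}" for name, value, _ in settings)


def describe_changes(settings):
    """Summarize how far each tuned value is from its stated default."""
    lines = []
    for name, value, default in settings:
        if default is None:
            lines.append(f"{name}: {value} (default not stated)")
        else:
            ratio = value / default
            lines.append(f"{name}: {default} -> {value} ({ratio:.1f}x)")
    return lines
```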
3.3 Syntax Compatibility: Although StarRocks is largely MySQL‑compatible, certain DDL and index features differ. For example, table creation must explicitly specify replica count, and StarRocks offers Bitmap, BloomFilter, and sparse indexes, which can outperform MySQL B‑tree indexes when properly tuned.
Partitioning support is limited; RANGE partitions must be created manually because automatic time‑based partition creation is not yet available.
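Since RANGE partitions had to be created by hand, a small generator for the DDL can take over the repetitive part. The sketch below assumes daily partitions on a date column and uses a hypothetical table name; the emitted statements follow the ALTER TABLE ... ADD PARTITION ... VALUES LESS THAN form and would be run through any MySQL‑protocol client.

```python
from datetime import date, timedelta


def daily_partition_ddl(table, start, days):
    """Build one ADD PARTITION statement per day, in ascending order."""
    statements = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        upper = day + timedelta(days=1)  # exclusive upper bound of the range
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION p{day:%Y%m%d} "
            f'VALUES LESS THAN ("{upper.isoformat()}");'
        )
    return statements
```

For example, daily_partition_ddl("events", date(2022, 1, 30), 3) produces three statements for partitions p20220130, p20220131, and p20220201. Scheduling this ahead of incoming data avoids import failures caused by a missing partition.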
4. Data Backup and Synchronization
We operate two clusters and need to sync data between them. StarRocks provides BACKUP / RESTORE commands based on snapshots. During backup, writes are not captured, so we either pause writes or perform a second sync after restore.
Backup targets include HDFS and AWS S3; due to SDK incompatibility we deployed a temporary local HDFS service, which proved the cheapest and most efficient solution.
Key notes: BACKUP does not support cross‑database backups; each database must be backed up separately.
Restore time for billion‑row tables is about one minute, over 70% faster than MySQL mysqldump.
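Because each database must be backed up separately, the statements can be generated per database. The following is a sketch under assumptions: the repository name, snapshot label, and table names are hypothetical, the repository is assumed to exist already, and the backup timestamp needed for RESTORE is assumed to be looked up from the repository beforehand.

```python
def backup_statements(tables_by_db, repo, label):
    """Build one BACKUP statement per database (no cross-database backup)."""
    statements = []
    for db in sorted(tables_by_db):
        table_list = ", ".join(tables_by_db[db])
        statements.append(
            f"BACKUP SNAPSHOT {db}.{label} TO {repo} ON ({table_list});"
        )
    return statements


def restore_statement(db, tables, repo, label, backup_timestamp):
    """Build the matching RESTORE statement for one database."""
    table_list = ", ".join(tables)
    return (
        f"RESTORE SNAPSHOT {db}.{label} FROM {repo} ON ({table_list}) "
        f'PROPERTIES ("backup_timestamp" = "{backup_timestamp}");'
    )
```

Writes that arrive after the snapshot is taken are not included, so the generated statements would still be combined with a write pause or a follow‑up sync as described above.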
5. Cluster Configuration and Upgrade
Production clusters typically run three nodes (mixed FE/BE), each with 16 CPU cores, 64 GB RAM, and 300 GB of storage. Horizontal scaling is possible by adding nodes.
Version 1.18 cannot balance data across BE nodes, leading to disk bottlenecks. Upgrading to 1.19 resolves the imbalance issue.
Upgrade steps:
Back up BE/FE configuration files and metadata.
Upgrade BE nodes first, then FE nodes (non‑master FE first, master last).
Verify version via SHOW VARIABLES LIKE "%version%" (note that SELECT VERSION() may be inaccurate).
Run regression tests to ensure business functionality.
Prepare a rollback plan in case of failure, taking particular care to avoid double‑writes during rollback.
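The node ordering in the upgrade steps (BE first, then non‑master FE, master FE last) can be expressed as a sort key, which is handy when a rollout script iterates over an inventory. The node dictionary layout and host names here are hypothetical.

```python
def upgrade_order(nodes):
    """Order nodes as: BE nodes, then non-master FE nodes, then the master FE.

    Each node is a dict with 'host', 'role' ('BE' or 'FE') and, for FE
    nodes, an optional 'is_master' flag.
    """
    def rank(node):
        if node["role"] == "BE":
            return 0
        return 2 if node.get("is_master") else 1

    # sorted() is stable, so nodes keep their inventory order within a group.
    return sorted(nodes, key=rank)


inventory = [
    {"host": "fe-1", "role": "FE", "is_master": True},
    {"host": "be-1", "role": "BE"},
    {"host": "fe-2", "role": "FE"},
    {"host": "be-2", "role": "BE"},
]
ordered_hosts = [node["host"] for node in upgrade_order(inventory)]
```

With this inventory, ordered_hosts is be-1, be-2, fe-2, fe-1.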
6. Monitoring and Alerting Solution
The official recommendation is Prometheus + Grafana. The provided Grafana dashboard covers only a few metrics with alerting support, but the HTTP API allows custom monitoring platforms to pull cluster metrics and define alerts.
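A custom platform that pulls metrics over HTTP mostly needs a parser and a threshold check. The sketch below assumes the endpoint returns Prometheus text exposition format without trailing timestamps; the metric names and thresholds are made up for illustration and are not actual StarRocks metric names.

```python
def parse_metrics(text):
    """Parse 'series value' lines into a dict, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; the series name
        # (including any {labels}) is everything before it.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def check_thresholds(samples, thresholds):
    """Return one alert string per series that is missing or above its limit."""
    alerts = []
    for series, limit in thresholds.items():
        value = samples.get(series)
        if value is None:
            alerts.append(f"{series}: missing")
        elif value > limit:
            alerts.append(f"{series}: {value:g} > {limit:g}")
    return alerts


SAMPLE = """
# HELP example_connection_total hypothetical metric
example_connection_total 9500
example_disk_used_pct{path="/data"} 91.5
"""
ALERTS = check_thresholds(
    parse_metrics(SAMPLE),
    {"example_connection_total": 10000, 'example_disk_used_pct{path="/data"}': 85},
)
```

In practice the text would come from an HTTP GET against each node's metrics endpoint, and the resulting alerts would be routed into the existing alerting channel.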
We integrated StarRocks into our existing MySQL‑based operations platform, adapting topology checks and SQL analysis modules to handle StarRocks, thereby extending the platform’s capabilities for query, authorization, and deployment workflows.
7. Practical Summary
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.
