Benchmarking Cloud‑Native Data Warehouses: Cloudwave vs StarRocks Performance Test

This article compares traditional databases with modern cloud‑native data warehouses, outlines a detailed performance testing methodology using the SSB1000 benchmark, presents test scripts and environment setup for Cloudwave and StarRocks, and analyzes the results to highlight strengths and optimization opportunities.

AI Cyberspace

Data Warehouse vs. Traditional Database

With the rise of 5G, IoT, and digital transformation across industries, organizations generate massive amounts of data that must be stored, analyzed, and used efficiently to support business decisions. Data warehouses integrate storage, analysis, and management: they offer data cleaning, integration, complex queries, reporting, and analytics, whereas traditional databases focus on transaction processing.

A technical comparison shows that cloud‑native data warehouses target large‑scale storage and high‑performance analytics, use star or snowflake schemas, support horizontal scaling in minutes, handle TB‑to‑PB data volumes, and provide platform‑level management that reduces DBA workload.

Performance Testing Cases

Performance Metrics

Key metrics for a data warehouse include read/write throughput, horizontal scalability, data consistency, fault recovery, security, cluster resource utilization, and management tool performance. This test focuses on read/write performance.
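The results later in the article report CPU utilization, but the article does not say which tool was used to measure it. As a minimal sketch, assuming Linux hosts, utilization can be derived from two snapshots of the aggregate `cpu` line in `/proc/stat`. The function name `cpu_busy_pct` is hypothetical and not part of either product.

```shell
# cpu_busy_pct SAMPLE1 SAMPLE2
# Each sample is the aggregate "cpu ..." line from /proc/stat (Linux only).
# Prints the percentage of CPU time spent non-idle between the two samples.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Columns 2..9: user nice system idle iowait irq softirq steal
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle = $5 + $6                      # idle + iowait
      if (NR == 1) { t0 = total; i0 = idle; next }
      dt = total - t0
      di = idle - i0
      if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
      else        printf "0.0\n"
    }'
}

# Usage during a test run (sampling interval is an arbitrary choice):
#   s1=$(head -n 1 /proc/stat); sleep 5; s2=$(head -n 1 /proc/stat)
#   cpu_busy_pct "$s1" "$s2"
```

Sampling on every node and averaging the values would give a cluster-level figure comparable to the percentages quoted in the results.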

Test Plan

Two Chinese cloud‑native data warehouse products were selected: Cloudwave 4.0 and StarRocks 3.0, both deployed on Hadoop in this test. The benchmark dataset is SSB1000, which consists of five tables: lineorder, customer, part, supplier, and dates.

Test Cases

TestCase 1: Execute the 13 standard SSB queries.

TestCase 2: Execute a multi‑table join (SQL1).

TestCase 3: Execute a multi‑table join (SQL2).
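The article does not reproduce the contents of sql_ssb.sql or the SQL1 and SQL2 join statements. For orientation only, the sketch below writes the first query of the standard SSB suite (Q1.1) to a file, using the `dates` table name from the test plan. The helper name `write_ssb_q1_1` is hypothetical, and the query is the published benchmark text, not necessarily the exact statement the author ran.

```shell
# Write standard SSB query Q1.1 to the file given as $1.
# Illustration only: the article's own sql_ssb.sql is not shown.
write_ssb_q1_1() {
  cat > "$1" <<'SQL'
-- SSB Q1.1: revenue from discounted, low-quantity orders in 1993
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM lineorder, dates
WHERE lo_orderdate = d_datekey
  AND d_year = 1993
  AND lo_discount BETWEEN 1 AND 3
  AND lo_quantity < 25;
SQL
}
```

The remaining twelve queries follow the same pattern, joining lineorder against progressively more dimension tables.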

Test Scripts

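#!/bin/bash
# Test script for Cloudwave.
# Runs the SSB query file 19 times (i = 1..19) and saves each run's
# output to n<i>.txt.
for ((i=1; i<20; i++))
do
  ./cplus.sh < sql_ssb.sql > "n${i}.txt"
done

#!/bin/bash
# Test script for StarRocks (a separate script from the one above).
# Connects to the FE query port 9030 with the MySQL client; -vvv makes the
# client print per-statement timings. Output file names match the Cloudwave
# script, so run each script in its own directory.
for ((i=1; i<20; i++))
do
  mysql -uroot -P 9030 -h 127.0.0.1 -vvv < sql_ssb.sql > "n${i}.txt"
done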
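The scripts above only capture raw output; the article does not show how the average latency was derived from it. A minimal sketch for the StarRocks side, assuming the MySQL client's usual `-vvv` timing lines such as `13 rows in set (0.42 sec)` and assuming every statement finishes in under a minute: The function names are hypothetical, and the Cloudwave output format is not shown in the article, so this does not apply to it as written.

```shell
# Sum the per-statement timings in one output file ($1).
# Only the "(N.NN sec)" form is handled; longer statements that print
# minutes are not covered by this sketch.
run_total_seconds() {
  awk '
    match($0, /\([0-9]+\.[0-9]+ sec\)/) {
      # Strip the leading "(" and trailing " sec)" to leave the number.
      total += substr($0, RSTART + 1, RLENGTH - 6)
    }
    END { printf "%.2f\n", total }
  ' "$1"
}

# Average the per-run totals over all n<i>.txt files in directory $1.
average_run_seconds() {
  for f in "$1"/n*.txt; do
    run_total_seconds "$f"
  done | awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Usage: average_run_seconds .
```

Whether the article's figures are per-run totals or per-query averages is not stated, so treat the aggregation here as one possible reading.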

Baseline Environment

Hardware

Four Alibaba Cloud instances, each with 64 CPU cores and 256 GB memory, using ESSD PL1 high‑performance disks.

Software

JDK 19 for Cloudwave

JDK 8 for StarRocks

MySQL 8 client for connecting to the StarRocks FE

Hadoop 3.2.2 as distributed storage (replication factor 2)

Test Execution Steps

Cloudwave

Prepare HDFS storage and format namenode.

Start HDFS services.

Create upload directory and put SSB1000 data.

Start Cloudwave cluster.

Load data with ./cplus_go.bin -s 'loaddata ssb1000' (≈58 min).

Run TestCase 1, monitor CPU usage, and calculate the average latency (≈7.6 s).

Run TestCase 2 and TestCase 3, recording CPU usage (≈0.09 % and ≈0.12 %, respectively) and latency (12 ms and 14 ms, respectively).
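The HDFS preparation at the start of these steps could be scripted roughly as follows. This is a sketch under stated assumptions: the local and HDFS paths are hypothetical (the article does not give them), the Hadoop `bin` and `sbin` directories are assumed to be on `PATH`, and the `run` wrapper only prints commands unless `DRY_RUN=0` is set.

```shell
# Hypothetical locations; the article does not give the real paths.
SSB_LOCAL_DIR="${SSB_LOCAL_DIR:-/data/ssb1000}"
SSB_HDFS_PARENT="${SSB_HDFS_PARENT:-/upload}"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

prepare_hdfs() {
  run hdfs namenode -format                      # wipes existing HDFS metadata
  run start-dfs.sh                               # starts NameNode and DataNodes
  run hdfs dfs -mkdir -p "$SSB_HDFS_PARENT"      # create the upload directory
  run hdfs dfs -put "$SSB_LOCAL_DIR" "$SSB_HDFS_PARENT/"
  run hdfs getconf -confKey dfs.replication      # the test environment uses 2
}

# Usage: prepare_hdfs            (dry run, prints the plan)
#        DRY_RUN=0 prepare_hdfs  (actually executes)
```

Defaulting to a dry run is deliberate, since formatting the NameNode destroys any existing file system metadata.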

StarRocks

Clear HDFS storage.

Start StarRocks FE daemon.

Add four BE nodes.

Start BE daemons and verify all nodes are alive.

Create tables and load SSB1000 data (≈112 min, no compression, ~1 TB storage).

Run TestCase 1: average CPU usage is 67 % and average latency is 10.39 s.

Run TestCase 2 and TestCase 3: CPU usage is 78.7 % and 90.5 %, with latencies of 2.79 s and 4.8 s, respectively.
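Registering the BE nodes and checking that they are alive, as in the steps above, is done with SQL sent to the FE. A minimal sketch, assuming the BE nodes use StarRocks' default heartbeat port 9050 and the FE listens on 127.0.0.1:9030 as in the test script: The host names and the `backend_sql` helper are hypothetical.

```shell
# Emit the SQL that registers each BE host with the FE, then lists them.
# 9050 is the default BE heartbeat port; adjust if be.conf differs.
backend_sql() {
  for host in "$@"; do
    printf 'ALTER SYSTEM ADD BACKEND "%s:9050";\n' "$host"
  done
  printf 'SHOW BACKENDS;\n'
}

# Usage (hypothetical host names):
#   backend_sql be1 be2 be3 be4 | mysql -uroot -P 9030 -h 127.0.0.1
```

In the `SHOW BACKENDS` output, the `Alive` column should read `true` for all four nodes before data loading starts.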

Result Analysis

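In this test Cloudwave outperforms StarRocks. It loads the data faster (≈58 min vs ≈112 min), completes TestCase 1 sooner (≈7.6 s vs 10.39 s), and shows its largest advantage in the multi‑table joins (12 ms and 14 ms vs 2.79 s and 4.8 s), with far lower CPU utilization throughout (≈0.09 % and ≈0.12 % vs 78.7 % and 90.5 % for the joins). However, the study covers only a limited set of metrics and scenarios; further optimization and broader testing are needed for a comprehensive evaluation.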
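To put the gap in proportion, the ratios between the reported figures can be computed directly. The sketch below uses only the numbers quoted in the test execution steps; the `ratio` helper is hypothetical.

```shell
# ratio BASE OTHER: how many times larger OTHER is than BASE (same units).
ratio() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", other / base }'
}

ratio 58 112        # data load, minutes      -> 1.9
ratio 7.6 10.39     # TestCase 1, seconds     -> 1.4
ratio 0.012 2.79    # TestCase 2, seconds     -> 232.5
ratio 0.014 4.8     # TestCase 3, seconds     -> 342.9
```

The first two ratios are modest, while the join cases differ by more than two orders of magnitude, which is why those scenarios deserve the closest scrutiny in any follow-up test.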

From Data Warehouse to Cloud‑Native Data Warehouse

Modern data warehouses have evolved from pure storage systems to platforms offering data services, SDKs, and cloud‑native features such as elastic scaling, parallel full‑text search, and rich ecosystem integration.

Key cloud‑native advantages include elastic scalability, agility for new analytics workloads, and strong ecosystem support for machine learning, visualization, and other cloud services.
