
StarRocks Beats ClickHouse, Snowflake, and Databricks in Coffee‑Shop Benchmark – Up to 10× Faster and Cheaper

A reproducible evaluation using the open‑source Coffee‑shop Benchmark shows that, across the 500 M, 1 B, and 5 B scales, StarRocks completes the suite's 17 complex join and aggregation queries 2–10× faster, and at significantly lower cost, than ClickHouse, Snowflake, and Databricks.


Background

The Coffee‑shop Benchmark is a public test suite designed to evaluate database systems on compute‑intensive join and aggregation workloads that resemble retail and store‑operation analytics. It consists of 17 queries covering sales trends, profit analysis, discount strategies, and other typical analytical tasks.

Dataset

The benchmark uses a fact table fact_sales and two small dimension tables dim_locations (1,000 rows) and dim_products (26 rows). Fact table size varies with the scale:

500 M scale: 0.72 B rows

1 B scale: 1.44 B rows

5 B scale: 7.2 B rows

Query Types

Equality Join: fact_sales ⋈ dim_locations on location_id (VARCHAR), to test distributed join and data‑sharding efficiency.

Range Join: fact_sales ⋈ dim_products on name, with the date‑range predicate f.order_date BETWEEN p.from_date AND p.to_date, reflecting real‑world temporal joins.

All queries also include aggregation patterns such as COUNT(DISTINCT order_id) with multi‑column GROUP BY, stressing intermediate aggregation, deduplication, and memory management; a representative query shape is sketched below.
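To make that shape concrete, here is a hedged sketch combining both join types with the aggregation pattern. It is illustrative only: any column not named above (product_name, quantity, unit_price, state) is an assumption, and the actual 17 queries are available in the benchmark repositories.

    -- Illustrative only: combines the equality join, the range join, and the
    -- COUNT(DISTINCT) + multi-column GROUP BY pattern described above.
    -- Columns product_name, quantity, unit_price, and state are assumed.
    SELECT
        l.state,
        p.name,
        COUNT(DISTINCT f.order_id)     AS orders,   -- high-cardinality deduplication
        SUM(f.quantity * f.unit_price) AS revenue
    FROM fact_sales f
    JOIN dim_locations l
      ON f.location_id = l.location_id              -- equality join on VARCHAR key
    JOIN dim_products p
      ON f.product_name = p.name
     AND f.order_date BETWEEN p.from_date AND p.to_date  -- range join on dates
    GROUP BY l.state, p.name                        -- multi-column GROUP BY
    ORDER BY revenue DESC;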

Test Environment

Cluster configuration: Front‑End (FE) node – 1 × m7g.2xlarge (8 vCPU, 32 GB); Back‑End (BE) nodes – m7g.8xlarge (32 vCPU, 128 GB each), with 2–4 BE instances at the 500 M/1 B scales and 8–16 at the 5 B scale.

Table distribution: tables are laid out with an Order Key (sort key) and a Hash‑Bucket Key; scripts for table creation, data loading, and query execution are provided in the StarRocks coffeeshop-benchmark repository.
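As a hedged illustration of that layout, a minimal StarRocks DDL sketch follows; the column list and types are assumptions, and the authoritative DDL is in the repository scripts.

    -- Minimal sketch, not the benchmark's actual DDL (see the repository).
    -- Column names and types beyond those mentioned in the text are assumed.
    CREATE TABLE fact_sales (
        order_date   DATE,
        location_id  VARCHAR(64),
        order_id     BIGINT,
        product_name VARCHAR(64),
        quantity     INT,
        unit_price   DECIMAL(10, 2)
    )
    DUPLICATE KEY (order_date, location_id)  -- order (sort) key
    DISTRIBUTED BY HASH (location_id);       -- hash-bucket key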

StarRocks version: 4.0.1 with default settings, except that joint statistics were collected on fact_sales(order_date, location_id).
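For reference, statistics on those two columns can be collected manually with a statement along these lines; the exact syntax for joint (multi‑column) statistics varies across StarRocks versions, so treat this as a sketch and check the documentation.

    -- Sketch: manually collect statistics on the two join/filter columns.
    -- Joint multi-column statistics syntax may differ by StarRocks version.
    ANALYZE FULL TABLE fact_sales (order_date, location_id);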

Execution methodology : Each query runs five times; the shortest execution time is recorded as the warm‑cache performance.

Results

The figures below show total cost, total runtime, and per‑query cost and runtime at each scale.

[Figure: Total Cost (500 M scale)]
[Figure: Total Runtime (500 M scale)]
[Figure: Cost per Query (excluding Q10 & Q16)]
[Figure: Runtime per Query (excluding Q10 & Q16)]
[Figure: Cost per Query (Q10 & Q16 only)]
[Figure: Runtime per Query (Q10 & Q16 only)]

Result Analysis

Across all scales, most queries finish in ~0.5 s (500 M/1 B) and ~1 s (5 B), demonstrating linear scalability.

The heaviest queries, Q10 and Q16, which join 7.2 B rows and compute high‑cardinality COUNT(DISTINCT), complete in ~10 s at the 5 B scale, a dramatic runtime and cost reduction relative to the reference systems.

Overall, StarRocks achieves 2–10× better performance‑to‑cost ratios than reference systems under comparable hardware.

Further Discussion

The dimension tables are intentionally small, focusing the benchmark on fact‑table‑centric analytics. In larger industry benchmarks such as TPC‑DS, StarRocks also shows stable performance on multi‑table joins and massive aggregations.

References

https://github.com/JosueBogran/coffeeshopdatageneratorv2

https://www.linkedin.com/pulse/databricks-vs-snowflake-sql-performance-test-day-1-721m-bogran-lsboe/

https://github.com/ClickHouse/coffeeshop-benchmark

https://github.com/StarRocks/coffeeshop-benchmark

https://docs.starrocks.io/docs/benchmarking/TPC_DS_Benchmark/
