
Unlock 30% Faster Queries: StarRocks on AWS Graviton3 Performance Deep Dive

This article examines how StarRocks, a next‑generation MPP database, leverages AWS Graviton3 instances to achieve over 30% query speed improvement and 15% cost reduction compared with x86 C6i instances, detailing benchmark methodology, hardware specs, SIMD optimizations, and real‑world OLAP results.


Background

StarRocks is a next‑generation MPP (Massively Parallel Processing) database that uses a fully vectorized engine and a cost‑based optimizer (CBO) to achieve sub‑second query latency, especially for multi‑table joins.

AWS Graviton3

Graviton3 is an ARM‑based AWS processor that delivers roughly 25 % higher single‑thread performance and about 50 % higher overall performance than Graviton2. It features DDR5 memory, higher bandwidth, lower latency, enhanced matrix‑multiply instructions (up to 3× acceleration for ML workloads), built‑in memory encryption and improved energy efficiency.

Test Environment

Two EC2 instance types were compared:

c7g.4xlarge (Graviton3)

On‑demand price: $0.5781 USD/h

CPU: 16 × ARM vCPUs @ 2.6 GHz

Memory: 32 GB

Network: 15 Gbps

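Cache (totals summed over all instances of each level): L1d 1 MiB over 16 instances, L1i 1 MiB over 16 instances, L2 16 MiB over 16 instances, L3 32 MiB in 1 shared instance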

c6i.4xlarge (Intel Ice Lake)

On‑demand price: $0.68 USD/h

CPU: 16 × x86 vCPUs @ 2.9 GHz

Memory: 32 GB

Network: 12.5 Gbps

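Cache (totals summed over all instances of each level): L1d 384 KiB over 8 instances, L1i 256 KiB over 8 instances, L2 10 MiB over 8 instances, L3 54 MiB in 1 shared instance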
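The cache figures are easier to compare per core. The sketch below assumes the numbers above are lscpu-style totals (for example, 1 MiB of L1d summed over 16 instances, not 1 MiB per core); that reading is an assumption, and the helper names are made up for illustration.

```python
# Assumption: each cache figure is a total summed over all instances of
# that level (lscpu style), not a per-core size.
KIB = 1024
MIB = 1024 * KIB

# (total_bytes, instance_count), copied from the two spec lists above.
CACHES = {
    "c7g.4xlarge": {"L1d": (1 * MIB, 16), "L1i": (1 * MIB, 16),
                    "L2": (16 * MIB, 16), "L3": (32 * MIB, 1)},
    "c6i.4xlarge": {"L1d": (384 * KIB, 8), "L1i": (256 * KIB, 8),
                    "L2": (10 * MIB, 8), "L3": (54 * MIB, 1)},
}


def per_instance_kib(total_bytes, instances):
    """Size of one cache instance in KiB, given the total and the count."""
    return total_bytes / instances / KIB


def per_instance_table(caches=CACHES):
    """Map instance type -> cache level -> size of a single instance in KiB."""
    return {
        inst: {lvl: per_instance_kib(*spec) for lvl, spec in levels.items()}
        for inst, levels in caches.items()
    }


if __name__ == "__main__":
    for instance, levels in per_instance_table().items():
        for level, kib in levels.items():
            print(f"{instance} {level}: {kib:g} KiB per instance")
```

Under that reading, the instance counts also reflect the core layout: 16 vCPUs map to 16 private L1/L2 instances on c7g, while the 16 vCPUs on c6i share 8.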

Both instances ran StarRocks version 3.3 (release notes: https://docs.starrocks.io/releasenotes/release-3.3/). Configuration tweaks for the benchmark were:

fe.conf: catalog_trash_expire_second (default)
be.conf: max_compaction_concurrency=0
be.conf: trash_file_expire_time_sec=0

Benchmark Methodology

The TPC‑DS benchmark was executed at 100 GB and 1 TB scales. Graviton3 exposes Arm SIMD instruction sets (NEON) rather than the x86 SSE/AVX families, so additional SIMD adaptations were applied to StarRocks. The relevant pull requests are #44607 (bitshuffle and CRC NEON optimizations) and #44194 (filter_range NEON optimization). The reported metric is the sum of the latencies of all 99 queries in each suite.
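To give a feel for what a block-wise filter kernel does, here is a minimal pure-Python sketch of the general technique. It is not the code from pull request #44194, whose details are not reproduced here: the function name, signature, and block size are assumptions chosen for illustration. The block size of 16 mirrors the sixteen 8-bit lanes of a 128-bit NEON register.

```python
BLOCK = 16  # a 128-bit NEON register holds sixteen 8-bit lanes


def filter_range(values, selection, start, end, block=BLOCK):
    """Return values[start:end] whose selection byte is non-zero.

    Hypothetical helper for illustration only; not StarRocks' implementation.
    A SIMD kernel would test a whole block of selection bytes at once, which
    is what the any()/all() checks stand in for here.
    """
    out = []
    i = start
    while i + block <= end:
        flags = selection[i:i + block]
        if not any(flags):
            pass  # whole block rejected: skip it without touching values
        elif all(flags):
            out.extend(values[i:i + block])  # whole block selected: bulk copy
        else:
            # Mixed block: fall back to element-wise selection.
            out.extend(v for v, f in zip(values[i:i + block], flags) if f)
        i += block
    # Scalar tail for the final partial block.
    out.extend(v for v, f in zip(values[i:end], selection[i:end]) if f)
    return out
```

The two fast paths are where a vectorized version tends to pay off: highly selective or barely selective predicates produce long runs of all-zero or all-one blocks, which can be skipped or copied in bulk.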

Results

Across both scales the c7g instance showed no performance regressions relative to c6i and delivered an average 30 % speedup. Combined with the 15 % lower hourly price, the overall cost‑performance improvement exceeds 50 %.
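A minimal sketch of how the suite-level metric and the regression check could be computed from per-query latencies. The function names are hypothetical and the three-query sample is placeholder data, not measured results; "speedup" is taken here to mean the ratio of total latencies, which is an assumption about how the 30 % figure was derived.

```python
def total_latency(latencies):
    """Sum of per-query latencies: the suite-level metric used in this article."""
    return sum(latencies.values())


def speedup(baseline, candidate):
    """Ratio of baseline total latency to candidate total latency."""
    return total_latency(baseline) / total_latency(candidate)


def regressions(baseline, candidate, tolerance=0.0):
    """Queries where the candidate is slower than the baseline.

    `tolerance` is a fraction (0.05 means "more than 5 % slower").
    """
    return sorted(
        q for q in baseline if candidate[q] > baseline[q] * (1 + tolerance)
    )


if __name__ == "__main__":
    # Placeholder latencies in milliseconds; NOT benchmark data.
    c6i = {"q1": 2600, "q2": 1300, "q3": 3900}
    c7g = {"q1": 2000, "q2": 1000, "q3": 3000}
    print("speedup:", speedup(c6i, c7g))
    print("regressions:", regressions(c6i, c7g))
```

Summing latencies weights long-running queries most heavily, so a per-query regression list is a useful companion to the single total.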

Key OLAP workload gains after SIMD optimizations:

Scan & Bitshuffle: >15 % faster

Aggregate: 43 % faster

HashJoin: >15 % faster

Cost‑performance ratio calculation:

Relative performance = 1.30 and relative price = 0.5781 / 0.68 ≈ 0.85, so price‑performance = 1.30 / 0.85 ≈ 1.53, i.e., a 53 % overall gain.
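The same arithmetic as a small sketch, using the on-demand prices from the test environment and the reported 1.30 speedup. The function names are illustrative, and on-demand prices vary by region and over time, so treat the constants as inputs rather than fixed values.

```python
C7G_PRICE_USD_PER_HOUR = 0.5781  # c7g.4xlarge on-demand, from the test environment
C6I_PRICE_USD_PER_HOUR = 0.68    # c6i.4xlarge on-demand, from the test environment
SPEEDUP = 1.30                   # reported average speedup of c7g over c6i


def price_ratio(new_price, old_price):
    """New price as a fraction of the old price (below 1.0 means cheaper)."""
    return new_price / old_price


def price_performance_gain(speedup, new_price, old_price):
    """Work done per dollar on the new instance relative to the old one."""
    return speedup / price_ratio(new_price, old_price)


if __name__ == "__main__":
    ratio = price_ratio(C7G_PRICE_USD_PER_HOUR, C6I_PRICE_USD_PER_HOUR)
    gain = price_performance_gain(
        SPEEDUP, C7G_PRICE_USD_PER_HOUR, C6I_PRICE_USD_PER_HOUR
    )
    print(f"price ratio: {ratio:.3f}")
    print(f"price-performance gain: {gain:.2f}x")
```

Dividing rather than adding the two percentages is the important step: a 30 % speedup at 85 % of the price compounds to roughly 1.53×, not 1.45×.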

[Figure: Instance configuration diagram]
[Figure: Performance comparison chart]
[Figure: OLAP workload improvement table]

Conclusion

StarRocks on Graviton3 (c7g) provides >30 % query speed improvement and ~15 % cost reduction versus Ice Lake (c6i), yielding more than a 50 % boost in price‑performance. Future work includes evaluating Graviton4, which promises up to 30 % additional performance and significant memory‑bandwidth gains.

References

StarRocks 3.3 release notes: https://docs.starrocks.io/releasenotes/release-3.3/

StarRocks TPC‑DS benchmark documentation: https://docs.starrocks.io/zh/docs/benchmarking/TPC_DS_Benchmark/

SIMD and vectorization guide for AWS Graviton: https://github.com/aws/aws-graviton-getting-started/blob/main/SIMD_and_vectorization.md

Pull request #44607 (bitshuffle & CRC NEON optimizations): https://github.com/StarRocks/starrocks/pull/44607

Pull request #44194 (filter_range NEON optimization): https://github.com/StarRocks/starrocks/pull/44194

Written by StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
