
How TDH Dominated the TPCx‑HS 10TB Benchmark: Strategies and Results

This article details how StarRing and Cisco's joint TPCx‑HS 10TB benchmark placed the TDH platform at the top of the performance ranking, explains the test setup, describes the pre‑ and post‑optimization strategies for TeraGen and TeraSort, and outlines the hardware configuration and key tuning parameters.

StarRing Big Data Open Lab

Recently, StarRing and Cisco ran the official TPCx‑HS 10TB benchmark on the TDH platform. The result passed TPC review and achieved an HSph score of 12.18, the highest recorded, beating MapR 5.0, CDH 5.4.2, and MapR M5 4.0.1.

The benchmark was built on the Cisco UCS Integrated Infrastructure for Big Data, forming a one‑stop TDH platform.

Key metrics recorded in the report include HSph (system performance score) and Scale Factor (data volume).

TPCx‑HS Overview

TPCx‑HS is a Hadoop performance benchmark published by the Transaction Processing Performance Council (TPC). It evaluates hardware, software, and Hadoop‑compatible file‑system APIs for performance, cost‑effectiveness, availability, and power consumption. At its core, the benchmark runs a TeraSort over terabyte‑scale data to exercise HDFS and MapReduce.

The benchmark consists of three phases:

TeraGen: generates massive data and stores it in HDFS (Map only).

TeraSort: reads the data, sorts it with MapReduce, and writes the result back to HDFS (Map and Reduce).

Validate: verifies that the sorted output is correct (Map and Reduce).
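The three phases correspond to the stock Hadoop Tera* example jobs, so an equivalent run can be sketched as follows. The jar path and HDFS paths are illustrative assumptions; the official TPCx‑HS kit drives these phases through its own wrappers.

```shell
# Sketch of an equivalent run with the stock Hadoop examples jar
# (paths are illustrative, not from the report).
JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Phase 1: TeraGen - map-only job; 10^11 rows x 100 bytes = 10 TB
hadoop jar "$JAR" teragen 100000000000 /bench/gen

# Phase 2: TeraSort - full MapReduce sort of the generated data
hadoop jar "$JAR" terasort /bench/gen /bench/sort

# Phase 3: TeraValidate - checks that the output is globally sorted
hadoop jar "$JAR" teravalidate /bench/sort /bench/validate
```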

Pre‑Optimization Test

Without any tuning, the three phases showed:

TeraGen: low CPU load in the Map stage, network became the bottleneck.

TeraSort: CPU fully utilized in the Map stage, network fully utilized in the Reduce stage.

Validate: stable performance, no obvious tuning points.

Thus the main tuning targets were TeraGen and TeraSort.

TeraGen Tuning Approach

The bottleneck was network throughput caused by writing three HDFS replicas. The tuning principles were:

Increase the slope of the left side of the performance curve – start tasks quickly.

Keep the top of the curve flat – maintain stable, full‑load network.

Avoid a trailing right side – balance task load and avoid oversized tasks.

Key actions included increasing node heartbeat frequency and selecting an appropriate block size based on file count, Map count, and task execution time.
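To make the block‑size trade‑off concrete, here is a minimal sketch; all cluster figures are assumptions for illustration, not numbers from the report. A larger block means fewer, longer Map tasks, so the aim is enough full "waves" of tasks to keep the network saturated without a straggling tail.

```python
# Hypothetical block-size sweep for TeraGen on a 10 TB dataset.
# CLUSTER_MAP_SLOTS is an assumed figure, not from the report.

DATA_BYTES = 10 * 10**12        # 10 TB to generate
CLUSTER_MAP_SLOTS = 16 * 28     # e.g. 16 nodes x 28 concurrent map tasks

def map_tasks(block_mb: int) -> int:
    """Map-task count when each task writes one HDFS block."""
    block_bytes = block_mb * 2**20
    return -(-DATA_BYTES // block_bytes)   # ceiling division

for block_mb in (128, 256, 512, 1024):
    tasks = map_tasks(block_mb)
    waves = tasks / CLUSTER_MAP_SLOTS      # full scheduling rounds
    print(f"{block_mb:>5} MB block -> {tasks:>6} maps, {waves:6.1f} waves")
```

A near‑whole number of waves with a short final wave keeps the top of the curve flat and stops the right side from trailing off.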

TeraSort Tuning Approach

TeraSort’s Map stage had high CPU load but low network load, while the Reduce stage had low CPU load but high network load, leading to idle waiting. The tuning ideas were:

Compress the intermediate shuffle data before transfer to reduce Map‑to‑Reduce I/O.

Balance the number of Map and Reduce tasks to increase overlap.

Limit Reduce tasks to avoid excessive resource consumption.

After optimization, CPU and network loads became more balanced.
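These ideas map onto standard Hadoop job parameters. The following configuration sketch is illustrative: the property names are standard Hadoop settings, but the values are placeholders, not the tuned figures from the report.

```xml
<configuration>
  <property>
    <!-- compress intermediate shuffle data written by the maps -->
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <!-- start reducers while maps are still running to overlap the stages -->
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.3</value>
  </property>
  <property>
    <!-- cap the reducer count to bound resource consumption -->
    <name>mapreduce.job.reduces</name>
    <value>224</value>
  </property>
</configuration>
```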

Map/Reduce Task Numbers

The number of Map and Reduce tasks heavily influences performance. For a 10 TB dataset, the relationship among file count, split count, block size, and task numbers is critical. Example calculations show how to derive NUM_MAPS and NUM_REDUCERS based on the chosen block size.
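As a hedged illustration of such a calculation (the node count, block size, and reducers‑per‑node figures below are assumptions, not the report's numbers):

```python
# Hypothetical derivation of NUM_MAPS / NUM_REDUCERS for a 10 TB TeraSort.
TOTAL_BYTES = 10 * 10**12          # Scale Factor: 10 TB
BLOCK_BYTES = 512 * 2**20          # chosen HDFS block size (assumed 512 MiB)
NODES = 16                         # assumed worker-node count
REDUCERS_PER_NODE = 14             # assumed, e.g. one per physical core

# One map per input split; with split size equal to block size:
NUM_MAPS = -(-TOTAL_BYTES // BLOCK_BYTES)    # ceiling division

# Keep reducers proportional to cores so the shuffle stays balanced:
NUM_REDUCERS = NODES * REDUCERS_PER_NODE

print(f"NUM_MAPS = {NUM_MAPS}, NUM_REDUCERS = {NUM_REDUCERS}")
```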

Hardware Configuration

The test nodes each used dual‑socket Intel Xeon E5‑2680 v4 CPUs (14 cores per socket), 256 GB of memory, dual 10 GbE networking, and 24 × 1.2 TB disks. For optimal performance, the CPUs must run in Performance mode, the network must operate at full line rate, and sufficient disk throughput must be ensured.
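Those preconditions can be spot‑checked from the shell. This is a sketch rather than the report's procedure; device names and available tools vary by distribution.

```shell
# Put every core's frequency governor into "performance" mode
cpupower frequency-set -g performance

# Confirm the 10 GbE interface negotiated full line rate
ethtool eth0 | grep -i speed

# Spot-check sequential read throughput on one data disk
hdparm -t /dev/sdb
```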

Conclusion

Beyond the parameter adjustments described, additional tuning of memory, GC, and other settings was performed. The results demonstrate TDH’s strong big‑data analytics capability and its effective optimization potential, confirming the platform’s excellence and paving the way for the upcoming TDH 5.0 release.

Tags: performance optimization, big data, Hadoop, TDH, TPCx-HS
Written by StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]
