
How We Cut Hudi Data Lake Write Costs by Over 85% with Custom Architecture

This article examines the challenges of using Apache Hudi for real‑time data lake writes, analyzes the COW and MOR write models, and presents a custom master‑worker architecture with index optimization and repartitioning that reduces write resource consumption by over 85% while boosting throughput up to 300‑fold.

Xingsheng Youxuan Technology Community

1. Background

Hudi, as a mainstream data‑lake format, provides ACID transactions and real‑time read/write capabilities, enabling stream‑batch integration. In our data‑warehouse architecture, Hudi serves as the primary storage for the real‑time warehouse (see Figure 1).

Data from the ODS layer is streamed via Kafka and written to Hudi using Spark Streaming, the community‑recommended ingestion method.

However, with nearly 1,000 business tables using Spark Streaming, the compute cost exceeds 90,000 CNY per month. Reducing Hudi write resource consumption became a critical, practical challenge.

2. Hudi Write Principles

Hudi offers two write modes: Copy‑on‑Write (COW) and Merge‑on‑Read (MOR). COW copies the existing data, merges it with the incoming data, and writes out a new file version, which causes write amplification, especially for small, frequent writes. MOR splits incoming records into inserts and updates, writing inserts to base files and updates to log files; this reduces write amplification but still requires a costly tag step, i.e., a key lookup to decide whether each record is an insert or an update.
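The write‑amplification difference can be sketched with a toy in‑memory model (illustrative only, not Hudi's actual file formats): under COW, one updated record forces a rewrite of the whole base file, while under MOR it is a single append to a log file.

```java
import java.util.*;

// Toy model contrasting COW and MOR write amplification. A "file" here is
// just a map of key -> record; sizes stand in for bytes written.
public class WriteModels {
    // COW: merge the update into a full copy of the base file and rewrite it.
    static Map<String, String> cowUpdate(Map<String, String> baseFile,
                                         String key, String value) {
        Map<String, String> newVersion = new HashMap<>(baseFile); // full rewrite
        newVersion.put(key, value);
        return newVersion; // every record was rewritten for one change
    }

    // MOR: append the update to a log file; merging is deferred to read time.
    static void morUpdate(List<String[]> logFile, String key, String value) {
        logFile.add(new String[]{key, value}); // one record written
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        for (int i = 0; i < 1000; i++) base.put("k" + i, "v" + i);

        // COW rewrites all 1000 records to apply a single update.
        System.out.println("COW records rewritten: "
                + cowUpdate(base, "k0", "v0'").size());

        // MOR writes exactly one record for the same update.
        List<String[]> log = new ArrayList<>();
        morUpdate(log, "k0", "v0'");
        System.out.println("MOR records written: " + log.size());
    }
}
```

For a base file with 1,000 records, the COW path rewrites all 1,000 of them to apply one update, while the MOR path writes one log record, which is why frequent small writes favor MOR despite its extra tag cost.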

3. Hudi Write Architecture Optimization

We designed a master‑worker architecture where the master distributes tasks and handles fault tolerance, while workers perform data ingestion and Hudi writes.

Data is ingested either directly from clients or via Kafka consumers.

During write, the target partition is identified and the corresponding index determines whether the record is an insert or an update; inserts go to base files, updates to log files, and index metadata is written to WAL.

When data volume or time thresholds are reached, in‑memory data is flushed to the lake: inserts to base files, updates to log files.
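The flush trigger above can be sketched as a worker‑side buffer that flushes when either a record‑count or an age threshold is crossed. The threshold values and the flush body are illustrative assumptions; in the real system the flush would write inserts to base files and updates to log files.

```java
import java.util.*;

// Sketch of the worker-side write buffer: records accumulate in memory,
// split into inserts and updates, and are flushed to the lake when either
// a record-count threshold or an age threshold is reached.
public class WriteBuffer {
    private final List<String> inserts = new ArrayList<>();
    private final List<String> updates = new ArrayList<>();
    private final int maxRecords;
    private final long maxAgeMillis;
    private long firstWriteAt = -1;
    int flushCount = 0; // exposed so the example below can observe flushes

    WriteBuffer(int maxRecords, long maxAgeMillis) {
        this.maxRecords = maxRecords;
        this.maxAgeMillis = maxAgeMillis;
    }

    void add(String record, boolean isInsert) {
        if (firstWriteAt < 0) firstWriteAt = System.currentTimeMillis();
        (isInsert ? inserts : updates).add(record);
        if (inserts.size() + updates.size() >= maxRecords
                || System.currentTimeMillis() - firstWriteAt >= maxAgeMillis) {
            flush();
        }
    }

    void flush() {
        // In the real system: inserts go to base files, updates to log files.
        // Clearing the buffers stands in for those writes here.
        inserts.clear();
        updates.clear();
        firstWriteAt = -1;
        flushCount++;
    }
}
```

Keeping both triggers matters: the count threshold bounds memory on hot tables, while the age threshold bounds end‑to‑end latency on quiet ones.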

Indexes use BloomFilter + RocksDB per partition and are retained for a limited period (e.g., one week).
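A minimal sketch of that two‑tier tag index, with a hand‑rolled Bloom filter over a `BitSet` and a `HashMap` standing in for RocksDB (both are assumptions for illustration): the filter cheaply rules out keys that were definitely never written, and only possible hits pay for the key‑value lookup.

```java
import java.util.*;

// Per-partition tag index sketch: a Bloom filter screens out definitely-new
// keys, and only candidate hits fall through to the key-value store
// (a HashMap here stands in for RocksDB).
public class PartitionIndex {
    private static final int BITS = 1 << 16;
    private final BitSet bloom = new BitSet(BITS);
    private final Map<String, String> kv = new HashMap<>(); // key -> fileId

    private int[] hashes(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 13) ^ 0x9e3779b9; // second derived hash
        return new int[]{Math.floorMod(h1, BITS), Math.floorMod(h2, BITS)};
    }

    void put(String key, String fileId) {
        for (int h : hashes(key)) bloom.set(h);
        kv.put(key, fileId);
    }

    // Tag a record: returns the owning fileId for an update, null for an insert.
    String tag(String key) {
        for (int h : hashes(key))
            if (!bloom.get(h)) return null; // definitely unseen -> insert
        return kv.get(key);                 // possibly seen -> confirm in store
    }
}
```

Because most incoming records in an append‑heavy stream are inserts, the Bloom filter answers the common case without touching the store, which is what makes the tag step cheap enough to run per record.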

Replacing Spark/Flink with Java threads for small tables saved substantial resources, and index optimizations accelerated the tag process.

We further introduced a repartition strategy based on hash partitioning: incoming records are routed to specific base files, eliminating the need for full scans. After repartition, all data is written to log files, and indexes can be dropped, dramatically cutting resource usage.
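The routing rule behind the repartition strategy can be sketched in a few lines (the bucket count is an illustrative assumption): with hash partitioning, a record's key alone determines its base file, so tagging needs no index lookup at all.

```java
// Hash-partitioned routing sketch: the key deterministically selects a base
// file, so an incoming record can be appended to that file's log directly,
// with no index lookup or base-file scan.
public class HashRouter {
    static int bucketFor(String recordKey, int numBaseFiles) {
        // floorMod keeps the bucket non-negative even for negative hash codes
        return Math.floorMod(recordKey.hashCode(), numBaseFiles);
    }
}
```

Since the same key always maps to the same bucket, updates are guaranteed to land next to their base file, which is what allows the per‑partition indexes to be dropped after repartitioning.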

Figures 2‑10 illustrate the original COW/MOR flows, the optimized master‑worker architecture, index construction, and the repartition process.

4. Evaluation and Outlook

The service‑oriented Hudi solution reduced resource consumption by over 85%, achieving throughput improvements from 28 KB/s/1C3G to 10 MB/s/1C1G (more than 300×). Index removal after repartition saved >95% of index‑related resources. Overall, the redesign delivers multi‑million‑yuan annual cost savings and paves the way for further optimizations in compute and query layers, including integration with SparkSQL, Presto, and the upcoming ShapleyDB engine for real‑time lake‑warehouse capabilities.

Tags: big data, data lake, Spark, Hudi, COW, MOR, write optimization