Big Data 19 min read

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that bypasses PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and achieves up to 80% latency reduction while guaranteeing stable real‑time consumption.

Meituan Technology Team
Meituan Technology Team
Meituan Technology Team
Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Kafka plays a critical role in Meituan's data platform as a unified data cache and distribution system. Because many real‑time jobs share the same PageCache, competition for PageCache resources can cause latency spikes for real‑time workloads when delayed jobs trigger unexpected disk reads.

The article introduces a self‑developed SSD‑based application‑layer cache architecture for Kafka that eliminates PageCache contention and improves service quality for real‑time consumption.

Current Kafka Situation

Meituan operates a massive Kafka deployment (6000+ nodes, 100+ clusters, >6 × 10⁴ topics, >4 × 10⁵ partitions) handling up to 8 × 10¹³ messages per day with peak throughput of 1.8 × 10⁸ msgs/s. Real‑time jobs and delayed jobs compete for PageCache, leading to degraded latency and throughput.

Pain‑Point Analysis & Goals

Real‑time and delayed jobs compete for PageCache, causing unexpected disk reads for real‑time jobs.

HDD performance drops sharply under high read concurrency.

About 20% of jobs experience delayed consumption.

Goal: guarantee that real‑time consumption is not affected by delayed consumption and provide stable QoS.

Why SSD?

Two directions were considered: (1) eliminate PageCache competition (e.g., avoid writing delayed data back to PageCache) and (2) introduce a faster device between HDD and memory. The second direction was chosen because PageCache policies are hard to modify and memory is costly. SSD offers orders‑of‑magnitude higher IOPS and bandwidth, making it suitable as an intermediate cache.

Architecture Decision

Two candidate solutions were evaluated:

Kernel‑level caching (FlashCache, OpenCAS, etc.) – limited by kernel version and not fully aligned with Kafka’s access patterns.

Application‑layer caching – integrates directly with Kafka’s read/write semantics.

The application‑layer approach was selected.

New Architecture Overview

LogSegments are classified by time into three states:

OnlyCache : stored exclusively on SSD.

Cached : synchronized from HDD to SSD (inactive segments).

WithoutCache : resides only on HDD.

Background threads periodically sync inactive segments to SSD, move aged segments back to HDD, and enforce space thresholds.

Read path: if a requested offset belongs to a Cached or OnlyCache segment, data is served from SSD; otherwise it falls back to HDD. Write path still writes to PageCache first, then flushes to SSD when thresholds are met.

Key Optimizations

LogSegment Synchronization

Two aspects are tuned: synchronization method (active vs. inactive) and rate‑limiting to avoid overwhelming HDD or SSD during bulk transfers.

Append Flush Strategy

Original Kafka flushes based on message count (e.g., 50 000 msgs). The new design limits flush rate to 2 MB/s per segment, smoothing I/O spikes. The flush call is performed via fileChannel.force.

Evaluation

Four clusters were built: new SSD‑cache architecture, plain HDD, FlashCache, and OpenCAS (each with three nodes). Tests measured write/read latency, HDD/SSD hit rates, and SLA impact on real‑time jobs under delayed‑consumption scenarios.

Results:

Before flush‑rate optimization, the SSD‑cache cluster consistently outperformed other solutions.

After optimization, advantages remained significant under high write loads (>170 MB/s).

Delayed jobs never impacted real‑time jobs in the SSD‑cache cluster.

Conclusion & Future Work

The SSD‑based application‑layer cache reduces read/write latency by up to 80% and isolates real‑time consumption from delayed consumption. The design is in gray‑scale rollout and will be contributed to the Kafka community. Future work includes broader deployment to high‑priority clusters.

Authors

Shi Ji and Shi Lu, engineers in Meituan's data platform.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataReal-time ProcessingKafkaLogSegmentSSD Cache
Meituan Technology Team
Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.