How Tencent Cloud Implements Tiered Storage for Kafka: Architecture, Challenges, and Evolution
This article examines the challenges of Kafka's traditional architecture, explains why heavy reliance on local state causes operational difficulty and resource waste, and details Tencent Cloud's elastic, storage‑compute‑separated designs—including tiered storage, segment state machines, offset constraints, and performance optimizations—while sharing practical implementation insights and future directions.
Introduction
Lu Shilin, the core maintainer of Tencent Cloud Message Queue Kafka, presented the practice and evolution of tiered storage in Tencent Cloud, covering four aspects: problems and challenges of Kafka architecture, elastic architecture comparison, tiered storage architecture and principles, and Tencent Cloud's concrete implementation.
Problems and Challenges of Kafka Architecture
Typical Kafka deployments use Zookeeper or KRaft for metadata, physical machines or VMs for compute, and local disks for storage. This model suffers from three main issues:
Heavy local state: data resides locally, making any operation require data migration and increasing operational complexity.
Resource waste: scaling is performed at the broker level, but resources (CPU, bandwidth, disk) are allocated per node, leading to inefficient utilization.
Historical data processing: large‑scale data back‑fill pollutes the page cache and degrades overall read/write SLA.
Operational Difficulty
Data migration is required in scenarios such as uneven data distribution among nodes, node‑level resource bottlenecks (CPU, disk, bandwidth), and large volumes of hot data that slow down reads/writes during migration.
Resource Waste
Resource bottlenecks stem from the CPU overhead of compression codecs (e.g., Gzip, Snappy, Zstd), message format conversion (V0–V2), disk space consumed by massive cold data, disk I/O saturation during catch‑up reads of historical data, and HDD throughput constraints. Which resource comes under pressure first depends on the workload.
Elastic Architecture Comparison
Storage‑Compute Separation Architecture
This design, similar to Pulsar’s architecture, uses a storage layer (e.g., HDFS, COS, S3) decoupled from compute nodes. A Proxy handles unified access, service discovery, and rate limiting; Brokers hold partitions and balance load across multiple storage back‑ends; the storage layer can be multi‑modal.
Advantages:
Node expansion does not require data migration.
Compute and storage can be scaled independently.
Drawbacks:
File switching in backends such as HDFS or BookKeeper can introduce latency glitches during lease recovery.
A strong dependency on the external storage system means a failure there can cause a service outage.
Elastic Local Storage Architecture
Combines cloud disks with cloud hosts: automated operations monitor disk usage and expand volumes (e.g., via LVM), and compute resources (CVM or containers) are hot‑migrated when CPU or memory becomes a bottleneck. This supports vertical scaling but does not address Kafka's need for horizontal scaling.
Elastic Remote Storage Architecture
Local storage handles hot data with low latency, while remote storage (e.g., COS) stores cold data. Benefits include consistent write latency, fallback to local storage during remote failures, and cost reduction through cheaper remote storage.
Tiered Storage Architecture
Read/Write Flow
Producers write to cloud disks; data is asynchronously synced to remote storage. Consumers first read from local storage; if data is absent, they fetch from remote storage based on offset and read strategy.
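The routing decision in the read flow above can be sketched as follows. This is a minimal illustration, not Tencent Cloud's actual code; the function name and parameters are hypothetical, standing in for the local and remote offset ranges a broker tracks per partition.

```python
def route_read(offset, local_start, local_end, remote_start):
    """Decide where a fetch at `offset` is served from: local cloud disk
    for hot data, remote tiered storage for cold data (illustrative only)."""
    if local_start <= offset <= local_end:
        return "local"   # hot data: low-latency cloud-disk read
    if remote_start <= offset < local_start:
        return "remote"  # cold data: fetch from remote storage (e.g., COS)
    raise ValueError(f"offset {offset} is outside the retained range")
```

For example, with local data covering offsets 100–200 and remote data starting at 0, a fetch at offset 50 is routed to remote storage while a fetch at 150 is served locally.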
Data Lifecycle
Only inactive segments are uploaded to remote storage. A LocalRetention parameter controls how long uploaded segments are kept locally, while the standard Retention parameter governs remote data expiration.
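The interplay of the two retention parameters can be sketched as below. This is a hedged illustration assuming segments are plain dicts with hypothetical `uploaded` and `last_modified_ms` fields; it is not the production data model.

```python
def expire_local(segment, now_ms, local_retention_ms):
    """A local copy may be deleted only after the segment has been uploaded
    to remote storage and has aged past LocalRetention (sketch)."""
    return segment["uploaded"] and now_ms - segment["last_modified_ms"] > local_retention_ms

def expire_remote(segment, now_ms, retention_ms):
    """Remote copies expire under the standard Retention parameter (sketch)."""
    return now_ms - segment["last_modified_ms"] > retention_ms
```

Note the guard on `uploaded`: a segment that has not yet been copied to remote storage is never deleted locally, regardless of age.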
Offset Constraints
Key offset definitions:
Lz – Local log end offset (latest local offset).
Ly – Last stable offset (consumer‑visible offset).
Ry – Remote log end offset.
Lx – Local log start offset.
Rx – Remote log start offset.
Constraints: Lz ≥ Ly ≥ Lx and Ly ≥ Ry ≥ Rx.
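These invariants are easy to express as a single check, sketched below with hypothetical lowercase parameter names mirroring the offsets defined above.

```python
def offsets_valid(lz, ly, lx, ry, rx):
    """Check the tiered-storage offset invariants:
    Lz >= Ly >= Lx (local range) and Ly >= Ry >= Rx (remote range)."""
    return lz >= ly >= lx and ly >= ry >= rx
```

For instance, a remote log end offset that runs ahead of the last stable offset (Ry > Ly) violates the second constraint, since only consumer‑visible data may be uploaded.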
Segment State Machine
Segments transition through three dimensions:
CopySegment: CopySegmentStarted → CopySegmentFinished
DeleteSegment: DeleteSegmentStarted → DeleteSegmentFinished
DeletePartition: DeletePartitionMarked → DeletePartitionStarted → DeletePartitionFinished
State transitions must follow the defined order; for example, CopySegmentStarted → DeleteSegmentStarted is illegal without completing the copy phase first.
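The segment-level rules above can be sketched as a tiny state machine that rejects out-of-order transitions. This is a simplified illustration covering only the copy/delete segment states (partition deletion follows analogous rules); the class and transition table are not the production implementation.

```python
# Legal segment transitions: copy must finish before deletion may start.
VALID_TRANSITIONS = {
    None: {"CopySegmentStarted"},
    "CopySegmentStarted": {"CopySegmentFinished"},
    "CopySegmentFinished": {"DeleteSegmentStarted"},
    "DeleteSegmentStarted": {"DeleteSegmentFinished"},
}

class SegmentStateMachine:
    def __init__(self):
        self.state = None  # segment not yet copied

    def transition(self, target):
        if target not in VALID_TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target
```

Attempting CopySegmentStarted → DeleteSegmentStarted raises an error, matching the rule stated above.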
Implementation in Tencent Cloud
Segment Metadata Management
Metadata is synchronized via an internal inner‑topic (WAL). Brokers consume the WAL to build in‑memory state machines, periodically persist snapshots and offsets locally, and recover state using snapshots plus incremental WAL.
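The snapshot-plus-incremental-WAL recovery described above can be sketched like this. It is an illustrative simplification, not Tencent Cloud's implementation: the record shape `(offset, segment_id, new_state)` and the function name are assumptions.

```python
def recover_state(snapshot, wal_records):
    """Rebuild the in-memory segment-state map from a persisted snapshot
    plus WAL records consumed from the metadata inner topic (sketch).
    snapshot: (state_dict, snapshot_offset)."""
    state, applied_offset = dict(snapshot[0]), snapshot[1]
    for offset, segment_id, new_state in wal_records:
        if offset <= applied_offset:
            continue  # already reflected in the snapshot
        state[segment_id] = new_state
        applied_offset = offset
    return state, applied_offset
```

Records at or below the snapshot offset are skipped, so replay is idempotent even when the WAL overlaps the snapshot.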
Consumption Performance
While write performance matches native Kafka, reading from remote COS can become a bottleneck. Tencent Cloud improves read throughput by pre‑loading messages into memory pools and proactively downloading hot data during idle periods.
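A read-ahead scheme of the kind described can be sketched as below. This is a hypothetical illustration, not the production design: segments are identified by plain integers, the memory pool is a simple LRU, and the fetch callable stands in for a COS download.

```python
from collections import OrderedDict

class PrefetchingReader:
    """Serve cold reads from a memory pool, prefetching the next few
    segments so sequential catch-up reads avoid repeated remote fetches."""

    def __init__(self, fetch_remote, pool_size=4, read_ahead=2):
        self.fetch_remote = fetch_remote  # callable: segment_id -> bytes
        self.pool = OrderedDict()         # insertion-ordered LRU pool
        self.pool_size = pool_size
        self.read_ahead = read_ahead

    def read(self, segment_id):
        # Load the requested segment plus `read_ahead` successors.
        for sid in range(segment_id, segment_id + self.read_ahead + 1):
            if sid not in self.pool:
                self.pool[sid] = self.fetch_remote(sid)
                if len(self.pool) > self.pool_size:
                    self.pool.popitem(last=False)  # evict oldest entry
        self.pool.move_to_end(segment_id)  # mark as most recently used
        return self.pool[segment_id]
```

With `read_ahead=2`, the first read of segment 0 downloads segments 0–2; the subsequent read of segment 1 only needs to fetch segment 3.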
Isolation and Resource Controls
To ensure stability when third‑party storage is involved, the system isolates resources:
Dedicated I/O disks to reduce interference.
CPU core pinning and thread isolation.
Bandwidth throttling and parallelism control for upload/download tasks.
Off‑heap memory usage and ByteBuffer reuse.
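The bandwidth-throttling item above is commonly implemented with a token bucket; a minimal sketch follows. The class and rates are illustrative assumptions, not the production rate limiter.

```python
class TokenBucket:
    """Token-bucket throttle for upload/download tasks (sketch):
    tokens accrue at `rate_bytes_per_sec` up to a burst capacity."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def throttle(self, nbytes, now):
        """Return seconds the caller must wait before sending `nbytes`."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return 0.0
        deficit = nbytes - self.tokens
        self.tokens = 0.0
        return deficit / self.rate
```

A caller sleeps for the returned duration before transferring, which caps the steady-state transfer rate without blocking small bursts.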
Rollback Mechanisms
Features include pausing tiered uploads, pausing remote downloads on demand, automatic cloud‑disk expansion, and topic/cluster‑level rollback support.
Future Outlook
Plans involve richer schema support (Protobuf, JSON), enhanced ingestion layer with stateless horizontal scaling, compute engine for format conversion (e.g., Parquet, Hudi, Delta Lake), and multi‑modal storage exploration.
Conclusion
Tencent Cloud’s tiered storage solution decouples compute and storage, introduces flexible data lifecycle management, and provides elastic scaling while addressing the operational and resource challenges of traditional Kafka deployments.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.