
Enhancing HBase CAP Model and MTTR with Kafka‑Based IO Decoupling and Native AP Support

The article analyzes HBase's CP‑oriented CAP limitations, proposes native AP support via Replica, decouples WAL IO to Kafka, optimizes MTTR, introduces multi‑datacenter active/active disaster recovery, and redesigns client write paths and LogSplit processing for higher availability and throughput.

Big Data Technology Architecture

Apr 29, 2020


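HBase is primarily a CP system: writes are strongly consistent, but the mean time to recovery (MTTR) after a RegionServer crash is long. Two factors dominate: node failure is detected slowly, because detection waits for the ZooKeeper (ZK) session timeout, and replaying the write-ahead log (WAL) to rebuild the LSM-tree state is costly.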

To address AP‑oriented workloads that tolerate delayed visibility but demand high availability, HBase offers a native Replica feature that adds extra read‑only Region replicas; however, this adds IO overhead, only protects reads, and does not help cross‑datacenter disaster recovery.

The proposed solution decouples WAL‑related IO from the HBase cluster by offloading it to Kafka, reducing overall disk IO pressure and allowing both WAL writes and Replica synchronization to be handled by Kafka.

MTTR is further reduced by routing client writes through an SDK that first records WAL entries to Kafka; the client then relies on Kafka's faster failure detection and ISR-based failover, eliminating the long HBase log-replay phase.
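The WAL-first write order can be illustrated with a minimal Java sketch. It is not the article's SDK: `DurableLog` and `RegionWriter` are hypothetical stand-ins for a Kafka producer and a RegionServer RPC, and returning `LOGGED_ONLY` when the RegionServer call fails is an assumption about how such a client might behave, since the entry is already durable in the log at that point.

```java
// Hypothetical stand-in for a Kafka producer: returns the offset assigned to the record.
interface DurableLog {
    long append(String entry) throws Exception;
}

// Hypothetical stand-in for the RegionServer write RPC.
interface RegionWriter {
    void apply(String entry, long walOffset) throws Exception;
}

final class WalFirstWriterSketch {
    enum Result { APPLIED, LOGGED_ONLY }

    private final DurableLog log;
    private final RegionWriter region;

    WalFirstWriterSketch(DurableLog log, RegionWriter region) {
        this.log = log;
        this.region = region;
    }

    Result write(String entry) throws Exception {
        // Step 1: the entry must be durable in the log before anything else happens.
        // If the append fails, the exception propagates and the write is not acknowledged.
        long offset = log.append(entry);
        try {
            // Step 2: hand the entry and its log offset to the RegionServer.
            region.apply(entry, offset);
            return Result.APPLIED;
        } catch (Exception e) {
            // Assumption: the entry is already durable, so it can be replayed
            // from the log later instead of failing the client write.
            return Result.LOGGED_ONLY;
        }
    }
}
```

In this shape the client's availability depends on the log rather than on any single RegionServer, which is the property the paragraph above relies on.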

For multi‑datacenter disaster recovery, independent HBase clusters are deployed per site and synchronized via native Replication; by switching from Active/Standby to Active/Active mode and leveraging Kafka‑based WAL storage, cross‑site failover becomes more seamless.

Client dual-write is enabled through a CompositeConnection that holds two separate HBase connections. Write requests are sent to both clusters (invokeAll, which waits for every cluster to respond), while read requests use invokeAny (which returns the first successful response), improving the SLA for eventually consistent workloads.
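The dual-write pattern can be sketched with `ExecutorService.invokeAll` and `invokeAny` from the Java standard library. This is an illustration under assumptions, not the article's CompositeConnection: `ClusterClient` is a hypothetical stand-in for a per-cluster HBase connection, and counting acknowledgements on the write path is one possible policy that the source does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for one cluster; a real version would wrap an HBase connection.
interface ClusterClient {
    void put(String rowKey, String value) throws Exception;
    String get(String rowKey) throws Exception;
}

final class CompositeConnectionSketch implements AutoCloseable {
    private final List<ClusterClient> clusters;
    private final ExecutorService pool;

    CompositeConnectionSketch(List<ClusterClient> clusters) {
        this.clusters = new ArrayList<>(clusters);
        this.pool = Executors.newFixedThreadPool(this.clusters.size());
    }

    // Write path: send the put to every cluster and wait for all of them.
    // Returns how many clusters acknowledged the write (assumed policy).
    int put(String rowKey, String value) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (ClusterClient c : clusters) {
            tasks.add(() -> { c.put(rowKey, value); return null; });
        }
        int acks = 0;
        for (Future<Void> f : pool.invokeAll(tasks)) {
            try {
                f.get();
                acks++;
            } catch (ExecutionException e) {
                // This cluster rejected or missed the write; the others may still succeed.
            }
        }
        return acks;
    }

    // Read path: the first cluster to answer without throwing wins.
    // Note that a null result (row absent) also counts as a successful answer.
    String get(String rowKey) throws InterruptedException, ExecutionException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (ClusterClient c : clusters) {
            tasks.add(() -> c.get(rowKey));
        }
        return pool.invokeAny(tasks);
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

Because `invokeAny` accepts whichever cluster responds first, a read may return a value that has not yet reached the other cluster, which is why the pattern suits eventually consistent workloads only.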

Kafka-based log replay replaces the native LogSplit mechanism. Each WAL entry is identified by its Kafka offset, which the client propagates to the RegionServer via Mutation attributes. This allows custom SplitLogManager, TaskFinisher, and TaskExecutor implementations to operate on Kafka partitions instead of HDFS files.
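A minimal Java sketch of the offset bookkeeping follows. The attribute name `kafka.wal.offset`, the `WalEntry` type, and the replay rule (replay entries whose offset is greater than the region's last flushed offset) are assumptions for illustration; the source does not give these details. In real HBase code the encoded bytes would be attached with `Mutation.setAttribute(String, byte[])`, but the sketch avoids the HBase dependency so that it stays self-contained.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class KafkaWalReplaySketch {
    // Hypothetical attribute name under which the client would attach the offset.
    static final String OFFSET_ATTR = "kafka.wal.offset";

    // Hypothetical view of one record read back from a Kafka partition.
    static final class WalEntry {
        final String region;
        final long offset;
        final String edit;

        WalEntry(String region, long offset, String edit) {
            this.region = region;
            this.offset = offset;
            this.edit = edit;
        }
    }

    // Client side: encode the Kafka offset as the byte[] value of a Mutation attribute.
    static byte[] encodeOffset(long offset) {
        return ByteBuffer.allocate(Long.BYTES).putLong(offset).array();
    }

    // RegionServer side: decode the offset from the attribute value.
    static long decodeOffset(byte[] attributeValue) {
        return ByteBuffer.wrap(attributeValue).getLong();
    }

    // Recovery: select the entries of one region that were not yet flushed.
    // Assumed rule: anything at or below lastFlushedOffset is already persisted.
    static List<WalEntry> entriesToReplay(List<WalEntry> partitionLog,
                                          String region,
                                          long lastFlushedOffset) {
        List<WalEntry> pending = new ArrayList<>();
        for (WalEntry e : partitionLog) {
            if (e.region.equals(region) && e.offset > lastFlushedOffset) {
                pending.add(e);
            }
        }
        return pending;
    }
}
```

With offsets as the only position marker, a recovery task needs just a partition and a starting offset, rather than a set of HDFS log files to split.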

Overall, the architecture redesign reduces IO consumption, shortens MTTR, supports AP scenarios, and provides flexible active/active multi‑datacenter resilience.



