Understanding HBase HLog and Fault Recovery Mechanisms
This article explains HBase's write path using Memstore and HLog, details the lifecycle of HLog including construction, rolling, expiration, and deletion, and thoroughly analyzes the three fault‑recovery models—Log Splitting, Distributed Log Splitting, and Distributed Log Replay—highlighting their processes, advantages, and configuration nuances.
HBase uses an LSM-tree-like architecture in which writes are first cached in the Memstore and later flushed to disk. Before caching, each write is sequentially appended to the HLog (the write-ahead log), so that if a RegionServer crashes, the unflushed Memstore contents can be recovered by replaying the log.
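The write path above can be sketched in a few lines. This is a toy model, not HBase's internal API: the class and method names are illustrative, and the "log" and "memstore" are plain Java collections standing in for HDFS-backed files and the real Memstore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the write path: every mutation is appended to a
// sequential log before it becomes visible in the in-memory store, so a
// crash can always be repaired by replaying the log in order.
public class WalFirstWrite {
    final List<String> hlog = new ArrayList<>();          // append-only log
    final Map<String, String> memstore = new HashMap<>(); // in-memory cache
    long sequenceId = 0;

    void put(String rowKey, String value) {
        long seq = ++sequenceId;
        hlog.add(seq + ":" + rowKey + "=" + value); // 1. persist to the log first
        memstore.put(rowKey, value);                // 2. then cache in memory
    }

    // Crash recovery: rebuild the memstore by replaying the log in order,
    // so later entries for the same row overwrite earlier ones.
    Map<String, String> replay() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String entry : hlog) {
            String kv = entry.substring(entry.indexOf(':') + 1);
            int eq = kv.indexOf('=');
            rebuilt.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        return rebuilt;
    }
}
```

Because the log is written before the cache, the log is always at least as up to date as the Memstore, which is exactly the invariant recovery depends on.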
HLog Overview – An HLog consists of log entries from all Regions on a RegionServer, each entry being a <HLogKey, WALEdit> pair. The HLogKey includes sequence ID, write time, cluster ID, region name, and table name; the sequence ID is monotonically increasing, allowing identification of the newest entries.
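The entry layout described above can be modeled as follows. The field and class names mirror the article's description of the <HLogKey, WALEdit> pair but are illustrative, not HBase's actual classes; the point is that ordering by the monotonically increasing sequence ID is what lets recovery code decide which of two entries is newer.

```java
import java.util.List;

// Sketch of the <HLogKey, WALEdit> pair; field names follow the article.
public class HLogEntrySketch {
    static class HLogKey implements Comparable<HLogKey> {
        final long sequenceId;   // monotonically increasing per RegionServer
        final long writeTime;
        final String clusterId;
        final String regionName;
        final String tableName;

        HLogKey(long seq, long ts, String cluster, String region, String table) {
            this.sequenceId = seq; this.writeTime = ts;
            this.clusterId = cluster; this.regionName = region; this.tableName = table;
        }

        // Newer entries have strictly larger sequence IDs.
        public int compareTo(HLogKey other) {
            return Long.compare(this.sequenceId, other.sequenceId);
        }
    }

    static class WALEdit {                 // the actual cell mutations
        final List<String> keyValues;
        WALEdit(List<String> kvs) { this.keyValues = kvs; }
    }
}
```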
HLog lifecycle includes construction, periodic rolling (controlled by hbase.regionserver.logroll.period), expiration (when the maximum sequence ID in a file has been flushed to disk), and deletion (handled by the HMaster based on TTL settings such as hbase.master.logcleaner.ttl).
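The two lifecycle knobs named above are set in hbase-site.xml. The values shown are the commonly cited defaults (one-hour roll period, ten-minute cleaner TTL), but verify them against your HBase version's documentation before relying on them.

```xml
<!-- hbase-site.xml: rolling and cleanup knobs for the HLog lifecycle.
     Values are the approximate defaults; check your release docs. -->
<property>
  <name>hbase.regionserver.logroll.period</name>
  <value>3600000</value> <!-- roll the active HLog every hour (ms) -->
</property>
<property>
  <name>hbase.master.logcleaner.ttl</name>
  <value>600000</value> <!-- keep expired HLogs for 10 minutes (ms) before deletion -->
</property>
```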
HBase Fault Recovery Process – When a RegionServer fails (e.g., due to Full GC, network issues, or bugs), ZooKeeper detects the loss via missed heartbeats and notifies the HMaster. The HMaster reassigns the affected Regions to other servers and uses HLog replay to restore lost data. Because a single HLog contains entries for many Regions, the log must be split by Region before replay.
Log Splitting (LS) – In the original implementation, the HMaster controls the split: it renames the log directory, reads each <HLogKey, WALEdit> sequentially, buffers entries per Region, and writes each buffer to HDFS under /hbase/table_name/region/recovered.edits/.tmp. This method is simple but inefficient, especially for large clusters.
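The core of the split step, grouping a shared log into per-region buffers, can be sketched as below. Entry is a stand-in for the real <HLogKey, WALEdit> pair; in HBase each resulting buffer would then be written out under the region's recovered.edits directory rather than returned in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of log splitting: read the shared log sequentially and bucket
// each entry by its region. The sequential read preserves the original
// sequence-ID order within every per-region buffer.
public class LogSplitSketch {
    record Entry(long sequenceId, String regionName, String edit) {}

    static Map<String, List<Entry>> splitByRegion(List<Entry> hlog) {
        Map<String, List<Entry>> buffers = new HashMap<>();
        for (Entry e : hlog) {
            buffers.computeIfAbsent(e.regionName(), r -> new ArrayList<>()).add(e);
        }
        return buffers;
    }
}
```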
Distributed Log Splitting (DLS) – The Master publishes split tasks to ZooKeeper (/hbase/splitWAL) as TASK_UNASSIGNED. RegionServers register as workers, claim tasks, read log entries, buffer them per Region, and write the buffers back to HDFS. This parallel approach dramatically reduces recovery time but creates many small files (M × N, where M is the number of HLogs and N is the number of Regions on a failed server).
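The task-coordination pattern above can be sketched as a small state board. The state names follow the article; the atomic claim is a simplified stand-in for the real ZooKeeper conditional-update mechanics, modeled here with a ConcurrentHashMap so that only one worker can win the race for a task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of DLS coordination: the Master publishes one task per HLog
// under /hbase/splitWAL as TASK_UNASSIGNED; a RegionServer worker
// atomically claims it, performs the split, and marks it done.
public class SplitTaskBoard {
    enum State { TASK_UNASSIGNED, TASK_OWNED, TASK_DONE, TASK_ERR }

    final Map<String, State> tasks = new ConcurrentHashMap<>(); // znode path -> state

    void publish(String hlogName) {                 // Master side
        tasks.put("/hbase/splitWAL/" + hlogName, State.TASK_UNASSIGNED);
    }

    boolean tryClaim(String znode) {                // worker side: atomic claim;
        return tasks.replace(znode, State.TASK_UNASSIGNED, State.TASK_OWNED);
    }                                               // only one worker wins the race

    void finish(String znode, boolean ok) {         // worker reports the outcome
        tasks.put(znode, ok ? State.TASK_DONE : State.TASK_ERR);
    }
}
```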
Distributed Log Replay (DLR) – DLR modifies the workflow: after Regions are reassigned, the HLog is split into Region buffers but instead of writing small files, the buffers are replayed directly. This avoids the I/O overhead of creating numerous temporary files, offering faster write‑availability recovery while maintaining correct ordering using sequence IDs and replay tags (enabled by HFile V3 tags).
Configuration notes: DLR is available from HBase 0.95, became default in 0.99, and was disabled by default in 1.1 due to edge‑case bugs. It can be enabled with hbase.master.distributed.log.replay=true and requires HFile V3.
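Putting the configuration notes together, enabling DLR would look roughly like this in hbase-site.xml. hfile.format.version=3 turns on HFile V3, which DLR's replay tags require; since DLR was disabled by default again in 1.1 because of edge-case bugs, check your exact release's notes before enabling it.

```xml
<!-- hbase-site.xml: enabling Distributed Log Replay (DLR). -->
<property>
  <name>hbase.master.distributed.log.replay</name>
  <value>true</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value> <!-- HFile V3 tags are required for DLR replay -->
</property>
```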
Conclusion – The article provides a comprehensive overview of HLog structures, the three fault‑recovery strategies (LS, DLS, DLR), their trade‑offs, and practical configuration guidance for reliable HBase operation.