How HBase Locates Data and Manages Writes: Regions, Meta Table, and ZooKeeper
This article explains how HBase finds the correct regionserver for a given row key using the hbase:meta table, whose location is tracked in ZooKeeper, and describes the write path involving HLog, MemStore, StoreFile creation, and subsequent maintenance tasks.
Read Data
HBase tables are split into regions, each covering a contiguous range of row keys, and the regions are distributed across multiple regionservers.
To retrieve a user record with row key row0001, the client must first locate the region containing that row.
How does HBase pinpoint the exact region on a specific regionserver?
HBase maintains an internal hbase:meta table that records detailed information for every region of every table, such as the start key, end key, and the address of the server hosting the region.
The hbase:meta table acts like a directory, enabling fast location of the actual data.
The hbase:meta table is itself an HBase table hosted on a regionserver; ZooKeeper records which regionserver currently hosts it. A client therefore first contacts ZooKeeper to learn where hbase:meta lives, queries hbase:meta to find which regionserver and which region hold the target data, and then reads from that region.
Because this lookup path can be long, the client caches the retrieved location information for quicker subsequent reads.
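The lookup and caching steps above can be sketched with a small toy model. This is not the HBase client API: the META_ROWS table, the server names, and the CachingClient class are all hypothetical stand-ins, and the ZooKeeper step is omitted. The sketch only shows how a directory sorted by start key lets a client find the region for a row key, and how a cache avoids repeating that lookup.

```python
import bisect

# Hypothetical stand-in for hbase:meta: one entry per region, sorted by start key.
# An empty start key means "from the beginning of the table"; an empty end key
# means "to the end of the table". Server names are made up for illustration.
META_ROWS = [
    # (start_key, end_key, regionserver)
    ("",        "row0500", "rs1.example.com:16020"),
    ("row0500", "row1000", "rs2.example.com:16020"),
    ("row1000", "",        "rs3.example.com:16020"),
]
START_KEYS = [row[0] for row in META_ROWS]


def locate_region(row_key):
    """Return the meta entry whose [start_key, end_key) range contains row_key."""
    # bisect_right finds the first start key greater than row_key; the region
    # we want is the one just before it.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return META_ROWS[index]


class CachingClient:
    """Toy client that remembers region locations after the first lookup."""

    def __init__(self):
        self.cache = {}        # start_key -> meta entry
        self.meta_lookups = 0  # how many times the meta table was consulted

    def find(self, row_key):
        # Check the cached regions first.
        for start, end, server in self.cache.values():
            if start <= row_key and (end == "" or row_key < end):
                return server
        # Cache miss: take the slow path through the meta table.
        self.meta_lookups += 1
        entry = locate_region(row_key)
        self.cache[entry[0]] = entry
        return entry[2]


client = CachingClient()
first = client.find("row0001")
second = client.find("row0002")  # same region, served from the cache
third = client.find("row0750")   # different region, needs another meta lookup
```

In this sketch the second read never touches the meta table, which is the point of the client-side cache. A real client must also discard a cached location when the region has moved or split; that invalidation is left out here.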
Write Data
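Write operations are routed to the regionserver that hosts the target region. First, recall the structure of a regionserver: it hosts multiple regions, each region has one Store per column family, each Store consists of one MemStore plus zero or more StoreFiles, and the regions on a regionserver share an HLog.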
From the client’s perspective, a write is straightforward: after the write request reaches the regionserver, the modification is appended to the HLog and then applied to the MemStore. Once both succeed, the client is notified of completion.
MemStore is an in‑memory cache that holds recent updates. HLog is a write-ahead log (WAL) file that records all update operations, so that updates still held only in memory can be replayed if the regionserver fails.
When a MemStore grows past a configured size, or on a periodic schedule, the system flushes its contents to disk as a new StoreFile, clears the cache, and marks the corresponding entries in HLog as persisted.
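The write path described above can be illustrated with a toy model. Everything here is an assumption made for illustration: the ToyRegionServer class, its attribute names, and a flush threshold counted in entries rather than bytes are not taken from HBase. Here the flush also runs inline, whereas a real regionserver flushes in the background.

```python
class ToyRegionServer:
    """Toy model of the write path: log first, then memory, then flush to 'disk'."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: entries, not bytes
        self.hlog = []         # append-only list of (sequence_id, row, value)
        self.memstore = {}     # row -> value, recent updates held in memory
        self.store_files = []  # each flush adds one immutable, sorted "file"
        self.next_seq = 0
        self.flushed_seq = -1  # highest sequence id already persisted in a StoreFile

    def put(self, row, value):
        # 1. Record the update in the log so it could be replayed after a crash.
        self.hlog.append((self.next_seq, row, value))
        self.next_seq += 1
        # 2. Apply the update to the in-memory store.
        self.memstore[row] = value
        # 3. Flush when the MemStore is full (inline here for simplicity).
        if len(self.memstore) >= self.flush_threshold:
            self.flush()
        return "ok"  # acknowledgement sent back to the client

    def flush(self):
        if not self.memstore:
            return
        # Write the MemStore contents out as a new sorted StoreFile.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        # Everything logged so far is now persisted in a StoreFile.
        self.flushed_seq = self.next_seq - 1


server = ToyRegionServer(flush_threshold=2)
server.put("row0002", "b")
server.put("row0001", "a")  # reaches the threshold and triggers a flush
server.put("row0003", "c")  # stays in the MemStore until the next flush
```

After these three writes the model holds one StoreFile with row0001 and row0002 in sorted order, one unflushed entry in the MemStore, and three HLog entries, of which the first two are marked as persisted through flushed_seq.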
This makes the data durable, but write operations introduce follow‑up issues such as growing HLog files, increasing numbers of StoreFiles, and expanding region sizes, prompting additional maintenance work:
The system regularly cleans up HLog files, removing those whose records have all been flushed to StoreFiles.
When the number of StoreFiles exceeds a threshold, a compaction merges them into a larger file; if the merged file becomes too large, it triggers a split of the region.
When a region reaches its size limit, it is split into two new regions, and the HMaster manages their allocation to suitable regionservers.
After region changes, the system updates the hbase:meta table accordingly.
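The maintenance tasks listed above can be sketched as three small functions. This is a simplified illustration, not HBase's implementation: the function names, the in-memory lists standing in for files, the row-count size limit, and the middle-row split point are all assumptions. The log cleanup here drops individual records, while a real cleanup removes whole log files.

```python
def clean_hlog(hlog, flushed_seq):
    """Keep only log entries newer than the last flushed sequence id."""
    return [entry for entry in hlog if entry[0] > flushed_seq]


def compact(store_files):
    """Merge StoreFiles (oldest first) into one sorted file; newer values win."""
    merged = {}
    for store_file in store_files:
        for row, value in store_file:
            merged[row] = value
    return sorted(merged.items())


def split_region(rows, max_rows):
    """Split a sorted region into two daughters at the middle row if it is too big."""
    if len(rows) <= max_rows:
        return [rows]
    mid = len(rows) // 2
    return [rows[:mid], rows[mid:]]


hlog = [(0, "row0001", "a"), (1, "row0002", "b"), (2, "row0004", "d")]
remaining_log = clean_hlog(hlog, flushed_seq=1)

older = [("row0001", "a"), ("row0003", "c")]
newer = [("row0002", "b"), ("row0003", "c2")]
merged = compact([older, newer])

daughters = split_region(merged, max_rows=2)
# The first row key of each daughter is the start key that would be
# recorded for it in the directory table.
start_keys = [daughter[0][0] for daughter in daughters]
```

In this example the compaction keeps the newer value for row0003, and the split produces two daughters whose start keys are what an updated hbase:meta entry would need to reflect.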
