Understanding HBase RegionServer, HRegion, HStore, and Column Family Management
The article explains HBase's RegionServer management of regions and stores, detailing HStore composition, MemStore flushing, split conditions, column family sharing within regions, and the performance implications of multiple column families, recommending a single column family design for optimal I/O efficiency.
HRegionServer internally manages a series of HRegion objects, each corresponding to a region in a table; each HRegion consists of multiple HStore instances.
Each HStore corresponds to one column family in the table and acts as a dedicated storage unit, so columns with similar I/O characteristics are best placed in the same column family for maximum efficiency.
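The containment hierarchy described above can be sketched as a small data model. This is only an illustration: the class names `RegionServer`, `Region`, and `Store` and the helper `open_region` are hypothetical stand-ins, not HBase's actual Java classes, and the table name and key ranges are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    family: str  # one store per column family


@dataclass
class Region:
    table: str
    start_key: str
    end_key: str
    stores: dict = field(default_factory=dict)  # family name -> Store


@dataclass
class RegionServer:
    regions: list = field(default_factory=list)


def open_region(table, families, start_key, end_key):
    """Create a region that holds exactly one store per column family."""
    region = Region(table, start_key, end_key)
    for family in families:
        region.stores[family] = Store(family)
    return region


# A hypothetical table "zz" with families "a" and "b", served as two regions.
server = RegionServer()
server.regions.append(open_region("zz", ["a", "b"], "", "m"))
server.regions.append(open_region("zz", ["a", "b"], "m", ""))

# Every region carries a store for every family, so stores = regions * families.
store_count = sum(len(region.stores) for region in server.regions)
```

The point of the sketch is that the number of stores grows with regions times column families, which is why the number of families matters later in the article.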
The HStore is the core of HBase storage and comprises two parts: a MemStore and StoreFiles. The MemStore is a sorted in-memory buffer where incoming writes are stored first; when the MemStore reaches its flush threshold, its contents are written out as a new StoreFile (implemented as an HFile).
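A minimal sketch of the write path just described, assuming a toy store that counts key and value lengths as its size. The class `ToyStore`, its byte accounting, and the 20-byte threshold are assumptions made for illustration; real HBase keeps multiple cell versions and writes HFiles to HDFS, which this model omits.

```python
class ToyStore:
    """Toy model: writes go to an in-memory buffer, then flush to immutable files."""

    def __init__(self, flush_size_bytes):
        self.flush_size_bytes = flush_size_bytes
        self.memstore = {}        # row key -> value (simplification: latest value only)
        self.memstore_bytes = 0
        self.store_files = []     # each flush appends one sorted, immutable tuple

    def put(self, row_key, value):
        old = self.memstore.get(row_key)
        if old is not None:
            # Overwriting a key: remove the old entry's size before adding the new one.
            self.memstore_bytes -= len(row_key) + len(old)
        self.memstore[row_key] = value
        self.memstore_bytes += len(row_key) + len(value)
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # Persist the buffer in row-key order, then start with an empty buffer.
        self.store_files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        self.memstore_bytes = 0


store = ToyStore(flush_size_bytes=20)
for key in ["row3", "row1", "row2"]:
    store.put(key, "vvvv")  # 8 bytes per put; the third put crosses the threshold

flushed_keys = [key for key, _ in store.store_files[0]]
```

Writes arrive out of order, but the flushed file is sorted by row key, mirroring the sorted-buffer behavior the text attributes to the MemStore.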
The condition for splitting a region is that the largest StoreFile among all stores in the region exceeds a predefined threshold.
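The split condition as stated above can be written as a small predicate. This is a sketch of the rule in the text only: the function names and the sizes are hypothetical. HBase exposes a setting named `hbase.hregion.max.filesize` for the size threshold, but how the threshold is applied depends on the HBase version and the split policy in use.

```python
GIB = 1024 ** 3


def largest_store_file(store_files_by_family):
    """Return the size of the largest StoreFile across all stores of a region."""
    return max(
        (size for sizes in store_files_by_family.values() for size in sizes),
        default=0,
    )


def needs_split(store_files_by_family, max_file_size):
    """Rule from the text: split when the largest StoreFile exceeds the threshold."""
    return largest_store_file(store_files_by_family) > max_file_size


# Hypothetical region: family "a" holds a very large file, family "b" a tiny one.
region_files = {"a": [11 * GIB, 2 * GIB], "b": [4 * 1024]}
split_needed = needs_split(region_files, max_file_size=10 * GIB)

# Two 6 GiB files do not trigger a split under this rule; only the largest file counts.
no_split = needs_split({"a": [6 * GIB, 6 * GIB]}, max_file_size=10 * GIB)
```

Note that the small family `b` has no say in the decision: the large family alone triggers the split, yet the whole region, including `b`, is divided.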
At the file level, different column families are stored in separate files, but multiple column families can share the same region.
For example, the following two paths belong to two different column families, a and b, that share the same region (3917ebd872c0adcb9d6c5a9cfd30b87f) of table zz: /hbase/zz/3917ebd872c0adcb9d6c5a9cfd30b87f/a and /hbase/zz/3917ebd872c0adcb9d6c5a9cfd30b87f/b.
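To make the layout of those two paths explicit, the sketch below splits each one into its components. The helper `parse_store_path` is hypothetical, and it assumes the four-level layout shown in the example (root, table, region, column family); other HBase versions may lay out directories differently.

```python
from pathlib import PurePosixPath


def parse_store_path(path):
    """Split /<root>/<table>/<region>/<family> into named components."""
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[0] != "/":
        raise ValueError(f"unexpected store path layout: {path}")
    _, root, table, region, family = parts
    return {"root": root, "table": table, "region": region, "family": family}


store_a = parse_store_path("/hbase/zz/3917ebd872c0adcb9d6c5a9cfd30b87f/a")
store_b = parse_store_path("/hbase/zz/3917ebd872c0adcb9d6c5a9cfd30b87f/b")

# Same region directory, different column family subdirectories.
same_region = store_a["region"] == store_b["region"]
different_family = store_a["family"] != store_b["family"]
```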
Because column families share a region, one column family may contain millions of rows while another has only a few. Regions are split by row key range, so when the large column family triggers a split, the small column family's rows are also scattered across many regions. This cardinality mismatch degrades scan performance on the small column family.
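The effect can be illustrated with a toy calculation. The sketch assumes integer row keys and evenly spaced split points, both simplifications chosen for readability (HBase row keys are byte arrays), and all the numbers are invented.

```python
import bisect
from collections import Counter

# Assume the large family has driven the table to split into 10 regions
# covering row keys 0..999_999, with a boundary every 100_000 keys.
split_points = list(range(100_000, 1_000_000, 100_000))
num_regions = len(split_points) + 1


def region_index(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


# The small family has only five rows, spread over the key space.
small_family_rows = [50_000, 250_000, 450_000, 650_000, 850_000]
rows_per_region = Counter(region_index(key) for key in small_family_rows)

regions_holding_small_rows = len(rows_per_region)
max_small_rows_in_one_region = max(rows_per_region.values())
```

Here five rows end up in five different regions, one row each, even though the table has ten regions whose boundaries were determined entirely by the large family. A scan over the small family has to work through many regions to collect a handful of rows.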
Additionally, flushes are coupled: when one column family's MemStore triggers a flush, the MemStores of the neighboring column families in the same region are flushed as well, which increases I/O.
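The coupling described above can be sketched with a toy region that flushes every family once the combined buffer size crosses a threshold. The class `ToyRegion`, its byte accounting, and the 40-byte threshold are assumptions for illustration; the sketch models the behavior as the text states it and is not HBase's actual flush implementation.

```python
class ToyRegion:
    """Toy model of a region whose flush covers all column families at once."""

    def __init__(self, families, flush_size_bytes):
        self.flush_size_bytes = flush_size_bytes
        self.memstores = {family: [] for family in families}
        self.memstore_bytes = {family: 0 for family in families}
        self.flushed_files = {family: [] for family in families}

    def put(self, family, row_key, value):
        self.memstores[family].append((row_key, value))
        self.memstore_bytes[family] += len(row_key) + len(value)
        # The trigger looks at the region as a whole, not at a single family.
        if sum(self.memstore_bytes.values()) >= self.flush_size_bytes:
            self.flush_all()

    def flush_all(self):
        for family, cells in self.memstores.items():
            if cells:
                # Even a nearly empty buffer becomes its own small file.
                self.flushed_files[family].append(sorted(cells))
            self.memstores[family] = []
            self.memstore_bytes[family] = 0


region = ToyRegion(["a", "b"], flush_size_bytes=40)
region.put("b", "r1", "x")                 # family "b" receives one tiny write
for i in range(4):
    region.put("a", f"r{i:04d}", "vvvvv")  # family "a" receives the bulk of the writes

cells_in_b_file = len(region.flushed_files["b"][0])
cells_in_a_file = len(region.flushed_files["a"][0])
```

Family `a` fills the buffer, but family `b` is flushed alongside it and produces a file holding a single cell. Repeated over time, this yields many small files for the quiet family and extra I/O for compaction and reads.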
Therefore, it is generally recommended not to define multiple column families in a table.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
