Mastering HBase Ops: Essential Tools and Commands for Cluster Management
This guide introduces the most commonly used HBase operational tools—including Canary, hbck, HFile viewer, CopyTable, Export/Import, ImportTsv, CompleteBulkload, RowCounter, CellCounter, and clean utilities—explaining their purposes, typical use‑cases, and exact command syntax for effective cluster administration.
Canary
HBase Canary is a tool for checking the health of an HBase cluster at the column‑family, Region, or RegionServer level by fetching a row from each Region of a specified table and reporting failures or latency.
Check whether all Regions in the cluster are reachable:
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000
-t <N> sets the timeout in milliseconds (default 600000).
Check all Regions of specific tables:
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary table_name1 table_name2 ...
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000 table_name1 table_name2 ...
Check the status of RegionServers:
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -regionserver
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000 -regionserver
hbck Tool
The hbck utility checks HBase cluster consistency.
sudo -u hbase hbase hbck
The command prints OK if the cluster is consistent, or INCONSISTENCY if problems are detected. Use the -details flag for more information. Some inconsistencies are transient (e.g., during startup or a region split), so re-run the check before taking corrective action.
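If an inconsistency persists, hbck also ships repair options. A minimal sketch, assuming an HBase 1.x cluster (in HBase 2.x the repair flags moved to the separate HBCK2 tool, so verify against your version's hbase hbck -h first):
# List every inconsistency found, region by region.
sudo -u hbase hbase hbck -details
# Common HBase 1.x repair flags for region assignment and hbase:meta problems.
sudo -u hbase hbase hbck -fixAssignments -fixMeta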
HFile Viewer
To view the textual representation of an HFile, use the HFile tool:
${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
Example:
sudo -u hbase hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475
The -v flag shows detailed content; omit it for a summary.
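Other flags can narrow the output. A minimal sketch against the same HFile as above (the exact flag set varies by HBase version; check the tool's help output first):
# -p prints each key/value, -m prints the HFile meta block, -s prints summary statistics.
sudo -u hbase hbase org.apache.hadoop.hbase.io.hfile.HFile -p -m -s -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475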
CopyTable
CopyTable copies all or part of a table, either within the same cluster or across clusters.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
Usage example:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=100 -Dmapred.map.tasks.speculative.execution=false --startrow=rk1 --stoprow=rk4 --starttime=1265875194289 --endtime=1265878794289 --peer.adr=transwarp-perf1,transwarp-perf2,transwarp-perf3:2181:/hyperbase1 --new.name=TestTableNew --families=cf1:cf2 TestTable
startrow : starting row key
stoprow : ending row key
starttime : start timestamp (ms since epoch)
endtime : end timestamp (optional)
new.name : name of the new table
peer.adr : target cluster Zookeeper address (host:port:znode)
families : list of column families to copy; srcName:dstName renames a family (e.g., cf1:cf2 copies cf1 into cf2 on the target)
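For a simple copy within one cluster, most of these options can be dropped. A minimal sketch, assuming a source table TestTable and a hypothetical destination table TestTableCopy that has already been created with the same column families (CopyTable does not create the target table):
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=TestTableCopy TestTable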
Export
Export writes table data to HDFS as SequenceFiles, optionally filtered by timestamp.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
Example with a specific timestamp:
hbase org.apache.hadoop.hbase.mapreduce.Export member5 hdfs://master24:9000/user/hadoop/dump2 1 1401938590466 1401938590467
This exports at most one version of each cell of table member5 whose timestamp falls in the range [1401938590466, 1401938590467).
Import
Import loads data previously exported by Export back into HBase.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
Example:
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Import member5 hdfs://master24:9000/user/hadoop/dump2
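Note that Import writes into an existing table rather than creating one, so create the target first if it is missing. A minimal sketch via the HBase shell, assuming a hypothetical column family cf1:
# Pre-create the table, then load the exported SequenceFiles back in.
echo "create 'member5', 'cf1'" | hbase shell
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Import member5 hdfs://master24:9000/user/hadoop/dump2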
ImportTsv
ImportTsv loads TSV-formatted data into HBase. Two common uses:
Load data from HDFS TSV files via Put operations.
Prepare StoreFiles for bulk loading together with CompleteBulkload.
Example for direct load:
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir>
Example for bulk load preparation:
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=<outputdir> <tablename> <hdfs-data-inputdir>
-Dimporttsv.columns : maps source columns to HBase columns; use HBASE_ROW_KEY for the row key.
-Dimporttsv.bulk.output : directory for generated HFiles; if omitted, data is written directly to the table.
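To make the column mapping concrete, here is a minimal sketch with hypothetical names: a table people with column family d, and an HDFS input file whose tab-separated fields are row key, name, and age:
# /user/hbase/input/people.tsv (tab-separated):
# row1    alice   30
# row2    bob     25
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,d:name,d:age people /user/hbase/input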
CompleteBulkload
CompleteBulkload moves generated StoreFiles into an HBase table, typically used after ImportTsv.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename>
The <hdfs://storefileoutput> path points to the StoreFiles produced by ImportTsv.
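Putting the two steps together, a minimal end-to-end sketch reusing the hypothetical people table and paths from the ImportTsv example: first generate HFiles, then move them into the table:
# Step 1: write HFiles to a staging directory instead of the table.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,d:name,d:age -Dimporttsv.bulk.output=/tmp/people_hfiles people /user/hbase/input
# Step 2: atomically load the staged HFiles into the table's regions.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/people_hfiles people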
RowCounter and CellCounter
RowCounter is a MapReduce job that counts rows in a table, useful for verifying that all blocks are readable. It accepts --starttime and --endtime to limit the time range.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2> ...]
CellCounter provides finer-grained statistics, including row count, column-family count, qualifier count, occurrence frequencies, and version totals. It also supports time-range, regex, or prefix filters.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]
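For example, a hedged sketch against the hypothetical people table: count the rows written in a given time window, then gather per-cell statistics into an HDFS output directory:
# Count rows whose cells fall in the given timestamp range.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.RowCounter people --starttime=1401938590466 --endtime=1401938590467
# Write detailed cell statistics for the whole table to /tmp/people_cellcount.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CellCounter people /tmp/people_cellcount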
hbase clean Tool
The hbase clean command removes HBase-related data from ZooKeeper and/or HDFS, useful for testing or decommissioning clusters.
sudo -u hbase hbase clean (--cleanZk|--cleanHdfs|--cleanAll)
cleanZk : delete HBase data from ZooKeeper.
cleanHdfs : delete HBase data from HDFS.
cleanAll : delete data from both ZooKeeper and HDFS.
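For instance, to wipe a test cluster completely (this is destructive and irreversible; stop the HBase daemons first, as the script is meant to run against a stopped cluster):
# Delete HBase state from both ZooKeeper and HDFS.
sudo -u hbase hbase clean --cleanAll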
Additional Tools
Beyond the utilities covered above, HBase offers advanced tools such as DSTools for distributed storage maintenance, Bulkload for large‑scale data ingestion, and Yahoo's YCSB for performance benchmarking. Future articles will explore these in detail.
