
Mastering HBase Ops: Essential Tools and Commands for Cluster Management

This guide introduces the most commonly used HBase operational tools—including Canary, hbck, HFile viewer, CopyTable, Export/Import, ImportTsv, CompleteBulkload, RowCounter, CellCounter, and clean utilities—explaining their purposes, typical use‑cases, and exact command syntax for effective cluster administration.

StarRing Big Data Open Lab

Canary

HBase Canary is a tool for checking the health of an HBase cluster at the column‑family, Region, or RegionServer level by fetching a row from each Region of a specified table and reporting failures or latency.

Check whether all Regions in the cluster are reachable:

sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000

-t <N> sets the timeout in milliseconds (default 600000).

Check all Regions of specific tables:

sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary table_name1 table_name2 ...
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000 table_name1 table_name2 ...

Check the status of RegionServers:

sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -regionserver
sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000 -regionserver
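Canary signals failure through its exit status, which makes it easy to wrap for basic alerting. The sketch below is an assumption, not part of HBase: `CANARY_CMD` is an overridable placeholder defaulting to the invocation above, and the alert action is just an echo.

```shell
# Run a health check and report on its exit status.
# CANARY_CMD is a placeholder: override it with your actual Canary invocation.
CANARY_CMD=${CANARY_CMD:-"sudo -u hbase hbase org.apache.hadoop.hbase.tool.Canary -t 60000"}

check_canary() {
    if $CANARY_CMD; then
        echo "canary OK"
    else
        # $? here is the exit status of the failed canary command.
        echo "canary FAILED (exit $?)"
        return 1
    fi
}
```

In practice the echo in the failure branch would be replaced by a page or a metric push.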

hbck Tool

The hbck utility checks HBase cluster consistency.

sudo -u hbase hbase hbck

The command prints OK if the cluster is consistent, or INCONSISTENCY if problems are detected. Use the -details flag for more information. If an inconsistency may be transient (e.g., during startup or a region split), re-run the check before acting on it.
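Since transient inconsistencies clear on their own, re-running the check a few times before raising an alarm avoids false positives. The retry loop below is a sketch under assumptions: `HBCK_CMD` and `HBCK_SLEEP` are hypothetical overridable placeholders, not HBase settings.

```shell
# Re-run a consistency check a few times before treating it as a real problem.
# HBCK_CMD and HBCK_SLEEP are placeholders: override them for your cluster.
HBCK_CMD=${HBCK_CMD:-"sudo -u hbase hbase hbck"}

hbck_with_retries() {
    retries=${1:-3}
    while [ "$retries" -gt 0 ]; do
        if $HBCK_CMD; then
            echo "consistent"
            return 0
        fi
        retries=$((retries - 1))
        sleep "${HBCK_SLEEP:-10}"
    done
    echo "inconsistent after retries"
    return 1
}
```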

HFile Viewer

To view the textual representation of an HFile, use the HFile tool:

${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Example:

sudo -u hbase hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475

The -v flag shows detailed content; omit it for a summary.

CopyTable

CopyTable copies all or part of a table, either within the same cluster or across clusters.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help

Usage example:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=100 -Dmapred.map.tasks.speculative.execution=false --startrow=rk1 --stoprow=rk4 --starttime=1265875194289 --endtime=1265878794289 --peer.adr=transwarp-perf1,transwarp-perf2,transwarp-perf3:2181:/hyperbase1 --new.name=TestTableNew --families=cf1:cf2 TestTable

startrow : starting row key

stoprow : ending row key

starttime : start timestamp (ms since epoch)

endtime : end timestamp (optional)

new.name : name of the new table

peer.adr : target cluster's ZooKeeper ensemble, in the form quorum:port:/znode

families : list of column families to copy (e.g., cf1:cf2 copies cf1 to cf2)
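The --starttime and --endtime values are epoch milliseconds. One way to derive them from human-readable dates is sketched below; the specific UTC window is an arbitrary example, and the -d date parsing requires GNU date.

```shell
# Convert a human-readable UTC window into the epoch-millisecond values
# that CopyTable's --starttime/--endtime flags expect (requires GNU date).
START_MS=$(( $(date -u -d '2010-02-11T00:00:00Z' +%s) * 1000 ))
END_MS=$(( $(date -u -d '2010-02-12T00:00:00Z' +%s) * 1000 ))
echo "--starttime=$START_MS --endtime=$END_MS"
```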

Export

Export writes table data to HDFS as SequenceFiles, optionally filtered by timestamp.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

Example with a specific timestamp:

hbase org.apache.hadoop.hbase.mapreduce.Export member5 hdfs://master24:9000/user/hadoop/dump2 1 1401938590466 1401938590467
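The example above selects a single timestamp: the time range is half-open (start inclusive, end exclusive), so an end time one millisecond past the start captures exactly one cell timestamp. The arithmetic is simply:

```shell
# Export's time range is [starttime, endtime), so a 1 ms window
# captures exactly one cell timestamp.
START=1401938590466
END=$((START + 1))
echo "$START $END"
```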

Import

Import loads data previously exported by Export back into HBase.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

Example:

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.Import member5 hdfs://master24:9000/user/hadoop/dump2

ImportTsv

ImportTsv loads TSV‑formatted data into HBase. Two common uses:

Load data from HDFS TSV files via Put operations.

Prepare StoreFiles for bulk loading together with CompleteBulkload.

Example for direct load:

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir>

Example for bulk load preparation:

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=<outputdir> <tablename> <hdfs-data-inputdir>

-Dimporttsv.columns : maps source columns to HBase columns; use HBASE_ROW_KEY for the row key field.

-Dimporttsv.bulk.output : directory for generated HFiles; if omitted, data is written directly to the table via Put operations.
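To make the column mapping concrete, the sketch below writes a small tab-separated input file whose first field becomes the row key. The file path and the mapping -Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b are illustrative assumptions.

```shell
# Build a sample TSV input file: field 1 is the row key, the remaining
# fields map to cf:a and cf:b under
# -Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b (illustrative mapping).
printf 'row1\tvalue_a1\tvalue_b1\n' >  /tmp/sample.tsv
printf 'row2\tvalue_a2\tvalue_b2\n' >> /tmp/sample.tsv
wc -l < /tmp/sample.tsv
```

A file like this would then be uploaded to HDFS before running ImportTsv against it.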

CompleteBulkload

CompleteBulkload moves generated StoreFiles into an HBase table, typically used after ImportTsv.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename>

The <hdfs://storefileoutput> path points to the StoreFiles produced by ImportTsv.
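The two-step bulk-load flow can be sketched as command strings built from a few variables; TABLE, INPUT, and HFILE_OUT are placeholder values, and printing the commands first is a convenient dry run before executing them on a real cluster.

```shell
# Dry-run sketch of the ImportTsv -> LoadIncrementalHFiles bulk-load flow.
# TABLE, INPUT, and HFILE_OUT are placeholder values.
TABLE=mytable
INPUT=hdfs:///user/hbase/input
HFILE_OUT=hdfs:///user/hbase/hfiles

STEP1="hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a -Dimporttsv.bulk.output=$HFILE_OUT $TABLE $INPUT"
STEP2="hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles $HFILE_OUT $TABLE"

echo "$STEP1"
echo "$STEP2"
```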

RowCounter and CellCounter

RowCounter is a MapReduce job that counts rows in a table, useful for verifying that all blocks are readable. It accepts --starttime and --endtime to limit the time range.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2> ...]

CellCounter provides finer‑grained statistics, including row count, column‑family count, qualifier count, occurrence frequencies, and version totals. It also supports time‑range, regex, or prefix filters.

sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]

hbase clean Tool

The hbase clean command removes HBase-related data from ZooKeeper and/or HDFS, which is useful when testing or decommissioning a cluster.

sudo -u hbase hbase clean (--cleanZk|--cleanHdfs|--cleanAll)

--cleanZk : delete HBase data from ZooKeeper.

--cleanHdfs : delete HBase data from HDFS.

--cleanAll : delete HBase data from both ZooKeeper and HDFS.

Additional Tools

Beyond the utilities covered above, HBase offers advanced tools such as DSTools for distributed storage maintenance, Bulkload for large‑scale data ingestion, and Yahoo's YCSB for performance benchmarking. Future articles will explore these in detail.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Big Data, database, HBase, tools, commands
Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]
