
When to Use HBase and Basic HTable Concepts

This article explains when HBase is a good fit and introduces the core HTable concepts: row keys, column families, columns, timestamps and values. It then outlines schema and versioning design principles for handling semi-structured, sparse, multi-version and very large data sets efficiently.

Architecture Digest

Sep 21, 2017


HBase suits semi-structured or unstructured data whose schema is uncertain or changes often. Within an existing column family, new columns (fields) can be written at any time without a schema change or downtime. In a traditional RDBMS, adding a column typically requires an ALTER TABLE.
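The following minimal in-memory sketch illustrates this behaviour. `ToyTable` and its methods are hypothetical and exist only for this example; they are not the HBase client API. The set of column families is fixed at creation, but any qualifier inside an existing family can be written without being declared first.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase client API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            # A new family would need a schema change (an "alter" of the table).
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are never declared up front: a new one is simply stored.
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return dict(self.rows.get(row, {}))


users = ToyTable(families=["info"])
users.put(b"user1", "info:name", b"Alice")
users.put(b"user2", "info:name", b"Bob")
users.put(b"user2", "info:phone", b"555-0100")  # new field, no schema change
print(users.get(b"user2"))  # {'info:name': b'Bob', 'info:phone': b'555-0100'}
```

Writing `contact:email` to this table would raise `KeyError`. This mirrors the rule that column families must be declared in the schema, while columns need not be.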

HBase also stores very sparse records efficiently. A column with no value for a given row is simply not written, so empty (null) columns take up no storage and never have to be read back.
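You can see the saving by comparing two layouts for the same record. This simplified sketch uses hypothetical names. `as_fixed_row` reserves a slot for every declared column. Real relational engines have their own, often more compact, null handling, so this layout is only a contrast. `as_sparse_row` keeps only the cells that carry a value, which is how HBase behaves.

```python
# Hypothetical schema: 1,000 possible attributes, most rows set only a few.
ALL_COLUMNS = [f"attr:c{i}" for i in range(1000)]


def as_fixed_row(values):
    """Fixed layout: one slot per declared column, None where unset."""
    return [values.get(col) for col in ALL_COLUMNS]


def as_sparse_row(values):
    """HBase-style row: only columns that actually hold a value become cells."""
    return {col: v for col, v in values.items() if v is not None}


record = {"attr:c3": b"red", "attr:c42": b"large", "attr:c999": b"2017"}
fixed = as_fixed_row(record)
sparse = as_sparse_row(record)
print(len(fixed), len(sparse))  # 1000 3
```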

HBase can keep multiple versions of a value, each addressed by row key, column key and timestamp. This makes it convenient for storing a history of changes, while a normal read returns only the latest version.
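Versioning can be sketched as a map from (row key, column) to a list of timestamped values, kept newest first. `VersionedCells` and its `max_versions` parameter are hypothetical. In HBase, the retention limit is the per-column-family VERSIONS setting, and timestamps default to the current time in milliseconds.

```python
import time


class VersionedCells:
    """Hypothetical sketch of HBase-style cell versioning (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed per-family retention limit
        self.cells = {}  # (row, column) -> list of (timestamp, value), newest first

    def put(self, row, column, value, ts=None):
        if ts is None:
            ts = int(time.time() * 1000)  # default: current time in milliseconds
        # A write with an existing timestamp replaces that version.
        versions = [tv for tv in self.cells.get((row, column), []) if tv[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        self.cells[(row, column)] = versions[: self.max_versions]

    def get(self, row, column, versions=1):
        """Return up to `versions` values, newest first (default: latest only)."""
        return [v for _, v in self.cells.get((row, column), [])[:versions]]


store = VersionedCells(max_versions=3)
for ts, price in [(100, b"9.99"), (200, b"10.49"), (300, b"10.99"), (400, b"11.50")]:
    store.put(b"item1", "price:usd", price, ts=ts)
print(store.get(b"item1", "price:usd"))               # [b'11.50']
print(store.get(b"item1", "price:usd", versions=10))  # [b'11.50', b'10.99', b'10.49']
```

A plain read returns only the newest value. Asking for more versions walks back through the history. HBase prunes versions beyond the limit lazily, during compaction, whereas this sketch trims them on every write for simplicity.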

When the data volume outgrows what a relational database can handle, HBase scales horizontally. A table is divided into regions, each holding a contiguous range of row keys. Regions split automatically as they grow, and adding RegionServer nodes adds capacity. HBase also integrates with Hadoop, using HDFS for reliable, replicated storage and MapReduce for large-scale batch analysis.
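The core idea behind this automatic sharding can be sketched as follows. The sorted row-key space is divided into regions, each covering a contiguous key range. A lookup finds the region with the greatest start key that does not exceed the row key. `RegionMap` and the server names are hypothetical. In a real cluster, a split is typically triggered when a region grows past a size threshold, clients locate regions through the `hbase:meta` table, and the master decides which RegionServer hosts each region.

```python
import bisect


class RegionMap:
    """Hypothetical model of a table's row-key space divided into regions.

    Each region covers [start_key, next region's start_key); the first region
    starts at the empty key b"", so every row key falls into exactly one region.
    """

    def __init__(self):
        self.start_keys = [b""]      # one region covering the whole key space
        self.servers = ["server-1"]  # hypothetical server assignment per region

    def locate(self, row_key):
        """Index of the region whose key range contains row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def split(self, split_key, new_server):
        """Split the region containing split_key into two at split_key."""
        i = self.locate(split_key)
        if self.start_keys[i] == split_key:
            raise ValueError("split key must fall strictly inside the region")
        self.start_keys.insert(i + 1, split_key)
        self.servers.insert(i + 1, new_server)


regions = RegionMap()
regions.split(b"m", "server-2")  # e.g. after the single region grew too large
regions.split(b"t", "server-3")
for key in [b"apple", b"melon", b"zebra"]:
    print(key, regions.servers[regions.locate(key)])
# b'apple' server-1
# b'melon' server-2
# b'zebra' server-3
```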

Key HTable concepts include:

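Row key: the table's primary key. Rows are stored sorted by row key bytes. Data can be read only by a single row key (get), by a row key range (scan), or by a full table scan. HBase has no built-in secondary indexes, so row key design largely determines query performance.

Column family: declared when the table is created. Adding or removing one later requires altering the table schema. Each column family is stored separately on disk, and settings such as the maximum number of versions are configured per family.

Column: written as family:qualifier (for example, info:name). Qualifiers do not need to be declared and can be added at write time. Within a row, the columns of a family are stored together, sorted by qualifier.

Timestamp: each cell can hold multiple versions, distinguished by timestamp and sorted newest first. If the client does not supply a timestamp on write, the current time in milliseconds is used.

Value: the data itself, uniquely addressed by table name, row key, column key (family:qualifier) and timestamp.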
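These concepts fit together as a nested, sorted map: row key → column family → qualifier → timestamp → value. The sketch below models a small table this way. The data and the helpers `cell`, `scan` and `find_rows_where` are hypothetical. It shows the three ways of reaching data:

- addressing one cell by its coordinates;
- scanning a row-key range;
- examining every row, for any condition that is not on the row key.

```python
from bisect import bisect_left

# Hypothetical table: row key -> family -> qualifier -> {timestamp: value}.
table = {
    b"user#001": {b"info": {b"name": {1000: b"Alice"}, b"city": {1000: b"Oslo"}}},
    b"user#002": {b"info": {b"name": {1001: b"Bobby", 1005: b"Bob"}}},
    b"user#010": {b"info": {b"name": {1002: b"Carol"}}},
    b"order#17": {b"detail": {b"total": {1003: b"42.00"}}},
}


def cell(row, family, qualifier, ts=None):
    """Address a value by (row key, family:qualifier, timestamp); latest if ts is None."""
    versions = table[row][family][qualifier]
    return versions[max(versions) if ts is None else ts]


def scan(start, stop):
    """Row keys in [start, stop), in byte order, as a range scan would return them."""
    keys = sorted(table)  # HBase keeps rows sorted by row key bytes
    return keys[bisect_left(keys, start):bisect_left(keys, stop)]


def find_rows_where(family, qualifier, value):
    """Match on a column value: with no secondary index, every row is examined."""
    hits = []
    for row in sorted(table):
        versions = table[row].get(family, {}).get(qualifier, {})
        if versions and versions[max(versions)] == value:
            hits.append(row)
    return hits


print(cell(b"user#002", b"info", b"name"))        # b'Bob'
print(cell(b"user#002", b"info", b"name", 1001))  # b'Bobby'
print(scan(b"user#", b"user$"))  # [b'user#001', b'user#002', b'user#010']
print(find_rows_where(b"info", b"city", b"Oslo"))  # [b'user#001']
```

Because rows are sorted, a shared prefix such as `user#` becomes one contiguous range scan. The value lookup is cheap here only because the table has four rows.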

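The storage types are as follows. The table name is a string. The row key, column name and value are all untyped byte arrays (byte[]). The timestamp is a 64-bit integer (a Java long). Because HBase does not interpret these bytes, the client is responsible for serializing and deserializing values.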
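Row keys are compared as raw bytes, so the way a value is serialized decides how rows sort. The sketch below uses Python's `struct` in place of a Java client. It contrasts decimal strings with fixed-width 8-byte big-endian integers, which is the layout HBase's `Bytes.toBytes(long)` also produces.

```python
import struct


def encode_long(n):
    """8-byte big-endian signed integer, the same layout as a serialized Java long."""
    return struct.pack(">q", n)


ids = [9, 10, 100, 2]

# Decimal strings compare byte by byte, so "10" sorts before "2".
as_text = sorted(str(i).encode() for i in ids)
print(as_text)  # [b'10', b'100', b'2', b'9']

# Fixed-width big-endian bytes keep numeric order for non-negative values.
as_long = sorted(encode_long(i) for i in ids)
print([struct.unpack(">q", b)[0] for b in as_long])  # [2, 9, 10, 100]
```

Negative numbers would sort after positive ones, because the sign bit makes their first byte larger. Keys that can be negative need an adjusted encoding, for example one that flips the sign bit.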

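Several design principles apply to HBase schemas:

- Keep the number of column families small. The HBase reference guide advises against more than two or three, because flushes and compactions are triggered per region, so one busy family can cause extra I/O for the others.
- Design row keys so that writes do not arrive in monotonically increasing order. For example, avoid a timestamp or sequence number as the key prefix. Consecutive keys land in the same region and turn it into a write hotspot.
- Keep row keys, column family names and qualifiers short, because they are stored alongside every cell.
- Limit the number of versions retained per column family to avoid unnecessary storage overhead.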
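One common way to avoid a monotonic row key is salting. Each key is prefixed with a small bucket number derived from a hash of the key, so consecutive keys are spread over several regions. This is a minimal sketch. `NUM_BUCKETS`, the MD5-based bucket and the `NN|` prefix format are illustrative assumptions, not a fixed HBase convention.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical: roughly the number of regions to spread writes over


def salted_key(natural_key: bytes) -> bytes:
    """Prefix a stable hash bucket so that sequential keys land in different regions."""
    bucket = hashlib.md5(natural_key).digest()[0] % NUM_BUCKETS
    return b"%02d|" % bucket + natural_key


# Monotonic keys, e.g. a date plus a sequence number, would all hit one region.
natural = [b"20170921-%06d" % i for i in range(1000)]
salted = [salted_key(k) for k in natural]

# Count how many keys fall into each bucket prefix.
print(Counter(k[:3] for k in salted))
```

The cost of this design falls on reads. A point lookup can recompute the prefix from the natural key. A range scan over natural keys, however, must be issued once per bucket and the results merged.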

Author: 飒然Hang, architect and backend engineer at 中华万年历 (Zhonghua Wannianli). Source: http://www.rowkey.me/blog/2015/06/10/hbase-about/


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

