
How HBase Stores Data: From Relational Tables to Column Families

This article compares traditional relational table design with HBase's column‑family model, showing how tables, rows, column families, and versioned cells are defined and how data is inserted using HBase's put commands.

Java High-Performance Architecture

Oct 13, 2016


HBase is a NoSQL database designed for very large datasets, capable of handling tables with billions of rows and millions of columns. This article first reviews relational table structure for context, then explains HBase's storage model.

Relational Database Table Structure

To understand HBase, we first recall how relational databases define tables. For example, a user table user_info with fields id, name, and tel is created with:

create table user_info (
    id   int primary key,  -- column types here are illustrative
    name varchar(64),
    tel  varchar(32)
);

Data is inserted with a single statement such as:

insert into user_info values(...)

The resulting table is a two‑dimensional grid of rows and columns. Adding a new field (e.g., address) requires altering the table schema.

Table name and fields must be specified when creating the table.

Inserting records requires specifying values for each column.

The table is a two‑dimensional structure of rows and columns.

Adding fields is inflexible: every new field requires a schema change.
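The points above can be reproduced with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a relational database. The column types, the second user's phone number, and the in-memory database are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table name and every field are declared up front (types are illustrative).
conn.execute("CREATE TABLE user_info (id INTEGER PRIMARY KEY, name TEXT, tel TEXT)")

# One statement inserts a whole row; every column gets a value.
conn.execute("INSERT INTO user_info VALUES (?, ?, ?)", (1, "a", "123"))

# Storing an address means changing the schema first.
conn.execute("ALTER TABLE user_info ADD COLUMN address TEXT")
conn.execute("INSERT INTO user_info VALUES (?, ?, ?, ?)", (2, "b", "456", "bj"))

rows = conn.execute(
    "SELECT id, name, tel, address FROM user_info ORDER BY id"
).fetchall()
print(rows)  # the row written before the ALTER has no address (None)
```

The first row keeps an empty address because it was written before the column existed, which is the inflexibility the list describes.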

HBase Table Structure

When creating an HBase table, you specify the table name and one or more column families. For example:

create 'user_info', 'base_info', 'ext_info'

This creates a table user_info with two column families: base_info and ext_info. A column family is a collection of columns.

A column family is a named group of columns that are stored together.

The HBase table layout consists of a row key and the defined column families.

The row key is the identifier for each row. It is not defined during table creation; the client supplies it with every write, and HBase keeps rows sorted by row key.
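As a rough mental model, the schema created by the create command can be sketched in plain Python. This is not HBase's API: create_table and the dictionary layout are hypothetical names chosen for illustration. The point is that only the table name and the column families are fixed at creation time, while row keys and columns appear later.

```python
def create_table(name, *families):
    """Model an HBase table: a name, a fixed set of column families, no rows yet."""
    if not families:
        # Mirrors the rule that a table is created with one or more column families.
        raise ValueError("a table needs at least one column family")
    return {"name": name, "families": set(families), "rows": {}}

# Equivalent in spirit to: create 'user_info', 'base_info', 'ext_info'
table = create_table("user_info", "base_info", "ext_info")

print(sorted(table["families"]))  # ['base_info', 'ext_info']
print(table["rows"])              # {} : no row keys or columns are declared
```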

Inserting a user record (name = 'a', tel = '123') is done with two put commands:

put 'user_info', 'row1', 'base_info:name', 'a'
put 'user_info', 'row1', 'base_info:tel', '123'

Here row1 is the row key, base_info is the column family, and name and tel are columns within that family.

Another record (name = 'b', addr = 'bj') is inserted as:

put 'user_info', 'row2', 'base_info:name', 'b'
put 'user_info', 'row2', 'ext_info:addr', 'bj'
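The effect of these four put commands can be sketched with nested dictionaries. This is only a model of the data layout, not the HBase client: the put function and the FAMILIES set below are hypothetical stand-ins for the shell command and the table's schema.

```python
from collections import defaultdict

FAMILIES = {"base_info", "ext_info"}  # fixed when the table is created

# rows[row_key][family][qualifier] = value
rows = defaultdict(lambda: defaultdict(dict))

def put(row_key, column, value):
    """Write one cell, addressed as 'family:qualifier', like the shell's put."""
    family, qualifier = column.split(":", 1)
    if family not in FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    rows[row_key][family][qualifier] = value

put("row1", "base_info:name", "a")
put("row1", "base_info:tel", "123")
put("row2", "base_info:name", "b")
put("row2", "ext_info:addr", "bj")

for key in sorted(rows):
    print(key, {family: cols for family, cols in rows[key].items()})
```

Note that row1 has no ext_info entry and row2 has no tel column: a row stores only the cells that were actually written, and the columns inside a family were never declared anywhere.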

HBase also supports versioning: each cell can hold multiple versions, each identified by a timestamp, so historical values can be retrieved. How many versions are kept is configured per column family.
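Versioned cells can be pictured as a small map from timestamp to value. The sketch below is a simplified model rather than HBase code: put and get are hypothetical helpers, a counter stands in for real millisecond timestamps, and every version is kept, whereas a real table retains only a configured number.

```python
import itertools

_clock = itertools.count(1)  # stand-in for HBase's millisecond timestamps

# cells[(row_key, "family:qualifier")] = {timestamp: value}
cells = {}

def put(row_key, column, value, ts=None):
    """Store a new version of the cell and return its timestamp."""
    ts = next(_clock) if ts is None else ts
    cells.setdefault((row_key, column), {})[ts] = value
    return ts

def get(row_key, column, ts=None):
    """Return the newest value, or the value stored at an exact timestamp."""
    versions = cells[(row_key, column)]
    return versions[max(versions)] if ts is None else versions[ts]

t1 = put("row1", "base_info:tel", "123")
t2 = put("row1", "base_info:tel", "456")  # second write adds a version

print(get("row1", "base_info:tel"))      # 456 (newest version)
print(get("row1", "base_info:tel", t1))  # 123 (historical version)
```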

Summary

Like relational databases, HBase uses rows and columns.

When creating a table, you define the table name and column families, not individual columns.

Column families can contain any number of columns, and column names do not need to be predefined; different rows may have different columns within the same family.

Data is located using the row key, column family name, column name, and optionally a version (timestamp).

With the shell's put command, insertion is performed one cell at a time, unlike relational databases, which insert an entire row in a single statement.
