Designing and Using Global Secondary Indexes in Apache Phoenix
This article explains how Apache Phoenix implements global secondary indexes using separate HBase tables, demonstrates index creation and data synchronization with example SQL, and provides design guidelines to optimize query latency and avoid full‑table scans in big‑data environments.
Overview
Global indexes are a key feature of Apache Phoenix; using secondary indexes wisely can reduce query latency and make better use of cluster resources.
Global Index Explanation
Global indexes store index data in a separate HBase table. The following example shows the relationship between index data and the main table data.
-- Create the data table
CREATE TABLE DATA_TABLE(
    A VARCHAR PRIMARY KEY,
    B VARCHAR,
    C INTEGER,
    D INTEGER);
-- Create the index
CREATE INDEX B_IDX ON DATA_TABLE(B) INCLUDE(C);
-- Insert data
UPSERT INTO DATA_TABLE VALUES('A','B',1,2);
When data is written to the main table, the corresponding index data is synchronized to the index table. The index table's primary key combines the indexed columns with the main table's primary key, and included columns are stored as regular columns. A query that references only indexed and included columns can therefore be satisfied by a single index lookup, without accessing the main table.
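As a minimal sketch of this covered-query behavior, the statements below reuse DATA_TABLE and B_IDX from the example. The first query touches only A, B, and C, all of which B_IDX stores; the second also needs D, which B_IDX does not store. EXPLAIN prints the plan Phoenix chose; the plan text depends on the Phoenix version and is not reproduced here.

```sql
-- Covered: B is indexed, C is included, and A is part of the index row key,
-- so B_IDX holds every column this query needs.
EXPLAIN SELECT A, B, C FROM DATA_TABLE WHERE B = 'B';

-- Not covered: D is stored only in DATA_TABLE, so B_IDX alone cannot answer this query.
EXPLAIN SELECT B, D FROM DATA_TABLE WHERE B = 'B';
```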
Phoenix tables are HBase tables, and HBase stores rows sorted by the byte-wise lexicographic order of their row keys. Rows whose keys share a longer common prefix are therefore stored closer together.
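To illustrate this ordering, here is a small sketch with hypothetical rows (the keys and values are made up for illustration). The row key of B_IDX is conceptually (B, A), so these index entries sort by B first and then by A.

```sql
-- Hypothetical sample rows; keys and values are illustrative only.
UPSERT INTO DATA_TABLE VALUES('k1','apple',1,10);
UPSERT INTO DATA_TABLE VALUES('k2','banana',2,20);
UPSERT INTO DATA_TABLE VALUES('k3','apricot',3,30);
UPSERT INTO DATA_TABLE VALUES('k4','apple',4,40);

-- Conceptual order of the B_IDX row keys for these four rows (B, then A):
--   ('apple','k1'), ('apple','k4'), ('apricot','k3'), ('banana','k2')
-- Entries sharing the prefix 'ap' are adjacent, so a prefix condition on B
-- corresponds to one contiguous key range in the index.
SELECT B, C FROM DATA_TABLE WHERE B LIKE 'ap%';
```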
Global Index Design
Continuing with the DATA_TABLE example, we create a composite index. Queries whose conditions match a leading prefix of the index columns, in index order, work best.
CREATE INDEX B_C_D_IDX ON DATA_TABLE(B,C,D);
All field conditions using the = operator are shown below:
Note: The order of AND conditions in a query does not need to match the index column order.
In practice, we recommend conditions 1‑4, which follow the prefix‑match principle and avoid full‑table scans; conditions 5‑7 require scanning the entire table and are strongly discouraged.
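The seven combinations referred to above can be sketched as queries against B_C_D_IDX. The numbering here is an assumption, chosen only to be consistent with the recommendation: 1‑4 constrain the leading index column B, and 5‑7 do not. The literal values are placeholders.

```sql
-- Assumed numbering; literal values are placeholders.
-- 1-4: B, the leading column of B_C_D_IDX, is constrained (prefix match).
SELECT A FROM DATA_TABLE WHERE B = 'x' AND C = 1 AND D = 2;  -- 1
SELECT A FROM DATA_TABLE WHERE B = 'x' AND C = 1;            -- 2
SELECT A FROM DATA_TABLE WHERE B = 'x' AND D = 2;            -- 3
SELECT A FROM DATA_TABLE WHERE B = 'x';                      -- 4

-- 5-7: B is not constrained, so the index key has no usable leading prefix.
SELECT A FROM DATA_TABLE WHERE C = 1 AND D = 2;              -- 5
SELECT A FROM DATA_TABLE WHERE C = 1;                        -- 6
SELECT A FROM DATA_TABLE WHERE D = 2;                        -- 7
```

Running EXPLAIN on each statement is one way to check how a given Phoenix version actually plans it.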
Other Considerations
Columns used in ORDER BY or GROUP BY clauses can also benefit from secondary indexes.
Design primary keys carefully to reduce the need for additional index tables, as more index tables increase write amplification.
When the ROW_TIMESTAMP feature is used, global indexes cannot be employed.
Applying salting to index tables can improve query and write performance and avoid hotspotting.
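As a sketch of the ordering and salting points above, the statements below show a query whose ORDER BY follows the key order of B_C_D_IDX, and a salted index. The index name C_SALTED_IDX and the bucket count of 8 are illustrative assumptions; a suitable bucket count depends on the cluster and the data.

```sql
-- Within B = 'x', the entries of B_C_D_IDX are already ordered by C,
-- so an index-backed plan may not need a separate sort step.
SELECT B, C, D FROM DATA_TABLE WHERE B = 'x' ORDER BY C;

-- Hypothetical salted index: SALT_BUCKETS spreads index rows across
-- 8 buckets so that writes with adjacent C values do not hit one region.
CREATE INDEX C_SALTED_IDX ON DATA_TABLE(C) SALT_BUCKETS=8;
```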
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
