
Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

This article examines the complementary strengths of HBase and Elasticsearch, outlines three integration patterns and their associated challenges, and introduces Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost solution that simplifies storage and full‑text search for massive data workloads.

Big Data Technology Architecture

HBase and Elasticsearch are both widely used in applications that handle massive data volumes. HBase is a distributed key-value store that offers a flexible schema, horizontal scalability, low cost, and high concurrency, but it is weak at complex queries and analytics. Elasticsearch is a distributed search engine that also offers a flexible schema and horizontal scalability, along with fast retrieval, but it costs more, handles less concurrency, and provides weaker consistency.

Because both systems share flexible data structures and distributed extensibility, they are often combined: Elasticsearch serves as an index for selected HBase fields, achieving low‑cost storage, high‑throughput writes, and efficient search. Typical scenarios include logs, monitoring, billing, and user profiling.

HBase and Elasticsearch Combined Usage

When an application uses HBase and Elasticsearch together, it must solve two core problems: writing data correctly to both systems, and querying across them. Three common approaches are:

1) Dual write and dual read: the application interacts with HBase and Elasticsearch independently. This requires no extra dependencies but incurs high development cost, increased latency, reduced availability, and consistency challenges.

2) Automatic data replication with dual read: the write path talks only to HBase, and a separate service replicates the data to Elasticsearch, while the read path queries both systems. Replication is transparent to the application's write path and makes eventual consistency easier to achieve, but it requires an additional data-sync service and leaves the query-side complexity in place.

3) Trigger-based approach using HBase coprocessors: the application reads and writes only HBase. Write triggers automatically forward data to Elasticsearch. Read triggers parse each Scan request, rewrite it to use the Elasticsearch index, fetch the full rows from HBase, and return the merged result. This simplifies application logic, but it demands deep knowledge of HBase coprocessors and complex development, and it still faces consistency, availability, and latency issues.
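The first pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real client: `FakeHBase` and `FakeSearchIndex` are hypothetical in-memory stand-ins, and the function names are invented for the example. It shows the two halves the application has to own itself: writing to both systems, and resolving a search hit back to the full row.

```python
class FakeHBase:
    """Hypothetical in-memory stand-in for a key-value store of full rows."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, row):
        self.rows[row_key] = dict(row)

    def get(self, row_key):
        return self.rows.get(row_key)


class FakeSearchIndex:
    """Hypothetical in-memory stand-in for a search engine on selected fields."""

    def __init__(self, indexed_fields, fail_writes=False):
        self.indexed_fields = indexed_fields
        self.fail_writes = fail_writes
        self.docs = {}

    def index(self, row_key, row):
        if self.fail_writes:
            raise ConnectionError("search cluster unavailable")
        # Only the selected fields are indexed, not the whole row.
        self.docs[row_key] = {f: row[f] for f in self.indexed_fields if f in row}

    def search(self, field, value):
        # Exact-match lookup that returns row keys only.
        return sorted(k for k, d in self.docs.items() if d.get(field) == value)


def dual_write(store, index, row_key, row):
    """Write to the store, then to the index; report whether both succeeded."""
    store.put(row_key, row)
    try:
        index.index(row_key, row)
        return True
    except ConnectionError:
        # The row is stored but not searchable. Retrying or repairing
        # this gap is left to the application.
        return False


def dual_read(store, index, field, value):
    """Search the index for row keys, then fetch the full rows from the store."""
    return [store.get(key) for key in index.search(field, value)]
```

If the index write fails after the store write succeeds, `dual_write` returns `False` and the row stays invisible to `dual_read` until the application repairs it, which is the consistency challenge this pattern carries.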

The above approaches reveal several pain points:

1) Massive development and maintenance effort for real‑time sync, query merging, index management, and historical index building.

2) Weak data consistency due to asynchronous replication; data may be visible in HBase but not yet in Elasticsearch.

3) High deployment cost because HBase (storage‑compute separated) and Elasticsearch (storage‑compute coupled) cannot share resources.

4) Reduced availability and throughput; when a write must reach both systems, the slower one, typically Elasticsearch, becomes the bottleneck.

5) Feature gaps: TTL, multi‑version, and other native HBase capabilities are not fully supported in Elasticsearch.

6) Non‑Java developers face difficulty because the sync components and coprocessors are Java‑centric.
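The weak-consistency pain point is easy to reproduce in a toy model. The sketch below assumes a hypothetical `AsyncReplicator` class in which a queue stands in for the data-sync service; it is not based on any real replication component. Rows are visible in the store as soon as they are written, but reach the index only when the queue is drained.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model of a store whose index is updated asynchronously."""

    def __init__(self):
        self.store = {}         # stands in for HBase
        self.index = {}         # stands in for Elasticsearch
        self.pending = deque()  # stands in for the sync service's backlog

    def write(self, row_key, row):
        # The write is acknowledged once the store has it.
        self.store[row_key] = dict(row)
        self.pending.append((row_key, dict(row)))

    def sync_once(self):
        """Replicate one pending change; return False if nothing was pending."""
        if not self.pending:
            return False
        row_key, row = self.pending.popleft()
        self.index[row_key] = row
        return True

    def lagging_keys(self):
        """Row keys whose indexed copy is missing or stale."""
        return sorted(k for k in self.store if self.index.get(k) != self.store[k])
```

Between a `write` and the matching `sync_once`, `lagging_keys()` lists rows that a point lookup would find but a search would miss.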

Lindorm Searchindex Introduction

Beyond HBase + Elasticsearch, other pairings such as MySQL + Elasticsearch, MongoDB + Elasticsearch, and Cassandra + Elasticsearch suffer from similar complexity. To address this, Alibaba Cloud's Lindorm offers an enterprise-grade feature called Searchindex, which provides simple, efficient, low-cost combined storage and search for massive data.

Lindorm is a cloud‑native multi‑model database supporting wide‑table, time‑series, search, and file models, compatible with HBase/Cassandra, OpenTSDB, Solr, SQL, HDFS, and more. It serves use cases like IoT, advertising, social media, monitoring, gaming, and risk control.

Searchindex is a new type of index on Lindorm wide tables. Users create, delete, and build indexes with simple SQL statements, and data reads and writes go through the same unified SQL interface. The experience resembles a traditional secondary index, but Searchindex also supports full-text search and complex multi-condition queries. These are powered by LindormSearch, a Lucene-based distributed search engine that uses inverted indexes, BKD-Trees, bitmaps, and similar structures.
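To give a rough intuition for why an inverted index makes multi-condition queries cheap, here is a toy version in Python. It is only a sketch of the general technique, not how LindormSearch or Lucene is implemented: it handles exact-match terms only, with no tokenization, numeric range structures, or compression, and the function names are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(rows, fields):
    """Map each (field, value) pair to the set of primary keys containing it."""
    index = defaultdict(set)
    for primary_key, row in rows.items():
        for field in fields:
            if field in row:
                index[(field, row[field])].add(primary_key)
    return index


def lookup(index, **conditions):
    """Return the sorted primary keys that satisfy every field=value condition."""
    posting_lists = [index.get((f, v), set()) for f, v in conditions.items()]
    if not posting_lists:
        return []
    # An AND query is the intersection of the per-term posting lists.
    return sorted(set.intersection(*posting_lists))


rows = {
    1: {"name": "Liu", "city": "Hangzhou", "sex": "M"},
    2: {"name": "Wang", "city": "Hangzhou", "sex": "F"},
    3: {"name": "Liu", "city": "Beijing", "sex": "M"},
}
index = build_inverted_index(rows, ["name", "city", "sex"])
matches = lookup(index, city="Hangzhou", name="Liu")
```

Each condition resolves to a posting list without scanning the table, and the query result is their intersection; the full rows are then fetched by primary key.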

Example usage:

Original table definition:

CREATE TABLE myTable (
    id BIGINT,
    name TEXT,
    age INT,
    sex TEXT,
    city TEXT,
    address TEXT,
    PRIMARY KEY (id)
);

Create a search index covering name, age, sex, city, and address:

CREATE SEARCH INDEX myIndex ON myTable WITH COLUMNS (name, age, sex, city, address);

Standard queries:

-- Fuzzy query
SELECT * FROM myTable WHERE name LIKE '小%';

-- Multi‑dimensional query with sorting
SELECT * FROM myTable WHERE city='杭州' AND age>=18 ORDER BY age ASC;

-- Pagination: skip 100 rows and return the next 10, sorted by age descending
SELECT * FROM myTable WHERE name='小刘' AND sex='男' ORDER BY age DESC LIMIT 10 OFFSET 100;

Advanced queries:

-- Multi‑dimensional query using search_query syntax
SELECT * FROM myTable WHERE search_query='+city:杭州 +age:[18 TO *]' ORDER BY age ASC;

-- Text search
SELECT * FROM myTable WHERE search_query='address:西湖区';

As the example shows, basic SQL knowledge is enough to use Lindorm Searchindex; no additional development is required.
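Applications that assemble search_query strings from user-supplied filters may want a small helper. The Python sketch below is an assumption-laden illustration: `build_search_query` and `to_sql` are hypothetical names, the clause syntax is copied from the advanced queries above, and special characters in field values are not escaped for the query syntax. Building SQL by string formatting is shown only for readability; real code should prefer whatever parameter binding its driver provides.

```python
def build_search_query(terms=None, ranges=None):
    """Build a '+field:value +field:[low TO high]' string of required clauses.

    terms:  dict of field -> exact value
    ranges: dict of field -> (low, high); None means an open bound ('*')
    """
    clauses = []
    for field, value in (terms or {}).items():
        clauses.append(f"+{field}:{value}")
    for field, (low, high) in (ranges or {}).items():
        low_text = "*" if low is None else low
        high_text = "*" if high is None else high
        clauses.append(f"+{field}:[{low_text} TO {high_text}]")
    return " ".join(clauses)


def to_sql(table, search_query):
    """Wrap a search_query string in a SELECT, doubling single quotes."""
    escaped = search_query.replace("'", "''")
    return f"SELECT * FROM {table} WHERE search_query='{escaped}'"


query = build_search_query({"city": "杭州"}, {"age": (18, None)})
statement = to_sql("myTable", query)
```

Here `query` reproduces the multi-dimensional search_query string from the advanced example, and `statement` is the corresponding SELECT without the ORDER BY clause.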

Key advantages of Searchindex:

1) Simple and ready‑to‑use; index management and data optimization are controlled entirely by SQL, eliminating extra development.

2) Unified SQL access; the server automatically selects the optimal index to accelerate queries.

3) Strong consistency; unlike an asynchronously synced Elasticsearch index, Lindorm Searchindex can make data searchable immediately after a write. Both strong and eventual consistency modes are supported.

4) Low cost; raw table data and index data share storage, reducing resource fragmentation.

5) Full feature set; TTL, multi‑version, and other core database functions remain functional within Searchindex.

6) Multi‑language support; applications can use Java, C++, Python, Go, and other mainstream languages.

Conclusion

For massive data workloads requiring low‑cost storage and efficient retrieval, the industry often adopts HBase + Elasticsearch, but this combination suffers from high development and maintenance complexity, weak consistency, high deployment cost, and feature loss. Similar issues appear with other database‑search engine pairings. Alibaba Cloud's Lindorm Searchindex addresses these challenges, offering a simple, high‑performance, and cost‑effective solution for large‑scale storage and search.
