
Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

The article examines the strengths and weaknesses of combining HBase and Elasticsearch for massive data storage and retrieval, outlines three integration patterns and their challenges, and presents Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost, strongly consistent solution that simplifies development and improves performance.

DataFunTalk

HBase and Elasticsearch are two widely used technologies for handling massive data. HBase is a distributed key‑value store with a flexible schema, horizontal scalability, low cost, and high concurrency, but it is weak at complex queries and analytics. Elasticsearch is a distributed search engine with a flexible schema, fast retrieval, and horizontal scaling, but it is limited in cost efficiency, query concurrency, and consistency.

Because both systems share flexible data structures and distributed extensibility, they are often combined: HBase provides low‑cost storage and high‑throughput writes, while Elasticsearch offers efficient indexing and full‑text search. Typical use cases include logs, monitoring, billing, and user profiling.

When an application decides to use HBase + Elasticsearch together, the core problems are how to write data accurately to both systems and how to query and merge results. Three common approaches are:

Dual write and dual read by the application: the app interacts with HBase and Elasticsearch independently. This gives full control but increases development cost, maintenance complexity, write latency, availability risks, and consistency challenges.

Automatic data replication with dual read: the app writes only to HBase, and reads from both HBase and Elasticsearch. This hides replication from the app and eases consistency handling, but it requires an additional data‑sync service, and the app must still query two systems.

Using HBase coprocessor triggers: the app reads and writes only HBase. Coprocessor triggers automatically write to Elasticsearch on every write and accelerate reads by translating scan queries into Elasticsearch searches. This simplifies the app logic, but it demands deep knowledge of HBase internals and complex trigger development, and it still faces latency, consistency, and availability issues.
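
To make the first pattern concrete, here is a minimal sketch of application-level dual write and dual read. It does not use the real HBase or Elasticsearch clients: `FakeHBase`, `FakeSearchIndex`, `dual_write`, and `dual_read` are hypothetical in-memory stand-ins invented for illustration. The point is only to show how a failure between the two writes leaves a row stored but not searchable.

```python
from dataclasses import dataclass, field


class IndexUnavailable(Exception):
    """Raised by the stand-in search index to simulate an outage."""


@dataclass
class FakeHBase:
    """Hypothetical stand-in for HBase: a row-key -> row dictionary."""
    rows: dict = field(default_factory=dict)

    def put(self, row_key, row):
        self.rows[row_key] = dict(row)

    def get(self, row_key):
        return self.rows.get(row_key)


@dataclass
class FakeSearchIndex:
    """Hypothetical stand-in for Elasticsearch: filters documents by field."""
    docs: dict = field(default_factory=dict)
    available: bool = True

    def index(self, doc_id, doc):
        if not self.available:
            raise IndexUnavailable("search index is down")
        self.docs[doc_id] = dict(doc)

    def search(self, **conditions):
        # Return the ids of documents whose fields match every condition.
        return sorted(
            doc_id for doc_id, doc in self.docs.items()
            if all(doc.get(k) == v for k, v in conditions.items())
        )


def dual_write(hbase, index, row_key, row):
    """The application writes to both systems itself, one after the other."""
    hbase.put(row_key, row)    # step 1 succeeds
    index.index(row_key, row)  # step 2 may fail, leaving the stores out of sync


def dual_read(hbase, index, **conditions):
    """Ask the index for matching keys, then fetch full rows from the store."""
    return [hbase.get(row_key) for row_key in index.search(**conditions)]


hbase, index = FakeHBase(), FakeSearchIndex()
dual_write(hbase, index, "u1", {"name": "Li", "city": "Hangzhou"})

index.available = False
try:
    dual_write(hbase, index, "u2", {"name": "Wang", "city": "Hangzhou"})
except IndexUnavailable:
    pass  # the application now has to retry, roll back, or repair later

found = dual_read(hbase, index, city="Hangzhou")
print(found)               # only u1 is searchable
print(sorted(hbase.rows))  # both rows exist in the store
```

Everything after the `except` branch (retry queues, rollback, reconciliation jobs) is the extra development and maintenance burden this pattern places on the application.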

These approaches share several pain points: heavy development and maintenance effort; weak data consistency, because replication is asynchronous; high deployment cost, because the two systems run on separate resource pools; reduced availability and throughput when writes are serialized; loss of native features such as TTL and multi‑version support; and a high barrier for non‑Java developers.
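
The consistency problem caused by asynchronous replication can be sketched in a few lines. The class below is a hypothetical model, not a real sync service: `AsyncReplicatedStore` and its methods are names invented for this example, and replication is triggered by hand instead of by a background process.

```python
from collections import deque


class AsyncReplicatedStore:
    """Primary store plus a search index fed by an asynchronous change queue."""

    def __init__(self):
        self.primary = {}       # stand-in for the HBase table
        self.index = {}         # stand-in for the Elasticsearch index
        self.pending = deque()  # changes in the primary that are not yet indexed

    def write(self, row_key, row):
        # The write is acknowledged as soon as the primary store has it.
        self.primary[row_key] = dict(row)
        self.pending.append((row_key, dict(row)))

    def replicate(self, max_changes=None):
        # A background sync service would run this; here it is called by hand.
        applied = 0
        while self.pending and (max_changes is None or applied < max_changes):
            row_key, row = self.pending.popleft()
            self.index[row_key] = row
            applied += 1
        return applied

    def search(self, **conditions):
        return sorted(
            key for key, doc in self.index.items()
            if all(doc.get(k) == v for k, v in conditions.items())
        )


store = AsyncReplicatedStore()
store.write("order-1", {"status": "paid"})

before = store.search(status="paid")  # written, but not searchable yet
store.replicate()
after = store.search(status="paid")   # visible only after replication catches up

print(before, after)
```

Between `write` and `replicate`, a point read on the primary sees the row while a search does not. That window is the weak consistency described above.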

To address these issues, Alibaba Cloud's Lindorm introduces the enterprise‑grade Searchindex feature. Lindorm is a cloud‑native multi‑model database that supports wide‑table, time‑series, search, and file models. It is compatible with the HBase/Cassandra, OpenTSDB, Solr, SQL, and HDFS interfaces, and is used in many large‑scale scenarios.

Searchindex is a new type of index for Lindorm wide tables. Users create, delete, and build indexes with simple SQL statements, and read and write data through the same unified SQL, so the experience matches that of a traditional secondary index. Behind it, LindormSearch, a Lucene‑based distributed search engine that uses inverted indexes, BKD‑Tree, and bitmap structures, provides powerful full‑text search and complex condition queries.
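
For readers unfamiliar with inverted indexes, the toy sketch below shows the basic idea: map each column value to the set of primary keys that contain it, then answer a multi-condition query by intersecting those sets. This is not how LindormSearch or Lucene is implemented (real engines tokenize text, compress posting lists, and use BKD trees for numeric ranges). The function names and sample rows are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(rows, columns):
    """Map (column, value) -> set of primary keys whose row holds that value."""
    index = defaultdict(set)
    for primary_key, row in rows.items():
        for column in columns:
            if column in row:
                index[(column, row[column])].add(primary_key)
    return index


def lookup_all(index, **conditions):
    """Intersect the posting sets of every condition (an AND query)."""
    postings = [
        index.get((column, value), set())
        for column, value in conditions.items()
    ]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Made-up sample rows keyed by primary key.
rows = {
    1: {"name": "Liu", "city": "Hangzhou"},
    2: {"name": "Chen", "city": "Hangzhou"},
    3: {"name": "Liu", "city": "Beijing"},
}
index = build_inverted_index(rows, ["name", "city"])

print(lookup_all(index, city="Hangzhou"))              # [1, 2]
print(lookup_all(index, name="Liu", city="Hangzhou"))  # [1]
print(lookup_all(index, city="Shanghai"))              # []
```

Because each condition is resolved to a set of keys before any row is read, combining several conditions costs a set intersection rather than a full table scan, which is what a key‑value store alone would need.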

Example usage:

Original table definition:

    CREATE TABLE myTable (
        id bigint,
        name text,
        age int,
        sex text,
        city text,
        address text,
        PRIMARY KEY (id)
    );

Create a full‑text search index on several columns:

    CREATE SEARCH INDEX myIndex ON myTable WITH COLUMNS (name, age, sex, city, address);

Standard queries:

    SELECT * FROM myTable WHERE name LIKE '小%';
    SELECT * FROM myTable WHERE city='杭州' AND age>=18 ORDER BY age ASC;
    SELECT * FROM myTable WHERE name='小刘' AND sex=false ORDER BY age DESC LIMIT 10 OFFSET 100;

Advanced queries using the search_query syntax:

    SELECT * FROM myTable WHERE search_query='+city:杭州 +age:[18 TO *]' ORDER BY age ASC;
    SELECT * FROM myTable WHERE search_query='address:西湖区';

From the example it is clear that users only need basic SQL knowledge to leverage Lindorm Searchindex without any additional development.
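
To see what the standard queries return, the sketch below runs equivalent statements against an in-memory SQLite database. SQLite is only a stand-in: it is not Lindorm, it has no `CREATE SEARCH INDEX` or `search_query`, and the four sample rows are invented. In Lucene-style query syntax, `+` marks a required clause and `[18 TO *]` is an inclusive range with no upper bound, so `'+city:杭州 +age:[18 TO *]'` should select the same rows as `city='杭州' AND age>=18`.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a Lindorm wide table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,"
    " sex TEXT, city TEXT, address TEXT)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "小刘", 25, "F", "杭州", "西湖区文一路"),
        (2, "小王", 17, "M", "杭州", "滨江区"),
        (3, "老张", 40, "M", "北京", "海淀区"),
        (4, "小陈", 18, "F", "杭州", "西湖区"),
    ],
)

# Prefix match: names that start with 小.
prefix_ids = [row[0] for row in conn.execute(
    "SELECT id FROM myTable WHERE name LIKE '小%' ORDER BY id"
)]

# Equality plus range plus sort: residents of 杭州 aged 18 or older, youngest first.
adult_ids = [row[0] for row in conn.execute(
    "SELECT id FROM myTable WHERE city = ? AND age >= ? ORDER BY age ASC",
    ("杭州", 18),
)]

# Pagination: sort first, then skip one row and return the next two.
page_ids = [row[0] for row in conn.execute(
    "SELECT id FROM myTable ORDER BY age DESC LIMIT 2 OFFSET 1"
)]

print(prefix_ids)  # [1, 2, 4]
print(adult_ids)   # [4, 1]
print(page_ids)    # [1, 4]
```

Without an index, a key‑value store can only answer such queries with a full scan. The search index is what lets filters, sorts, and pagination on non‑primary‑key columns be served efficiently.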

Key advantages of Searchindex include:

Simple and ready‑to‑use: index management and construction are controlled entirely by SQL, eliminating extra development and maintenance.

Unified SQL access: both reads and writes use SQL, and the server automatically selects the optimal index to accelerate queries.

Strong consistency: Searchindex offers both strong and eventual consistency modes. In strong consistency mode, unlike with Elasticsearch, written data is visible to queries immediately.

Low cost: original table data and index data share storage, reducing resource fragmentation.

Full feature support: TTL, multi‑version, and other core database features continue to work with Searchindex.

Multi‑language support: Java, C++, Python, Go, and other mainstream languages can access the feature.

In summary, while the industry often adopts HBase + Elasticsearch for low‑cost storage and high‑performance retrieval of massive data, this combination suffers from complex development, weak consistency, high deployment cost, and feature loss. Lindorm Searchindex provides a simple, efficient, and cost‑effective solution that resolves these challenges and meets the demands of large‑scale data storage and search.

Thank you for reading; you are welcome to try Searchindex and join the technical discussion group (DingTalk 35977898).
