Design and Implementation of Hujiang Search Service Using ElasticSearch

This article describes the background, architecture, external interfaces, full‑update strategy, seamless cluster scaling, and deployment optimizations of Hujiang's search service built on ElasticSearch, highlighting solutions for query granularity, metric evaluation, maintainability, and data consistency during index rebuilds.

Hujiang Technology

Background

With Hujiang's rapid business growth and explosive data increase, each product line requires search capabilities, but the existing search system cannot meet expectations due to three main issues: lack of sentence‑level search support, no search‑related metric evaluation, and poor scalability and maintainability.

After researching industry solutions, ElasticSearch was chosen as the underlying index store, and the search service was redesigned to satisfy maintainability and customizable ranking requirements.

Overall Technical Architecture

The search service is built on the distributed search engine ElasticSearch, an open‑source, Lucene‑based, RESTful engine that provides near‑real‑time, stable, reliable, and fast search.

The service consists of five subsystems:

Search Server – provides search and query functions.

Index Server – handles incremental and full updates.

Admin Console – UI for index maintenance operations.

ElasticSearch Storage – underlying index data store.

Monitoring Platform – uses ELK logs and Zabbix for monitoring.

External System Interface Design

Query Interface – HTTP calls are recommended for cross‑datacenter access; otherwise Dubbo RPC can be used.

Incremental Update Interface – Business systems push data to a designated MQ channel; the update service listens and writes changes to ElasticSearch.

Full‑Index Interface – The update service invokes a business‑provided HTTP endpoint that supports pagination to retrieve all data.

Full Update

Full updates are essential for recovering from data loss, applying schema changes, and performing cold‑start bulk imports. Incremental updates may keep running during a full rebuild, which can occasionally cause data loss: in the scenario considered here, changes that arrive between timestamps T1 and T2, while the rebuild is in progress, are missed.

To solve this, a ZooKeeper distributed lock is used to pause the index consumer while the new index is built:

Create a new index.

Obtain the alias of the target index and set the lock state to stop.

The index consumer watches the lock and pauses updates.

After the new index is populated, set the lock state to start.

The consumer resumes updating the new index.
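The steps above can be simulated in memory to show why pausing the consumer avoids the lost updates. This is only a sketch under stated assumptions: a plain boolean stands in for the ZooKeeper lock state, a deque stands in for the MQ channel, the index names are made up, and the point at which the alias moves to the new index (just before the consumer resumes) is an assumption, since the text does not spell it out.

```python
from collections import deque


class IndexConsumer:
    """Toy stand-in for the MQ-driven index consumer."""

    def __init__(self, indices, alias):
        self.indices = indices   # index name -> {doc_id: document}
        self.alias = alias       # {"target": name of the live index}
        self.paused = False      # mirrors the lock state ("stop" => True)
        self.pending = deque()   # messages still waiting in the queue

    def on_lock_state(self, state):
        # Called when the watched lock state changes ("stop" or "start").
        self.paused = (state == "stop")
        self._drain()

    def publish(self, doc_id, document):
        # A business system pushes one change onto the queue.
        self.pending.append((doc_id, document))
        self._drain()

    def _drain(self):
        # Apply queued changes to whichever index the alias points at now.
        while self.pending and not self.paused:
            doc_id, document = self.pending.popleft()
            self.indices[self.alias["target"]][doc_id] = document


def full_rebuild(consumer, pages, new_index, during_rebuild=None):
    consumer.indices[new_index] = {}       # create the new index
    consumer.on_lock_state("stop")         # lock state "stop": consumer pauses
    for page in pages:                     # populate from the paginated source
        for doc_id, document in page:
            consumer.indices[new_index][doc_id] = document
        if during_rebuild:
            during_rebuild()               # simulate writes arriving mid-rebuild
    consumer.alias["target"] = new_index   # assumed: alias moves before resuming
    consumer.on_lock_state("start")        # consumer resumes on the new index


indices = {"items_v1": {1: "old title"}}
alias = {"target": "items_v1"}
consumer = IndexConsumer(indices, alias)
pages = [[(1, "old title")], [(2, "second doc")]]

# Document 1 changes while the rebuild is running.
full_rebuild(consumer, pages, "items_v2",
             during_rebuild=lambda: consumer.publish(1, "new title"))
```

Because the change stays queued while the lock state is "stop", it is replayed against `items_v2` after the rebuild instead of landing only in the old index.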

Cluster Seamless Scaling

Due to explosive data growth, the ES cluster reached capacity limits. Leveraging ElasticSearch's online seamless scaling, the cluster was expanded from three to five nodes. The steps included preparing two new nodes with identical configurations, starting them one by one, verifying cluster discovery, and finally restarting the master node after updating configurations.
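Verifying cluster discovery after each new node starts could look like the check below. It is a sketch, not the team's actual procedure: it assumes the caller has already fetched the JSON body of ElasticSearch's cluster health API (`GET _cluster/health`) and parsed it into a dict, and the helper names are hypothetical.

```python
def nodes_joined(health: dict, expected_nodes: int) -> bool:
    """True when the cluster reports the expected number of nodes."""
    return (
        health.get("number_of_nodes") == expected_nodes
        and not health.get("timed_out", False)
    )


def rebalance_settled(health: dict) -> bool:
    """True when the cluster is green and no shards are still moving."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


# Hand-written sample responses, shaped like the cluster health output.
after_fourth_node = {
    "status": "green", "timed_out": False, "number_of_nodes": 4,
    "relocating_shards": 2, "initializing_shards": 0, "unassigned_shards": 0,
}
after_fifth_node = {
    "status": "green", "timed_out": False, "number_of_nodes": 5,
    "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 0,
}

ready_for_next_node = (
    nodes_joined(after_fourth_node, 4) and rebalance_settled(after_fourth_node)
)
scaling_complete = (
    nodes_joined(after_fifth_node, 5) and rebalance_settled(after_fifth_node)
)
```

Waiting for shard relocation to settle before starting the next node is a cautious choice rather than a requirement stated in the article.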

Deployment Optimization

Separate query and update services to isolate instability.

Reserve at least half of the machine's physical memory for Lucene, which relies on the operating system's file cache; give the rest to the JVM heap, and do not allocate more than 32 GB to the heap.

Avoid wildcard queries where possible: like a leading or trailing % in a SQL LIKE pattern, they have to scan many terms and are expensive.

Raise index.refresh_interval above its 1 s default when the business can tolerate new data becoming searchable later; less frequent refreshes reduce indexing overhead.
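Two of these recommendations reduce to small calculations, sketched below. The helper names are hypothetical, and the 31 GB default cap is an assumption: the article only says to stay at or below 32 GB, and staying slightly under that is a common conservative choice. The settings body follows the shape accepted by ElasticSearch's update index settings API (`PUT /<index>/_settings`); sending the request is left out.

```python
import json


def recommended_heap_gb(ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical RAM for the JVM heap, capped below 32 GB.

    The other half is left to the OS file cache that Lucene depends on.
    cap_gb=31 is an assumed safety margin, not a value from the article.
    """
    return min(ram_gb / 2, cap_gb)


def refresh_interval_settings(seconds: int) -> str:
    """Build the JSON body that sets index.refresh_interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return json.dumps({"index": {"refresh_interval": f"{seconds}s"}})


heap_small_box = recommended_heap_gb(16)    # 16 GB machine
heap_large_box = recommended_heap_gb(128)   # 128 GB machine, hits the cap
body = refresh_interval_settings(30)        # tolerate up to 30 s of search lag
```

The heap value would typically be applied to both the minimum and maximum JVM heap settings so the heap does not resize at runtime.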

Conclusion

The article presented Hujiang's search service architecture, discussed full‑update data consistency challenges, described online ES scaling, and listed several deployment optimizations. The goal is to share practical advice for building a generic search solution, with future posts covering ranking and scoring.
