
Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building separate search services for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine and a control layer that automates data dumping, indexing, query translation, and result ranking across five subsystems, enabling new services to be onboarded in minutes instead of weeks.

Xianyu Technology

Background: Xianyu’s rapid growth creates a large variety of business data that requires search capabilities. Building separate search services for each business would be costly in development and maintenance.

The team built a single, generic search system that can serve heterogeneous data from many business scenarios.

System Overview: The solution is based on Alibaba’s HA3 search engine with an upper‑level control system (Tisplus2). It consists of five main subsystems:

1. Dump: Transforms full and incremental DB data into files or messages that BuildService can consume.

2. BuildService (BS): Generates index files from the dump output.

3. Search Business Gateway: Provides a unified service interface that hides engine details from business callers.

4. Search Planner (SP): Orchestrates query rewriting, category prediction, scoring, multi‑stage recall, de‑duplication, and similar steps.

5. Online Engine Service: Includes QRS and Searcher nodes. QRS forwards each query to the Searchers, then aggregates and ranks their results.
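To make the QRS role concrete, here is a minimal scatter-gather sketch in Java. It assumes each Searcher can be modelled as a function from a query string to its locally scored hits, and that QRS ranking reduces to keeping the global top-k by score. `QrsSketch`, `Hit`, and `search` are hypothetical names; this is not HA3's QRS implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Function;

// Hypothetical sketch: each Searcher is modelled as a function from query to its local hits.
class QrsSketch {
    /** One scored hit returned by a Searcher partition. */
    record Hit(long docId, double score) {}

    /** Fans the query out to every Searcher and keeps the k highest-scoring hits overall. */
    static List<Hit> search(String query, List<Function<String, List<Hit>>> searchers, int k) {
        // Min-heap ordered by score: the weakest hit in the current top-k sits at the head.
        PriorityQueue<Hit> topK = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        // Sequential for clarity; a real QRS would query the Searchers in parallel.
        for (Function<String, List<Hit>> searcher : searchers) {
            for (Hit hit : searcher.apply(query)) {
                topK.offer(hit);
                if (topK.size() > k) {
                    topK.poll(); // evict the lowest-scoring hit
                }
            }
        }
        // Heap iteration order is arbitrary, so sort the survivors by descending score.
        List<Hit> ranked = new ArrayList<>(topK);
        ranked.sort((a, b) -> Double.compare(b.score(), a.score()));
        return ranked;
    }
}
```

The bounded min-heap holds at most k + 1 hits at a time, so memory stays proportional to k no matter how many hits each Searcher returns.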

Key Technical Points:

Generic Search Reserved Tables: Pre‑defined tables (two dimensions, each with MySQL and ODPS tables) contain only type information, no business semantics. Field names follow a systematic pattern (e.g., dima_pk, dima_a_int).
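The reserved-table idea can be sketched as a slot allocator: each semantic field a business registers is bound to the next free generic column of the matching type. The naming pattern `dim<dimension>_<slot>_<type>` is extrapolated from the two examples above (`dima_pk`, `dima_a_int`); the type list, the 26-slot limit, and the class name `ReservedColumnAllocator` are all assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical allocator mapping semantic fields onto generic reserved columns.
 * Assumed naming pattern (extrapolated from dima_pk / dima_a_int): dim<dimension>_<slot>_<type>.
 */
class ReservedColumnAllocator {
    enum FieldType { INT, LONG, DOUBLE, STRING }

    private final String dimension; // e.g. "a" for the dima_* table
    private final Map<FieldType, Integer> nextSlot = new EnumMap<>(FieldType.class);
    private final Map<String, String> semanticToReserved = new LinkedHashMap<>();

    ReservedColumnAllocator(String dimension) {
        this.dimension = dimension;
    }

    /** The primary key always maps to the fixed pk column, e.g. dima_pk. */
    String pkColumn() {
        return "dim" + dimension + "_pk";
    }

    /** Binds a semantic field to the next unused slot ('a', 'b', ...) of the requested type. */
    String allocate(String semanticName, FieldType type) {
        int slot = nextSlot.getOrDefault(type, 0);
        if (slot >= 26) { // assumed capacity; the real tables have some fixed column count per type
            throw new IllegalStateException("no free " + type + " slots in dim" + dimension);
        }
        nextSlot.put(type, slot + 1);
        String column = "dim" + dimension + "_" + (char) ('a' + slot) + "_"
                + type.name().toLowerCase(Locale.ROOT);
        semanticToReserved.put(semanticName, column);
        return column;
    }

    /** Snapshot of semantic-name to reserved-column bindings, in allocation order. */
    Map<String, String> mapping() {
        return new LinkedHashMap<>(semanticToReserved);
    }
}
```

Because slots are only typed, two businesses can reuse the same physical column for unrelated fields; the business ID assigned at registration is what keeps their rows apart.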

Metadata Registration Center : A web UI where a new business registers its source DB, tables, and field mappings. The system assigns a unique business ID used for isolation throughout the pipeline.
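A minimal in-memory stand-in for the registration center might look like this. `BizRegistry`, `Registration`, and the `biz_id` column are hypothetical, and the filter string is illustrative rather than HA3 query syntax. The point is that a unique ID is issued once at registration and then attached to every query, so businesses sharing the reserved tables stay isolated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for the metadata registration center. */
class BizRegistry {
    /** What a business submits through the registration UI (simplified). */
    record Registration(String sourceDb, String sourceTable, Map<String, String> fieldMapping) {}

    // Starting value is arbitrary; 1001 only mirrors the Biz1001 class names used in this article.
    private final AtomicInteger nextBizId = new AtomicInteger(1001);
    private final Map<Integer, Registration> byId = new ConcurrentHashMap<>();

    /** Stores the registration and returns its newly issued, unique business ID. */
    int register(Registration registration) {
        int bizId = nextBizId.getAndIncrement();
        byId.put(bizId, registration);
        return bizId;
    }

    Registration lookup(int bizId) {
        Registration r = byId.get(bizId);
        if (r == null) {
            throw new IllegalArgumentException("unknown business ID " + bizId);
        }
        return r;
    }

    /** Prepends the isolation condition on the hypothetical biz_id column to a business filter. */
    static String scopedFilter(int bizId, String businessFilter) {
        String isolation = "biz_id=" + bizId;
        return businessFilter.isEmpty() ? isolation : isolation + " AND " + businessFilter;
    }
}
```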

Two‑Layer Dump: Business‑specific tables are merged and joined (M1, M2) to produce a wide table that matches the reserved‑table schema. The dump process is automated via Alibaba’s internal middleware “Jingwei”.
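The two layers can be sketched over in-memory rows, assuming layer one is a key-based join of a main business table with a secondary table and layer two renames the joined columns to reserved columns. The article does not spell out what M1 and M2 contain, so this is one plausible reading; `TwoLayerDump` and the `biz_id` column are hypothetical, and in production the dump runs through Jingwei rather than code like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-layer dump: join business tables into a wide row, then map to reserved columns. */
class TwoLayerDump {
    static List<Map<String, Object>> dump(List<Map<String, Object>> mainRows,
                                          List<Map<String, Object>> sideRows,
                                          String joinKey,
                                          Map<String, String> semanticToReserved,
                                          int bizId) {
        // Index the secondary table by join key for a hash join.
        Map<Object, Map<String, Object>> sideByKey = new HashMap<>();
        for (Map<String, Object> row : sideRows) {
            sideByKey.put(row.get(joinKey), row);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> main : mainRows) {
            // Layer 1: left join; main-table values win when column names clash.
            Map<String, Object> wide = new LinkedHashMap<>(sideByKey.getOrDefault(main.get(joinKey), Map.of()));
            wide.putAll(main);
            // Layer 2: keep only registered columns, renamed to their reserved names.
            Map<String, Object> reserved = new LinkedHashMap<>();
            reserved.put("biz_id", bizId); // hypothetical isolation column
            semanticToReserved.forEach((semantic, column) -> {
                if (wide.containsKey(semantic)) {
                    reserved.put(column, wide.get(semantic));
                }
            });
            out.add(reserved);
        }
        return out;
    }
}
```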

Online Query Service: Provides a translation layer so developers can use semantic APIs (e.g., param.setTitle("iPhone6S")) while the system converts them to the engine’s non‑semantic field names and back‑translates results.

Code Example – Semantic API usage:

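// Business code sets fields by their semantic names...
param.setTitle("iPhone6S");
param.setSellerId(1234567L);
// ...and the query service maps them to the engine's reserved field names internally.
var result = searchService.doSearch(param);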
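Behind a call like the one above, the translation layer has two jobs: rename semantic parameters to reserved column names on the way in, and rename reserved columns back on the way out. The sketch below assumes the mapping comes from the registration center as a plain `Map`; `QueryTranslator`, `toEngine`, and `fromEngine` are hypothetical names, and the reserved column names in the usage are illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical translation layer between semantic field names and reserved column names. */
class QueryTranslator {
    private final Map<String, String> semanticToReserved;
    private final Map<String, String> reservedToSemantic = new HashMap<>();

    QueryTranslator(Map<String, String> semanticToReserved) {
        this.semanticToReserved = Map.copyOf(semanticToReserved);
        // Build the reverse index once so result rows can be back-translated cheaply.
        semanticToReserved.forEach((semantic, reserved) -> reservedToSemantic.put(reserved, semantic));
    }

    /** Renames query parameters to reserved columns; unregistered fields are rejected early. */
    Map<String, Object> toEngine(Map<String, Object> semanticParams) {
        Map<String, Object> engineParams = new LinkedHashMap<>();
        semanticParams.forEach((name, value) -> {
            String column = semanticToReserved.get(name);
            if (column == null) {
                throw new IllegalArgumentException("field not registered: " + name);
            }
            engineParams.put(column, value);
        });
        return engineParams;
    }

    /** Back-translates one result row; columns with no registered meaning are dropped. */
    Map<String, Object> fromEngine(Map<String, Object> engineRow) {
        Map<String, Object> semanticRow = new LinkedHashMap<>();
        engineRow.forEach((column, value) -> {
            String name = reservedToSemantic.get(column);
            if (name != null) {
                semanticRow.put(name, value);
            }
        });
        return semanticRow;
    }
}
```

Rejecting unregistered fields on the way in surfaces mapping mistakes at call time instead of letting them silently match nothing in the engine.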

Generated Business‑Specific Parameter Class (excerpt):

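public class UnisearchBiz1001SearchParam extends IdleUnisearchBaseSearchParams {
    // Field semantics are inferred from the generated names; accessors omitted.
    // Include filter: match documents whose poiCode is in this set
    private Set<Long> unisearch_includeCollection_prefix_poiCode;
    // Exclude filter: drop documents whose poiCode is in this set
    private Set<Long> unisearch_excludeCollection_prefix_poiCode;
    // Keyword query against poiName
    private String unisearch_keywords_poiName;
}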

Generated Business‑Specific Result DO (excerpt):

public class UnisearchBiz1001SearchResultDo extends IdleUnisearchBaseSearchResultDo {
    private Long poiId;
    private Long poiCode;
    private String poiName;
}
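Classes like the two excerpts above are generated from the registered metadata. A minimal sketch of that generation step, assuming the registry supplies an ordered map from semantic field name to Java type, could emit the result DO source as text. `ResultDoGenerator` is a hypothetical name, and accessors, imports, and the parameter class are left out.

```java
import java.util.Map;

/** Hypothetical generator emitting a business-specific result DO declaration as Java source text. */
class ResultDoGenerator {
    static String generate(int bizId, Map<String, String> semanticFieldTypes) {
        StringBuilder src = new StringBuilder();
        // Class name follows the UnisearchBiz<id>SearchResultDo pattern from the excerpt.
        src.append("public class UnisearchBiz").append(bizId)
           .append("SearchResultDo extends IdleUnisearchBaseSearchResultDo {\n");
        // One private field per registered semantic field, in the map's iteration order.
        semanticFieldTypes.forEach((name, javaType) ->
            src.append("    private ").append(javaType).append(' ').append(name).append(";\n"));
        src.append("}\n");
        return src.toString();
    }
}
```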

Incremental Data Handling: To keep a large initial data load from overwhelming the real‑time (incremental) pipeline, an extra gmtdropinctag column is added. During the initial full load, this tag is set to null, which causes the real‑time pipeline to ignore those rows. After the initial load, a full re‑dump rebuilds the index, and the incremental tasks then resume normal operation.
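The drop rule for gmtdropinctag amounts to a predicate in front of the real-time index build. This sketch assumes change messages arrive as column-to-value maps and that a missing gmtdropinctag value counts the same as null; `IncrementalFilter` is a hypothetical name.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical gate in front of the real-time pipeline: rows tagged null during the bulk load are skipped. */
class IncrementalFilter implements Predicate<Map<String, Object>> {
    static final String TAG_COLUMN = "gmtdropinctag";

    /** Returns true if the change message should be forwarded to the real-time index build. */
    @Override
    public boolean test(Map<String, Object> changedRow) {
        // Map.get returns null both for an explicit null and for an absent column.
        return changedRow.get(TAG_COLUMN) != null;
    }
}
```

Rows skipped here are not lost: they reach the index through the full re-dump that follows the initial load.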

Results: The unified system now serves three Xianyu businesses, enabling new services to be onboarded within 10‑30 minutes, eliminating the previous week‑long dependency on a search owner.

Future Outlook: Further automation, so that developers are freed from repetitive tasks and can focus on innovative work.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
