Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing
Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.
1. Overall Architecture
The search engine consists of three parts: data source aggregation (often called the dump), full/incremental/real-time index construction, and online services. Data flows Tisplus (entry point) → Bahamut (workflows scheduled by Maat) → Blink → HDFS/Swift → Build Service → Ha3 → SP → SW, and together these stages deliver a highly available, high-performance search service.
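As a rough illustration, the flow above can be modeled as an ordered list of stages. This minimal Python sketch is hypothetical: the stage names come from the text, while the `pipeline` function and the full/incremental split (full data through HDFS, incremental data through Swift, as described in section 2.3) are assumptions made for the example, not Alibaba's code.

```python
from typing import List

# Stage names taken from the architecture description; the structure itself
# is a hypothetical illustration, not Alibaba's implementation.
UPSTREAM = ["Tisplus", "Bahamut", "Blink"]
DOWNSTREAM = ["Build Service", "Ha3", "SP", "SW"]

# Assumption: full dumps land on HDFS, incremental updates flow through Swift.
TRANSPORT = {"full": "HDFS", "incremental": "Swift"}


def pipeline(mode: str) -> List[str]:
    """Return the ordered stages a document passes through for a given mode."""
    if mode not in TRANSPORT:
        raise ValueError(f"unknown mode: {mode!r}")
    return UPSTREAM + [TRANSPORT[mode]] + DOWNSTREAM


if __name__ == "__main__":
    print(" -> ".join(pipeline("full")))
    print(" -> ".join(pipeline("incremental")))
```

Both modes share every stage except the transport layer, which is the point the sketch is meant to make visible.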
2. Tisplus
Tisplus manages the offline dump for engines such as spu, cspu, company, buyoffer, and feed, and mainly handles the deployment and maintenance of HA3 and SP. The platform’s architecture is illustrated below.
Data-source failures occasionally occur, caused by expired table permissions or ZooKeeper (ZK) jitter. After the dump adopted the Blink Batch model, its execution time dropped significantly (the example shown is for the buyoffer engine).
2.1 Bahamut – Data‑Source Graph Processing
Bahamut translates data graphs assembled in its web interface into executable SQL statements. Its components include:
Data input: datasource (supports TDDL and ODPS)
KV input: HbaseKV (HBase tables)
Data processing: Rename, DimTrans, Functions, Selector, UDTF, Merge, Join (left join)
Data output: Ha3 (HDFS/Swift)
The processing flow is illustrated below.
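To make the graph-to-SQL idea concrete, here is a minimal Python sketch that compiles a tiny linear graph (a datasource, a Rename, and a left Join) into one SQL statement. The node classes, the `compile_sql` function, and the table and column names are hypothetical; Bahamut's real node set and SQL dialect are not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    """Hypothetical data input node (e.g. a TDDL or ODPS table)."""
    table: str
    columns: List[str]


@dataclass
class Rename:
    """Hypothetical Rename node: maps old column names to new ones."""
    mapping: Dict[str, str] = field(default_factory=dict)


@dataclass
class LeftJoin:
    """Hypothetical Join node: left-joins another table on a key column."""
    table: str
    key: str
    columns: List[str]


def compile_sql(source: Source, rename: Rename, join: LeftJoin) -> str:
    """Compile Source -> Rename -> LeftJoin into a single SELECT statement."""
    select = [
        f"s.{c} AS {rename.mapping[c]}" if c in rename.mapping else f"s.{c}"
        for c in source.columns
    ]
    select += [f"j.{c}" for c in join.columns]
    return (
        f"SELECT {', '.join(select)} "
        f"FROM {source.table} s "
        f"LEFT JOIN {join.table} j ON s.{join.key} = j.{join.key}"
    )


if __name__ == "__main__":
    sql = compile_sql(
        Source("offer", ["id", "title", "company_id"]),
        Rename({"title": "subject"}),
        LeftJoin("company", "company_id", ["company_name"]),
    )
    print(sql)
```

A real graph compiler would walk an arbitrary DAG of nodes; fixing the shape to three nodes keeps the example focused on how each node contributes a fragment of the final statement.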
2.2 Maat – Distributed Workflow Scheduler
Maat is a re-implementation of the open-source Apache Airflow, offering visual workflow editing, generic node types, Drogo-based deployment, cluster management, and comprehensive monitoring and alerting.
The Maat scheduling page for the feed engine is shown below.
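Maat inherits Airflow's core idea: a workflow is a DAG, and a node runs only after all of its upstream nodes have finished. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9+). The node names are hypothetical stand-ins for a dump workflow and are not taken from Maat.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dump workflow: each node maps to the set of nodes it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "read_offer": set(),
    "read_company": set(),
    "join": {"read_offer", "read_company"},
    "write_hdfs": {"join"},
    "notify_build": {"write_hdfs"},
}


def schedule(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; all nodes in one wave could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on cycles
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every node in the wave succeeded
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(schedule(WORKFLOW)):
        print(i, wave)
```

The two independent reads form the first wave; everything after the join runs strictly in sequence. A production scheduler would also handle retries, failures, and time-based triggers, which this sketch leaves out.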
2.3 Ha3 Doc – Data Output
After the above steps, data is output in XML (isearch format) to HDFS/Pangu for full data and to a Swift topic for incremental data. The HA3 table information is represented as JSON, for example:
```json
{
  "1649992010": [
    {
      "data": "hdfs://xxx/search4test_st3_7u/full",
      "swift_start_timestamp": "1531271322",
      "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
      "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
      "table_name": "search4test_st3_7u",
      "version": "20190920090800"
    }
  ]
}
```
3. Suez
In Suez, offline tables are configured with the ZK (ZooKeeper) type, specifying zk_server and zk_path. Build Service then builds full, incremental, or real-time indexes from these tables, and the HA3 online cluster serves them.
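Presumably the configured ZK path exposes table information like the JSON example in section 2.3; that link is an assumption. As an illustration only, the Python sketch below parses that JSON layout and returns, for each table, the entry with the newest `version`, together with its full-data path and the Swift topic and start timestamp from which incremental data would be read. The `latest_tables` helper and the `TableInfo` type are hypothetical. The field names are copied from the example, and because the source does not explain the top-level key, the code simply iterates over it.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Sample copied from the table-information example (elided paths kept as-is).
SAMPLE = """{"1649992010": [{"data": "hdfs://xxx/search4test_st3_7u/full",
  "swift_start_timestamp": "1531271322",
  "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
  "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
  "table_name": "search4test_st3_7u", "version": "20190920090800"}]}"""


@dataclass
class TableInfo:
    """Hypothetical typed view of one table entry."""
    table_name: str
    version: str
    full_data: str
    swift_topic: str
    swift_zk: str
    swift_start_timestamp: int


def latest_tables(raw: str) -> Dict[str, TableInfo]:
    """Pick the newest version of every table found in the JSON document."""
    result: Dict[str, TableInfo] = {}
    for entries in json.loads(raw).values():  # meaning of top-level key unknown
        for e in entries:
            info = TableInfo(
                table_name=e["table_name"],
                version=e["version"],
                full_data=e["data"],
                swift_topic=e["swift_topic"],
                swift_zk=e["swift_zk"],
                swift_start_timestamp=int(e["swift_start_timestamp"]),
            )
            current = result.get(info.table_name)
            # Versions look like yyyyMMddHHmmss strings, so compare numerically.
            if current is None or int(info.version) > int(current.version):
                result[info.table_name] = info
    return result


if __name__ == "__main__":
    for name, info in latest_tables(SAMPLE).items():
        print(name, info.version, info.full_data, info.swift_topic)
```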
3.1 Build Service
Build Service roles include:
admin – controls the overall build flow, switches between full and incremental modes, and handles user requests.
processor – transforms raw documents into lightweight buildable documents.
builder – constructs the index.
merger – organizes the index.
rtBuilder – builds real‑time online indexes.
A complete full-plus-incremental build is tracked by a generation ID and passes through the stages process-full → builder-full → merger-full → process-inc → builder-inc → merger-inc; in the incremental phase, builder and merger then keep alternating.
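The generation lifecycle can be pictured as a simple stage sequence. This Python sketch yields the stage names described above for a given number of incremental rounds. The `generation_stages` function and its `inc_rounds` parameter are illustrative assumptions; the source does not say how many incremental rounds run or what ends them.

```python
from typing import Iterator


def generation_stages(generation_id: int, inc_rounds: int) -> Iterator[str]:
    """Yield the build stages of one (hypothetical) generation, in order."""
    # Full phase: processor, builder, and merger each run once over full data.
    for role in ("process", "builder", "merger"):
        yield f"{generation_id}:{role}-full"
    # Incremental phase: the processor starts once; builder and merger alternate.
    yield f"{generation_id}:process-inc"
    for _ in range(inc_rounds):
        yield f"{generation_id}:builder-inc"
        yield f"{generation_id}:merger-inc"


if __name__ == "__main__":
    for stage in generation_stages(42, inc_rounds=2):
        print(stage)
```

With `inc_rounds=1` the output matches the six-stage chain in the text exactly; each extra round appends one more builder-inc/merger-inc pair.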
3.2 Ha3 – Online Search Service
Ha3, built on the Suez framework, provides rich query, filter, sort, and aggregation clauses, and supports custom ranking plugins. Its service architecture consists of Qrs, searcher, and summary components.
The query flow is as follows. Qrs parses the request, which then goes through seek, filter, coarse ranking, aggregation, fine ranking (ReRank), and final ranking (ExtraRank). The results are merged, and finally the summary component retrieves document details.
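The ordering of these stages is easier to follow in code. The Python sketch below runs a toy query through the same sequence over two in-memory searcher partitions: parse, seek via an inverted index, filter, coarse rank, aggregation, ReRank, ExtraRank, merge, and summary lookup. Everything in it is a simplified assumption for illustration: the `term=...&max_price=...` request syntax, the data, the scoring formulas, and where each stage runs. It is not Ha3's query syntax, API, or its actual division of work between Qrs and searchers.

```python
from collections import Counter
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

# Hypothetical toy data: two searcher partitions, each with an inverted index
# (term -> doc ids) and per-document attributes.
PARTITIONS = [
    {"index": {"shirt": [1, 2, 3]},
     "attrs": {1: {"price": 30, "sales": 120, "cat": "apparel"},
               2: {"price": 300, "sales": 5, "cat": "apparel"},
               3: {"price": 45, "sales": 60, "cat": "gift"}}},
    {"index": {"shirt": [7, 8]},
     "attrs": {7: {"price": 25, "sales": 200, "cat": "apparel"},
               8: {"price": 80, "sales": 10, "cat": "apparel"}}},
]
# Stand-in for the summary component's detail store.
SUMMARIES = {1: "Cotton shirt", 2: "Silk shirt", 3: "Gift shirt",
             7: "Basic tee", 8: "Linen shirt"}


def parse_request(request: str) -> Tuple[str, int]:
    """Qrs step: parse a hypothetical 'term=...&max_price=...' request."""
    params = parse_qs(request)
    return params["term"][0], int(params["max_price"][0])


def search_partition(part: dict, term: str, max_price: int,
                     coarse_k: int = 2) -> Tuple[Dict[int, float], Counter]:
    attrs = part["attrs"]
    matched = part["index"].get(term, [])                               # seek
    matched = [d for d in matched if attrs[d]["price"] <= max_price]    # filter
    coarse = sorted(matched, key=lambda d: attrs[d]["sales"],
                    reverse=True)[:coarse_k]                            # coarse rank
    facets = Counter(attrs[d]["cat"] for d in matched)                  # aggregation
    fine = {d: attrs[d]["sales"] / attrs[d]["price"] for d in coarse}   # ReRank
    # ExtraRank: a hypothetical business boost for the "apparel" category.
    final = {d: s + (1.0 if attrs[d]["cat"] == "apparel" else 0.0)
             for d, s in fine.items()}
    return final, facets


def query(request: str, top_n: int = 3) -> Tuple[List[Tuple[int, str]], Dict[str, int]]:
    term, max_price = parse_request(request)
    scores: Dict[int, float] = {}
    facets: Counter = Counter()
    for part in PARTITIONS:
        part_scores, part_facets = search_partition(part, term, max_price)
        scores.update(part_scores)  # doc ids are unique across partitions here
        facets.update(part_facets)
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]         # merge
    return [(d, SUMMARIES[d]) for d in top], dict(facets)              # summary


if __name__ == "__main__":
    print(query("term=shirt&max_price=100"))
```

For `term=shirt&max_price=100`, the filter drops document 2, the merged ranking returns documents 7, 1, and 3, and the aggregation counts three apparel items and one gift item. Note that summaries are fetched only for the final top results, which is why summary retrieval comes last.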
ReRank and ExtraRank are implemented via Hobbit plugins and the “warhorse” (马) plugins, allowing business teams to define features and weights for final product ranking.
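Conceptually, letting business teams define features and weights amounts to a configurable scoring function. The Python sketch below shows one common form, a weighted linear sum of feature values, with features supplied as plain functions and weights as a dictionary. The feature names, the weights, and the `RankPlugin` class are hypothetical; the source does not say which scoring model the Hobbit or warhorse plugins actually use.

```python
from typing import Callable, Dict, List

Item = Dict[str, float]
Feature = Callable[[Item], float]


class RankPlugin:
    """Hypothetical plugin: score = sum(weight * feature(item))."""

    def __init__(self, features: Dict[str, Feature], weights: Dict[str, float]):
        missing = set(weights) - set(features)
        if missing:
            raise ValueError(f"weights reference unknown features: {sorted(missing)}")
        self.features = features
        self.weights = weights

    def score(self, item: Item) -> float:
        return sum(w * self.features[name](item) for name, w in self.weights.items())

    def rank(self, items: List[Item]) -> List[Item]:
        return sorted(items, key=self.score, reverse=True)


# Hypothetical business configuration: reward clicks and conversions,
# penalize price (scaled so that all features have comparable magnitudes).
plugin = RankPlugin(
    features={
        "ctr": lambda it: it["ctr"],
        "cvr": lambda it: it["cvr"],
        "price": lambda it: it["price"] / 1000.0,
    },
    weights={"ctr": 1.0, "cvr": 2.0, "price": -0.5},
)

items = [
    {"id": 1, "ctr": 0.10, "cvr": 0.02, "price": 200.0},
    {"id": 2, "ctr": 0.05, "cvr": 0.05, "price": 50.0},
]

if __name__ == "__main__":
    for it in plugin.rank(items):
        print(it["id"], round(plugin.score(it), 3))
```

Keeping features and weights as separate inputs means a business team can retune weights without touching feature code, which is the separation the text describes.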
4. Drogo
Drogo is a control platform based on the two‑layer scheduling service Carbon, hosting 1688’s SP services and QP proxy services.
References include internal manuals and papers such as “Search Middle‑Platform Development and Operations Integration Practice – Sophon”, “DAG‑Based Distributed Task Scheduling Platform – Maat”, “Tisplus User Manual”, “Build Service User Manual”, “Ha3 User Manual”, “Drogo Platform Introduction”, and others.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
