Evolution of JD Daojia Search System Architecture from Version 1.0 to 3.0
The article details the progressive architectural evolution of JD Daojia's search system—starting from a simple, single‑layer ES‑based 1.0 design, through the 2.0 overhaul that introduced full‑recall, independent ranking services, and index disaster‑recovery, to the 3.0 version that adds multi‑path recall, sophisticated ranking models, and automated routing for high availability.
JD Daojia, a leading instant‑retail platform, relies on a robust search system to help users find products quickly across multiple entry points, such as the home page, channel pages, flash‑sale pages, coupon pages, and the mini‑program.
Search System 1.0 was built as a simple, usable, layered monolith that queried Elasticsearch (ES) with terms and query_string queries on SKU names and filtered results geographically via Mercator projection. This version suffered from performance degradation on deep pagination, incomplete recall, limited ranking factors, and dependence on a single ES cluster, which made that cluster a single point of failure.
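The summary does not say how the Mercator projection was applied, so the following is only a minimal sketch of one common approach: project longitude/latitude to Web Mercator meters and test the projected point against a store's delivery bounding box. The function names and the bounding-box representation are assumptions made for illustration, not JD Daojia's implementation.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius, as used by Web Mercator


def to_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator (x, y) in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def in_bounding_box(point, box):
    """Return True if a projected point lies inside box = (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def stores_covering(user_lon, user_lat, store_boxes):
    """Hypothetical filter: ids of stores whose projected delivery box contains the user."""
    user_point = to_mercator(user_lon, user_lat)
    return [sid for sid, box in store_boxes.items() if in_bounding_box(user_point, box)]
```

Projected distances are stretched away from the equator, so a real system would need to account for that distortion when turning a delivery radius into a box.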
Search System 2.0 addressed these issues by abandoning ES pagination in favor of full‑recall with in‑memory pagination, abstracting the ranking module into an independent service, and enhancing index disaster‑recovery through cross‑data‑center ES deployment, regular snapshots, and dual‑write clusters. A group‑based store partitioning strategy (≈30 stores per group) balanced recall coverage and ES load.
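The idea of full recall followed by in‑memory pagination can be sketched as below. This is an illustration under stated assumptions: `query_group` stands in for whatever ES request fetches all hits for one group of stores, and the fixed chunk size of 30 mirrors the approximate figure quoted above.

```python
def chunk_stores(store_ids, group_size=30):
    """Split the candidate stores into groups of at most group_size."""
    return [store_ids[i:i + group_size] for i in range(0, len(store_ids), group_size)]


def full_recall(store_ids, query_group, group_size=30):
    """Recall every hit, one request per store group, instead of paging inside ES.

    query_group is a hypothetical callable: it takes a list of store ids and
    returns the list of hits for that group.
    """
    hits = []
    for group in chunk_stores(store_ids, group_size):
        hits.extend(query_group(group))
    return hits


def paginate(items, page, page_size):
    """Slice an already ranked, in-memory result list; page numbers start at 1."""
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

Because the whole result set is held in memory, requesting a deep page costs a list slice rather than a progressively more expensive ES query.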
The 2.0 ranking pipeline introduced a linear model that combined BM25 scores with business features (price, promotion, sales, etc.) and later an LR model for weight optimization.
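A weighted linear combination of this kind, and the logistic form an LR model would give it, might look like the sketch below. The feature names and weight values are invented for illustration; the source does not publish the actual features or weights.

```python
import math

# Hypothetical hand-tuned weights; an LR model would learn these from click data.
WEIGHTS = {"bm25": 0.5, "promotion": 0.2, "sales": 0.3}


def linear_score(features, weights):
    """Weighted sum of the text relevance score and business features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def lr_score(features, weights, bias=0.0):
    """Logistic regression: squash the same linear combination into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(bias + linear_score(features, weights))))


def rank(products, weights):
    """Sort products (dicts with a 'features' mapping) by descending linear score."""
    return sorted(products, key=lambda p: linear_score(p["features"], weights), reverse=True)
```

Moving from hand-set weights to LR keeps the same scoring shape but replaces manual tuning with weights fitted to observed user behavior.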
Search System 3.0 further refined both strategy and architecture. It introduced fine‑grained multi‑path recall by extracting multi‑dimensional product components (name, brand, category, attributes, topics) offline and matching them with query components online. Queries were classified into L1 (precise intent), L2 (intent expansion), and L3 (supplementary text match) levels, each with tailored recall rules.
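The three recall levels can be illustrated with a small matcher. The rules below are assumptions chosen to make the hierarchy concrete (full component match for L1, partial component match for L2, plain text match for L3); the actual recall rules are not given in the source.

```python
def recall_level(query, product):
    """Return 1, 2, or 3 for the recall level a product matches, or None.

    Assumed rules, for illustration only:
      L1: every structured query component equals the product's component.
      L2: at least one, but not all, structured components match.
      L3: no component matches, but a query token appears in the product name.
    """
    wanted = query["components"]
    matched = [k for k, v in wanted.items() if product["components"].get(k) == v]
    if wanted and len(matched) == len(wanted):
        return 1
    if matched:
        return 2
    name = product["name"].lower()
    if any(token in name for token in query["tokens"]):
        return 3
    return None


def multi_path_recall(query, products):
    """Group recalled product ids by level; products matching no level are dropped."""
    recalled = {1: [], 2: [], 3: []}
    for product in products:
        level = recall_level(query, product)
        if level is not None:
            recalled[level].append(product["id"])
    return recalled
```

The product components would be extracted offline and indexed, while the query components come from online query analysis, so matching at request time reduces to comparing structured fields.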
Store grouping was enhanced by considering static supply volume and industry type to avoid recall truncation. The architecture decoupled intent recognition, recall, and ranking into independent services within a search middle‑platform, achieving low coupling and high cohesion.
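One way to group stores by supply volume and industry is to pack stores of the same industry into groups under a SKU budget, so that no single request has to return more products than the recall limit allows. This is a hedged sketch: the `sku_budget` parameter, the record fields, and the greedy packing are assumptions, not the documented algorithm.

```python
from collections import defaultdict


def group_stores(stores, sku_budget):
    """Greedily pack stores of the same industry into groups under a SKU budget.

    stores: list of dicts with hypothetical keys 'id', 'industry', 'sku_count'.
    A store larger than the budget still gets a group of its own.
    """
    by_industry = defaultdict(list)
    for store in stores:
        by_industry[store["industry"]].append(store)

    groups = []
    for industry_stores in by_industry.values():
        current, used = [], 0
        # Place the largest stores first so small ones fill the remaining space.
        for store in sorted(industry_stores, key=lambda s: s["sku_count"], reverse=True):
            if current and used + store["sku_count"] > sku_budget:
                groups.append(current)
                current, used = [], 0
            current.append(store["id"])
            used += store["sku_count"]
        if current:
            groups.append(current)
    return groups
```

Compared with a fixed number of stores per group, a budget based on supply volume keeps a few very large stores from crowding out the results of their group mates.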
Ranking in 3.0 adopted a four‑stage pipeline—pre‑ranking, coarse ranking (XGB), fine ranking (TensorFlow deep model), and strategy re‑ranking—executed over six rank levels derived from the three‑level recall hierarchy.
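The staged pipeline can be sketched as a chain of functions in which the rank level always dominates the model score. Everything inside each stage is a placeholder: the in‑stock filter, the `keep` cutoff, the pinning rule, and the two model callables are assumptions standing in for the real XGB and TensorFlow models.

```python
def pre_rank(candidates):
    """Placeholder pre-ranking: drop candidates that cannot be sold."""
    return [c for c in candidates if c.get("in_stock", True)]


def coarse_rank(candidates, coarse_model, keep):
    """Order by (rank level, cheap model score) and keep only the top candidates."""
    ordered = sorted(candidates, key=lambda c: (c["rank_level"], -coarse_model(c)))
    return ordered[:keep]


def fine_rank(candidates, fine_model):
    """Re-score the survivors with the expensive model; rank level still dominates."""
    return sorted(candidates, key=lambda c: (c["rank_level"], -fine_model(c)))


def strategy_rerank(candidates, pinned_ids):
    """Placeholder business strategy: move pinned items to the front, order preserved."""
    pinned = [c for c in candidates if c["id"] in pinned_ids]
    rest = [c for c in candidates if c["id"] not in pinned_ids]
    return pinned + rest


def rank_pipeline(candidates, coarse_model, fine_model, keep, pinned_ids=frozenset()):
    """Run the four stages in order and return the final list of product ids."""
    staged = pre_rank(candidates)
    staged = coarse_rank(staged, coarse_model, keep)
    staged = fine_rank(staged, fine_model)
    staged = strategy_rerank(staged, pinned_ids)
    return [c["id"] for c in staged]
```

The cheap model prunes the candidate set so that the expensive deep model only scores a bounded number of products per request.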
A model‑center platform was introduced to expose LR, XGB, and TensorFlow models via a unified modelTag interface, enabling configuration‑driven feature management.
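A unified interface keyed by a model tag could be as simple as a registry that maps each tag to a predictor and its configured feature list. The class and method names below are hypothetical; only the idea of addressing models by modelTag and driving features from configuration comes from the text.

```python
class ModelCenter:
    """Hypothetical registry that serves different model types behind one call."""

    def __init__(self):
        self._models = {}

    def register(self, model_tag, predict_fn, feature_names):
        """Bind a tag to a predictor and the ordered features it expects."""
        self._models[model_tag] = (predict_fn, tuple(feature_names))

    def predict(self, model_tag, raw_features):
        """Assemble the feature vector from configuration, then call the model."""
        predict_fn, feature_names = self._models[model_tag]
        vector = [float(raw_features.get(name, 0.0)) for name in feature_names]
        return predict_fn(vector)
```

Callers pass only a tag and a bag of raw features, so swapping an LR model for an XGB or TensorFlow one becomes a registration change rather than a code change in the ranking service.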
The routing platform was upgraded to a centralized, platform‑managed client‑server design that automatically performs circuit‑breaker‑style failover, supports multi‑cluster disaster recovery, and provides real‑time configuration updates without code changes.
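Circuit‑breaker‑style failover across clusters can be sketched as follows. This is a simplified, single‑threaded illustration under assumed parameters (`failure_threshold`, `cooldown_s`) and assumed cluster names; the real routing platform's protocol and configuration format are not described in the source.

```python
import time


class ClusterRouter:
    """Try clusters in priority order, skipping any whose circuit is open."""

    def __init__(self, clusters, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.clusters = list(clusters)
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = {c: 0 for c in self.clusters}
        self._opened_at = {}  # cluster -> time its circuit opened

    def _available(self, cluster):
        opened = self._opened_at.get(cluster)
        # Closed circuit, or cooldown elapsed (half-open: allow one trial request).
        return opened is None or self._clock() - opened >= self.cooldown_s

    def call(self, request_fn):
        """request_fn(cluster) performs the query; it raises on failure."""
        last_error = None
        for cluster in self.clusters:
            if not self._available(cluster):
                continue
            try:
                result = request_fn(cluster)
            except Exception as exc:
                last_error = exc
                self._failures[cluster] += 1
                if self._failures[cluster] >= self.failure_threshold:
                    self._opened_at[cluster] = self._clock()  # open or re-open
                continue
            self._failures[cluster] = 0
            self._opened_at.pop(cluster, None)  # success closes the circuit
            return result
        raise RuntimeError("no healthy cluster available") from last_error
```

Once a cluster's circuit opens, requests skip it entirely until the cooldown passes, which avoids paying a timeout on every query while the primary cluster is down.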
The overall system now consists of five modules: search business, core search (including intent, recall, and business ranking), algorithm model prediction, index data (dual ES clusters), and auxiliary tools for debugging, strategy diff, bad‑case intervention, and dictionary management.
Looking ahead, JD Daojia plans a 4.0 version focusing on vector‑based recall, automated model iteration via an algorithm platform, and scenario‑specific recall strategies to further improve relevance and scalability.
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.
