Building Youzan's Enterprise Search Platform: Architecture, Indexing & Scaling
This article explores Youzan's enterprise search middle platform, detailing the challenges of siloed architectures, the concept of cognitive folding, comprehensive index design, write/read mechanisms, configuration-driven routing, monitoring, and practical implementations that enable scalable, reusable search capabilities across diverse business domains.
Problem Domain
The search middle platform addresses common challenges in siloed, self‑built search solutions:
Technology selection driven by trends rather than specific real‑time or accuracy requirements.
Inconsistent data across multiple sources, especially under concurrent updates.
Data silos that make cross‑system business integration difficult and slow.
Scalability questions such as when to split data, which dimension to split on, and whether each new search demand requires DDL changes.
Search Middle Platform = enterprise‑level reusable search capability platform
Exploration
2.1 Cognitive Folding
The platform aims to “fold” the entire search pipeline (index design, write, and read) into a single, reusable product so that business teams can focus on search requirements without dealing with implementation details.
2.2 Index Management
Index Design Choices
MySQL – high real‑time needs, composite indexes can cover queries.
HBase – forward (key-based) lookups and large IN-list queries.
KV stores – statistical interfaces with looser real-time requirements, best served by key lookups.
Elasticsearch – fuzzy search and multi‑condition queries.
TiDB – composite index queries, massive archival data with automatic sharding.
Other storage options as needed.
Index Splitting Considerations
Determine incremental vs. full data volume and whether splitting is required.
Choose split dimension (time‑based or ID‑based).
Plan for hotspot/skew handling after split.
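The split-dimension choice above can be sketched as a small routing function. This is a minimal illustration, not Youzan's implementation: the index naming scheme, the monthly granularity, the numeric shard key, and the `hot_shops` override are all assumptions made for the example.

```python
from datetime import date


def time_based_index(base: str, day: date) -> str:
    """Time-based split: one index per month (assumed granularity)."""
    return f"{base}_{day.year:04d}_{day.month:02d}"


def id_based_index(base: str, shard_key: int, shard_count: int) -> str:
    """ID-based split: modulo on a numeric key such as a shop ID."""
    return f"{base}_{shard_key % shard_count}"


def route_with_hotspots(base: str, shop_id: int, shard_count: int,
                        hot_shops: set) -> str:
    """Give known hot shops a dedicated index to limit skew (hypothetical policy)."""
    if shop_id in hot_shops:
        return f"{base}_hot_{shop_id}"
    return id_based_index(base, shop_id, shard_count)


print(time_based_index("orders", date(2024, 3, 5)))        # orders_2024_03
print(id_based_index("orders", 10, 4))                     # orders_2
print(route_with_hotspots("orders", 7, 4, hot_shops={7}))  # orders_hot_7
```

A time-based split suits data that ages out, while an ID-based split spreads load evenly but needs a separate answer for skewed tenants, which is what the override stands in for here.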
Scalability Design
Assess if each new demand forces DDL and data refresh.
Support user‑defined search requirements without code changes.
2.3 Index Write
Configurable Synchronization
The sync process is abstracted into three stages: input → filter → output.
Input : source of index data, typically messages (e.g., binlog or processed business events).
Filter : enrichment layer that can join business tables or call APIs to construct index records.
Output : configurable storage targets such as ES, MongoDB, MySQL, HBase, etc.
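The three stages can be pictured as a pipeline object assembled from configuration. The sketch below is only an illustration under assumptions: `SyncPipeline`, the record fields, and the in-memory source and sink are hypothetical stand-ins for a binlog stream, a business lookup, and a real storage target.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


@dataclass
class SyncPipeline:
    source: Callable[[], Iterable[Record]]               # input stage
    filters: List[Callable[[Record], Optional[Record]]]  # filter/enrich stage
    sinks: List[Callable[[Record], None]]                # output stage

    def run(self) -> int:
        """Push every source event through the filters, then to all sinks."""
        written = 0
        for record in self.source():
            current: Optional[Record] = record
            for step in self.filters:
                current = step(current)
                if current is None:  # a filter may drop the event
                    break
            if current is None:
                continue
            for sink in self.sinks:
                sink(current)
            written += 1
        return written


# Hypothetical stand-ins for a business table and a message stream.
SHOP_NAMES = {42: "Tea House"}


def binlog_events() -> Iterable[Record]:
    yield {"id": 1, "shop_id": 42, "title": "green tea", "deleted": False}
    yield {"id": 2, "shop_id": 42, "title": "old cup", "deleted": True}


def drop_deleted(rec: Record) -> Optional[Record]:
    return None if rec["deleted"] else rec


def add_shop_name(rec: Record) -> Optional[Record]:
    # Enrichment: join in a field from another business table.
    return {**rec, "shop_name": SHOP_NAMES.get(rec["shop_id"], "")}


es_docs: List[Record] = []  # list standing in for an ES index
pipeline = SyncPipeline(binlog_events, [drop_deleted, add_shop_name], [es_docs.append])
written = pipeline.run()
print(written, es_docs)
```

Because each stage is just a callable, swapping the output from one store to another only changes the `sinks` list, which is the property the configurable design relies on.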
Incremental Write
Extension points decouple business logic from the platform, allowing businesses to remain unaware of sync details while the platform handles incremental updates.
Offline Write
Initial bulk load: use a CREATE operation; incremental data is filtered out.
Routine refresh: batch update that increments the version by 1. If the offline job risks an overwrite conflict, use a larger version step (e.g., 1000), chosen according to business needs.
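The article does not spell out the conflict rule behind the version step, so the sketch below assumes one plausible rule: a write is accepted only when its version is strictly greater than the stored one. `VersionedIndex` is a hypothetical in-memory stand-in, and the scenario (an incremental write landing between the offline job's read and its write) is an assumption chosen to show how the step size changes the outcome.

```python
class VersionedIndex:
    """In-memory stand-in for an index that rejects stale versions."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def version(self, doc_id):
        return self._docs[doc_id][0]

    def body(self, doc_id):
        return self._docs[doc_id][1]

    def create(self, doc_id, body, version=1):
        """Insert only if absent, so an existing document is never overwritten."""
        if doc_id in self._docs:
            return False
        self._docs[doc_id] = (version, body)
        return True

    def write(self, doc_id, body, version):
        """Accept the write only if its version is strictly newer (assumed rule)."""
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False
        self._docs[doc_id] = (version, body)
        return True


index = VersionedIndex()
index.create("goods-1", {"price": 10})             # initial load, version 1
snapshot_version = index.version("goods-1")        # offline job reads version 1

index.write("goods-1", {"price": 11}, version=2)   # incremental write lands first

# Offline refresh with step 1 collides with the incremental write and is rejected.
step_1 = index.write("goods-1", {"price": 12}, snapshot_version + 1)
# With a large step the offline refresh wins instead.
step_1000 = index.write("goods-1", {"price": 12}, snapshot_version + 1000)
print(step_1, step_1000, index.version("goods-1"))
```

Whether the offline refresh or the incremental write should win is a business decision; the step size is simply the knob that decides it under this rule.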
Consistency Guarantee
The platform provides flexible eventual consistency for synchronization.
2.4 Index Read
Configurable Routing (LOS Layer)
Search may target MySQL, ES, or other backends. To avoid scattered if‑else code and to manage diverse storage characteristics, a “League Of Search” (LOS) layer centralizes routing strategies.
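As a rough picture of what centralized routing might look like, the sketch below replaces if-else branches with a declarative rule table. The scene name, the `if_has` rule shape, and the backend labels are assumptions for illustration, not the actual LOS configuration format.

```python
# Rules are evaluated in order; the first rule whose fields are all present wins.
ROUTING_RULES = {
    "order_search": [
        {"if_has": ["keyword"], "backend": "es"},      # fuzzy text goes to ES
        {"if_has": ["order_no"], "backend": "mysql"},  # exact lookup can use an index
        {"if_has": [], "backend": "mysql"},            # fallback rule
    ],
}


def pick_backend(scene, query, rules=ROUTING_RULES):
    """Return the backend name for a query, based only on configuration."""
    for rule in rules.get(scene, []):
        if all(field in query for field in rule["if_has"]):
            return rule["backend"]
    raise LookupError(f"no routing rule for scene {scene!r}")


print(pick_backend("order_search", {"keyword": "tea"}))      # es
print(pick_backend("order_search", {"order_no": "E2024"}))   # mysql
```

Adding a new backend or changing a strategy then means editing the rule table rather than the calling code.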
Universal DSL
A unified DSL abstracts the differing query languages of the storage engines, shielding business teams from backend-specific syntax.
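To make the idea concrete, here is a minimal sketch of one query description translated for two backends. The DSL shape (`filters`, `op`, `limit`) is invented for this example and is not the platform's actual DSL; the Elasticsearch output uses the standard `bool`, `term`, `range`, and `match` query clauses.

```python
def to_sql(table, dsl):
    """Translate the DSL into a parameterized SQL string plus its parameters."""
    clauses, params = [], []
    for f in dsl["filters"]:
        field, op, value = f["field"], f["op"], f["value"]
        if not field.isidentifier():  # never interpolate unchecked names
            raise ValueError(f"bad field name: {field!r}")
        if op == "eq":
            clauses.append(f"{field} = %s")
            params.append(value)
        elif op == "gte":
            clauses.append(f"{field} >= %s")
            params.append(value)
        elif op == "match":
            clauses.append(f"{field} LIKE %s")
            params.append(f"%{value}%")
        else:
            raise ValueError(f"unsupported op: {op!r}")
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT * FROM {table} WHERE {where} LIMIT %s", params + [dsl["limit"]]


def to_es(dsl):
    """Translate the same DSL into an Elasticsearch bool query body."""
    filters, must = [], []
    for f in dsl["filters"]:
        field, op, value = f["field"], f["op"], f["value"]
        if op == "eq":
            filters.append({"term": {field: value}})
        elif op == "gte":
            filters.append({"range": {field: {"gte": value}}})
        elif op == "match":
            must.append({"match": {field: value}})
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return {"query": {"bool": {"filter": filters, "must": must}}, "size": dsl["limit"]}


dsl = {
    "filters": [
        {"field": "shop_id", "op": "eq", "value": 42},
        {"field": "title", "op": "match", "value": "tea"},
    ],
    "limit": 10,
}
print(to_sql("goods", dsl))
print(to_es(dsl))
```

The caller writes the query once; which translator runs is a routing decision made elsewhere.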
Search Process Orchestration
Different business scenarios require varying pipelines (coarse‑rank only, re‑rank, detail assembly, fine‑rank return). A workflow engine orchestrates these steps.
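One simple way to picture such orchestration is a per-scenario list of stage names executed in order. The stage functions, scenario names, and ranking rules below are hypothetical; a real workflow engine would add branching, timeouts, and error handling.

```python
def coarse_rank(items):
    """Order by recall score and keep the top two candidates."""
    return sorted(items, key=lambda d: d["score"], reverse=True)[:2]


def re_rank(items):
    """Move in-stock items ahead of out-of-stock ones (stable sort)."""
    return sorted(items, key=lambda d: not d["in_stock"])


def assemble(items):
    """Attach display details; a real system would call a detail service."""
    return [{**d, "title": f"item-{d['id']}"} for d in items]


STAGES = {"coarse_rank": coarse_rank, "re_rank": re_rank, "assemble": assemble}

# Each business scenario declares which stages it needs, in order.
SCENES = {
    "id_lookup": ["coarse_rank"],
    "storefront": ["coarse_rank", "re_rank", "assemble"],
}


def run_scene(scene, candidates):
    result = candidates
    for stage_name in SCENES[scene]:
        result = STAGES[stage_name](result)
    return result


candidates = [
    {"id": 1, "score": 0.2, "in_stock": True},
    {"id": 2, "score": 0.9, "in_stock": False},
    {"id": 3, "score": 0.5, "in_stock": True},
]
print([d["id"] for d in run_scene("id_lookup", candidates)])   # [2, 3]
print([d["id"] for d in run_scene("storefront", candidates)])  # [3, 2]
```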
2.5 Universal Monitoring
Metrics identify top search scenarios, enabling product teams to evaluate value and prioritize optimizations.
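A minimal sketch of how per-scenario metrics could surface the top scenarios follows. The `SearchMetrics` class and the choice of call count plus average latency are assumptions; the article does not list which metrics the platform collects.

```python
from collections import Counter


class SearchMetrics:
    """Counts calls and accumulates latency per search scenario."""

    def __init__(self):
        self.calls = Counter()
        self.total_ms = Counter()

    def record(self, scene, latency_ms):
        self.calls[scene] += 1
        self.total_ms[scene] += latency_ms

    def top_scenes(self, n):
        """Return (scene, call_count, average_latency_ms) for the busiest scenes."""
        return [
            (scene, count, self.total_ms[scene] / count)
            for scene, count in self.calls.most_common(n)
        ]


metrics = SearchMetrics()
metrics.record("order_search", 20)
metrics.record("order_search", 40)
metrics.record("goods_search", 10)
print(metrics.top_scenes(1))  # [('order_search', 2, 30.0)]
```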
Practice
3.1 Collaborative Flywheel
The platform adopts a “collaborative flywheel” concept: networked collaboration creates value, which in turn accelerates further collaboration. This aligns with the cognitive folding principle to continuously simplify and amplify search capabilities.
3.2 Domain Decoupling
Marketing‑Product Search Connectivity
Example: coupons need sorting by price, sales, and rating. Marketing stores activity‑product mappings; the platform fetches a large list of product IDs and performs sorting in the product domain, avoiding tight coupling between marketing and product services.
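The division of responsibility in this example can be sketched as follows: marketing only answers "which products belong to this activity", and sorting stays in the product domain. All function names and the in-memory data are hypothetical placeholders for the two services.

```python
# Marketing domain: activity-to-product mapping only.
ACTIVITY_PRODUCTS = {"coupon-1": [101, 102, 103]}

# Product domain: owns the attributes used for sorting.
PRODUCTS = {
    101: {"price": 30, "sales": 5, "rating": 4.8},
    102: {"price": 12, "sales": 90, "rating": 4.1},
    103: {"price": 20, "sales": 40, "rating": 4.9},
}


def marketing_product_ids(activity_id):
    return list(ACTIVITY_PRODUCTS.get(activity_id, []))


def product_search_by_ids(product_ids, sort_by, descending=False):
    """Sort a given ID list by a product attribute, inside the product domain."""
    known = [pid for pid in product_ids if pid in PRODUCTS]
    return sorted(known, key=lambda pid: PRODUCTS[pid][sort_by], reverse=descending)


def coupon_products(activity_id, sort_by, descending=False):
    """Platform-side glue: fetch IDs from marketing, sort in the product domain."""
    return product_search_by_ids(marketing_product_ids(activity_id), sort_by, descending)


print(coupon_products("coupon-1", "price"))                    # [102, 103, 101]
print(coupon_products("coupon-1", "rating", descending=True))  # [103, 101, 102]
```

Neither side needs to know the other's schema; only the ID list crosses the boundary.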
Universal Cross‑Domain Search
Use cases such as CPS product search, live‑streamer product selection, and commission‑based queries are handled by establishing data links during synchronization, enabling seamless cross‑domain search without business‑level changes.
3.3 Business‑Nurturing Middle Platform
Index Rebuild Productization
To address shard misconfiguration, oversized indexes, or cluster migrations, the platform provides a rapid rebuild capability:
Indexes with millions of records: seconds.
Tens of millions: minutes.
Hundreds of millions: hours.
Billions: half a day.
Data consistency is guaranteed after rebuild.
The rebuild project is named “spam” as a homage to the lunch‑meat folding analogy.
Rebuild Without Business Code Changes
For custom sync implementations, a configuration‑driven double‑write routing layer allows index rebuilding without touching business code.
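The following sketch shows one way a configuration-driven double-write layer could support a rebuild: writes fan out to the old and new index, history is backfilled, and reads are then cut over by editing configuration. `DoubleWriteRouter`, the config keys, and the dictionaries standing in for indexes are all assumptions for illustration.

```python
class DoubleWriteRouter:
    """Routes reads and writes for a logical index based on configuration."""

    def __init__(self, config, stores):
        self.config = config  # logical index -> {"write_to": [...], "read_from": ...}
        self.stores = stores  # physical index name -> dict standing in for an index

    def write(self, logical_index, doc_id, body):
        for target in self.config[logical_index]["write_to"]:
            self.stores[target][doc_id] = body

    def read(self, logical_index, doc_id):
        target = self.config[logical_index]["read_from"]
        return self.stores[target].get(doc_id)


stores = {"goods_v1": {"g0": {"title": "old doc"}}, "goods_v2": {}}
config = {"goods": {"write_to": ["goods_v1", "goods_v2"], "read_from": "goods_v1"}}
router = DoubleWriteRouter(config, stores)

# 1. Double-write phase: new writes land in both physical indexes.
router.write("goods", "g1", {"title": "green tea"})

# 2. Backfill history into the new index without clobbering fresher documents.
for doc_id, body in stores["goods_v1"].items():
    stores["goods_v2"].setdefault(doc_id, body)

# 3. Cut over by configuration only; calling code is unchanged.
config["goods"]["read_from"] = "goods_v2"
config["goods"]["write_to"] = ["goods_v2"]

router.write("goods", "g2", {"title": "oolong"})
print(router.read("goods", "g0"), router.read("goods", "g2"))
```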
VIP Index Configurable Migration
During high‑traffic events, VIP merchant traffic is routed to a dedicated cluster via configurable routing, then migrated back after the event, improving system stability.
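A small sketch of the routing switch described here, assuming a single toggle and a set of VIP shop IDs. The config keys and cluster names are hypothetical, and the data migration between clusters is out of scope for the example.

```python
VIP_ROUTING = {
    "enabled": True,                # switched on for the event window
    "vip_shop_ids": {1001, 1002},
    "vip_cluster": "es-vip",        # hypothetical cluster names
    "default_cluster": "es-shared",
}


def cluster_for(shop_id, cfg=VIP_ROUTING):
    """Pick the cluster for a shop; VIP shops are isolated while the switch is on."""
    if cfg["enabled"] and shop_id in cfg["vip_shop_ids"]:
        return cfg["vip_cluster"]
    return cfg["default_cluster"]


during_event = cluster_for(1001)
after_event = cluster_for(1001, {**VIP_ROUTING, "enabled": False})
print(during_event, after_event)  # es-vip es-shared
```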
3.4 Platform Empowering Business
The platform continuously gathers new pain points, such as scaling order search from single stores to chain‑wide management, or integrating marketing coupon distribution across stores. It also provides a universal, configuration‑driven solution for large‑scale data archiving search, integrating archival storage with searchable indexes.
Insights
The middle platform is an architectural mindset that requires coordination across teams, not a single technology.
Focusing on high‑leverage reuse capabilities yields disproportionate business value.
Avoid over‑design; build capabilities tightly aligned with business needs for rapid delivery.
Outlook
The search middle platform is less than a year old; many scenarios are still early, leaving ample room for further business empowerment and feature expansion.
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
