Building Youzan's Enterprise Search Platform: Architecture, Indexing & Scaling

This article explores Youzan's enterprise search middle platform, detailing the challenges of siloed architectures, the concept of cognitive folding, comprehensive index design, write/read mechanisms, configuration-driven routing, monitoring, and practical implementations that enable scalable, reusable search capabilities across diverse business domains.

Youzan Coder

Problem Domain

The search middle platform addresses common challenges in siloed, self‑built search solutions:

Technology selection driven by trends rather than specific real‑time or accuracy requirements.

Inconsistent data across multiple sources, especially under concurrent updates.

Data silos that make cross‑system business integration difficult and slow.

Scalability questions such as when to split data, which dimension to split on, and whether each new search demand requires DDL changes.

In short: search middle platform = an enterprise‑level platform of reusable search capabilities.

Exploration

2.1 Cognitive Folding

The platform aims to “fold” the entire search pipeline—index design, write, and read—into a single, reusable product so that business teams can focus on search requirements without dealing with implementation details.

2.2 Index Management

Index Design Choices

MySQL – high real‑time needs, composite indexes can cover queries.

HBase – forward‑index lookups by row key and large IN‑list queries.

KV stores – statistical interfaces with lower real‑time requirements; prefer KV reads.

Elasticsearch – fuzzy search and multi‑condition queries.

TiDB – composite index queries, massive archival data with automatic sharding.

Other storage options as needed.

Index Splitting Considerations

Determine incremental vs. full data volume and whether splitting is required.

Choose split dimension (time‑based or ID‑based).

Plan for hotspot/skew handling after split.
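The split choices above can be sketched as index-naming functions. This is a minimal illustration, not the platform's actual scheme: the function names, the monthly granularity, the shard count, and the `HOT_KEY_OVERRIDES` table are all assumptions made for the example.

```python
from datetime import date

# Hypothetical override table: a key known to be hot gets its own index.
HOT_KEY_OVERRIDES = {888: "orders_hot_888"}


def time_based_index(base, created):
    """Time-based split: one index per calendar month."""
    return f"{base}_{created:%Y%m}"


def id_based_index(base, entity_id, shards):
    """ID-based split: spread entities over a fixed number of indexes."""
    return f"{base}_{entity_id % shards:02d}"


def index_for_shop(base, shop_id, shards=16):
    """ID-based split with a hotspot escape hatch for skewed keys."""
    return HOT_KEY_OVERRIDES.get(shop_id) or id_based_index(base, shop_id, shards)


monthly = time_based_index("orders", date(2023, 5, 17))
sharded = id_based_index("orders", 12345, 16)
hot = index_for_shop("orders", 888)
```

A time-based split suits data that is mostly queried by recency, while an ID-based split keeps all records of one entity together; the override table is one simple way to isolate a skewed key after the split.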

Scalability Design

Assess if each new demand forces DDL and data refresh.

Support user‑defined search requirements without code changes.

2.3 Index Write

Configurable Synchronization

The sync process is abstracted into three stages: input → filter → output.

Input : source of index data, typically messages (e.g., binlog or processed business events).

Filter : enrichment layer that can join business tables or call APIs to construct index records.

Output : configurable storage targets such as ES, MongoDB, MySQL, HBase, etc.

Sync pipeline
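The input → filter → output abstraction can be sketched in a few lines. This is a simplified illustration under stated assumptions: `SyncPipeline`, the sample messages, and the in-memory sinks are hypothetical stand-ins, not the platform's real components.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Record = dict


@dataclass
class SyncPipeline:
    """input -> filter -> output, with every stage supplied by configuration."""
    source: Callable[[], Iterable[Record]]
    filters: List[Callable[[Record], Optional[Record]]]
    outputs: List[Callable[[Record], None]]

    def run(self) -> int:
        written = 0
        for record in self.source():
            for apply_filter in self.filters:
                record = apply_filter(record)
                if record is None:  # a filter may drop the record
                    break
            if record is None:
                continue
            for write in self.outputs:
                write(record)
            written += 1
        return written


def binlog_messages():
    """Input stage: stands in for a binlog or business-event stream."""
    yield {"table": "orders", "id": 1, "shop_id": 10, "status": "PAID"}
    yield {"table": "orders", "id": 2, "shop_id": 11, "status": "DELETED"}


SHOP_NAMES = {10: "Shop A", 11: "Shop B"}  # stands in for a business table or API


def drop_deleted(record):
    return None if record["status"] == "DELETED" else record


def enrich_shop_name(record):
    """Filter stage: enrich the index record with joined data."""
    return {**record, "shop_name": SHOP_NAMES.get(record["shop_id"])}


es_sink, hbase_sink = [], []  # in-memory stand-ins for storage targets

pipeline = SyncPipeline(
    source=binlog_messages,
    filters=[drop_deleted, enrich_shop_name],
    outputs=[es_sink.append, hbase_sink.append],
)
written = pipeline.run()
```

Because each stage is just a callable, adding a new storage target or enrichment step becomes a configuration change rather than a change to the pipeline itself.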

Incremental Write

Extension points decouple business logic from the platform, allowing businesses to remain unaware of sync details while the platform handles incremental updates.

Incremental write flow

Offline Write

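Initial bulk load: use a CREATE (insert‑if‑absent) operation, so records already written by the incremental sync are skipped rather than overwritten.

Routine refresh: batch update that increments the version by 1. If there is a risk of overwrite conflicts between offline and incremental writes, use a larger version step (e.g., 1000), chosen according to business needs.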

Offline write versioning
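One possible reading of the versioning rules is sketched below with an in-memory store that accepts a write only when its version is newer than the stored one. The `VersionedIndex` class and its methods are hypothetical; a real backend would rely on its own version-check mechanism.

```python
class VersionedIndex:
    """In-memory stand-in for an index with version-checked writes."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def create(self, doc_id, body, version=1):
        """Insert only if absent; used by the initial bulk load."""
        if doc_id in self.docs:
            return False
        self.docs[doc_id] = (version, body)
        return True

    def write(self, doc_id, body, version):
        """Accept the write only if its version is newer than the stored one."""
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False
        self.docs[doc_id] = (version, body)
        return True

    def version_of(self, doc_id):
        return self.docs[doc_id][0]


index = VersionedIndex()

# An incremental write arrives before the bulk load reaches this document,
# so the bulk load's CREATE is skipped instead of overwriting it.
index.write("order-1", {"status": "PAID"}, version=1)
created = index.create("order-1", {"status": "NEW"})

# The offline job reads its snapshot, then incremental writes keep arriving.
snapshot_version = index.version_of("order-1")
index.write("order-1", {"status": "SHIPPED"}, version=2)
index.write("order-1", {"status": "DONE"}, version=3)

# With a step of 1 the offline write loses to the newer incremental data;
# with a large step it wins regardless of how many writes happened meanwhile.
small_step = index.write("order-1", {"status": "REFRESHED"}, snapshot_version + 1)
large_step = index.write("order-1", {"status": "REFRESHED"}, snapshot_version + 1000)
```

Which outcome is desirable depends on whether the offline data or the incremental data should be treated as authoritative, which is why the step is left to business needs.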

Consistency Guarantee

The platform provides flexible eventual consistency for synchronization.

Consistency diagram

2.4 Index Read

Configurable Routing (LOS Layer)

Search may target MySQL, ES, or other backends. To avoid scattered if‑else code and to manage diverse storage characteristics, a “League Of Search” (LOS) layer centralizes routing strategies.

Routing configuration
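A configuration-driven router of this kind might look like the sketch below. The rule format (`has_field`, `min_ids`) and the function names are assumptions made for illustration; the article does not describe the actual LOS configuration schema.

```python
# Hypothetical routing rules, evaluated top to bottom; the first match wins.
ROUTING_CONFIG = [
    {"when": {"has_field": "keyword"}, "backend": "elasticsearch"},
    {"when": {"min_ids": 100}, "backend": "hbase"},
    {"when": {}, "backend": "mysql"},  # catch-all default
]


def rule_matches(when, query):
    """Check every condition present in the rule against the query."""
    if "has_field" in when and when["has_field"] not in query:
        return False
    if "min_ids" in when and len(query.get("ids", [])) < when["min_ids"]:
        return False
    return True


def resolve_backend(query, config=ROUTING_CONFIG):
    """Return the backend named by the first matching rule."""
    for rule in config:
        if rule_matches(rule["when"], query):
            return rule["backend"]
    raise LookupError("no routing rule matched")


fuzzy_backend = resolve_backend({"keyword": "shoes"})
bulk_backend = resolve_backend({"ids": list(range(500))})
default_backend = resolve_backend({"shop_id": 1})
```

Keeping the rules as plain data means a routing change is a configuration edit, and the if-else logic lives in exactly one place.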

Universal DSL

A unified DSL abstracts over the different query languages of each storage engine (SQL dialects, the Elasticsearch query DSL, and so on), shielding business teams from backend‑specific syntax.
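As a rough sketch of the idea, the same backend-neutral query can be translated into a parameterized SQL statement and into an Elasticsearch query body. The DSL shape used here (a list of `(field, op, value)` filters plus a `limit`) is an assumption for illustration, not the platform's actual DSL.

```python
SQL_OPS = {"eq": "=", "gte": ">=", "lte": "<="}


def to_sql(table, dsl):
    """Translate the DSL into a parameterized SQL string plus its parameters.

    Table and field names are interpolated, so in practice they must come
    from a trusted whitelist; only values are passed as parameters.
    """
    clauses, params = [], []
    for field, op, value in dsl["filters"]:
        clauses.append(f"{field} {SQL_OPS[op]} %s")
        params.append(value)
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT * FROM {table} WHERE {where} LIMIT {int(dsl['limit'])}"
    return sql, params


def to_es(dsl):
    """Translate the DSL into an Elasticsearch bool/filter query body."""
    filters = []
    for field, op, value in dsl["filters"]:
        if op == "eq":
            filters.append({"term": {field: value}})
        elif op in ("gte", "lte"):
            filters.append({"range": {field: {op: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    return {"query": {"bool": {"filter": filters}}, "size": dsl["limit"]}


dsl = {"filters": [("shop_id", "eq", 10), ("price", "gte", 100)], "limit": 20}
sql, params = to_sql("orders", dsl)
es_body = to_es(dsl)
```

The caller only ever builds `dsl`; which translator runs is decided by the routing layer.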

Search Process Orchestration

Different business scenarios require varying pipelines (coarse‑rank only, re‑rank, detail assembly, fine‑rank return). A workflow engine orchestrates these steps.

Search orchestration
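A very small version of such orchestration is sketched below: each scenario is a named list of steps, and the engine simply runs them in order. The step implementations, scenario names, and sample data are hypothetical; a production workflow engine would add branching, timeouts, and error handling.

```python
def coarse_rank(items):
    """Keep the three best candidates by raw score."""
    return sorted(items, key=lambda i: i["score"], reverse=True)[:3]


def re_rank(items):
    """Prefer in-stock items, then fall back to score."""
    return sorted(items, key=lambda i: (i["in_stock"], i["score"]), reverse=True)


def assemble_details(items):
    """Attach display fields; stands in for a detail lookup."""
    return [{**i, "title": f"Product {i['id']}"} for i in items]


STEPS = {
    "coarse_rank": coarse_rank,
    "re_rank": re_rank,
    "assemble_details": assemble_details,
}

# Hypothetical scenario configuration: each scenario picks its own pipeline.
SCENARIOS = {
    "coarse_only": ["coarse_rank"],
    "full": ["coarse_rank", "re_rank", "assemble_details"],
}


def run_search(scenario, candidates):
    """Run the configured steps for a scenario, feeding each into the next."""
    items = candidates
    for step_name in SCENARIOS[scenario]:
        items = STEPS[step_name](items)
    return items


candidates = [
    {"id": 1, "score": 0.9, "in_stock": False},
    {"id": 2, "score": 0.8, "in_stock": True},
    {"id": 3, "score": 0.5, "in_stock": True},
    {"id": 4, "score": 0.1, "in_stock": True},
]
coarse_ids = [i["id"] for i in run_search("coarse_only", candidates)]
full_result = run_search("full", candidates)
```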

2.5 Universal Monitoring

Metrics identify top search scenarios, enabling product teams to evaluate value and prioritize optimizations.

Monitoring dashboard

Practice

3.1 Collaborative Flywheel

The platform adopts a “collaborative flywheel” concept: networked collaboration creates value, which in turn accelerates further collaboration. This aligns with the cognitive folding principle to continuously simplify and amplify search capabilities.

Collaborative flywheel

3.2 Domain Decoupling

Marketing‑Product Search Connectivity

Example: the products covered by a coupon need to be sortable by price, sales, and rating. Marketing stores the activity‑to‑product mappings; the platform fetches the (potentially large) list of product IDs and performs the sorting in the product domain, which avoids tight coupling between marketing and product services.

Marketing‑product decoupling
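The division of responsibilities can be sketched as two independent lookups joined by the platform. All names and sample data below are hypothetical; the dictionaries stand in for the marketing and product domains' own stores.

```python
# Marketing domain: only knows which products belong to which activity.
ACTIVITY_PRODUCTS = {"coupon-42": [101, 102, 103]}

# Product domain: owns the sortable attributes.
PRODUCT_INDEX = {
    101: {"price": 30, "sales": 5},
    102: {"price": 10, "sales": 50},
    103: {"price": 20, "sales": 80},
}


def marketing_product_ids(activity_id):
    """Marketing side: return the product IDs attached to an activity."""
    return ACTIVITY_PRODUCTS.get(activity_id, [])


def product_search(ids, sort_by, descending=False):
    """Product side: look up the given IDs and sort by a product attribute."""
    rows = [{"id": i, **PRODUCT_INDEX[i]} for i in ids if i in PRODUCT_INDEX]
    return sorted(rows, key=lambda r: r[sort_by], reverse=descending)


def search_activity_products(activity_id, sort_by, descending=False):
    """Platform side: chain the two domains without either knowing the other."""
    return product_search(marketing_product_ids(activity_id), sort_by, descending)


by_price = [r["id"] for r in search_activity_products("coupon-42", "price")]
by_sales = [r["id"] for r in search_activity_products("coupon-42", "sales", descending=True)]
```

Marketing never needs to store price or sales data, and the product domain never needs to know what a coupon is.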

Universal Cross‑Domain Search

Use cases such as CPS product search, live‑streamer product selection, and commission‑based queries are handled by establishing data links during synchronization, enabling seamless cross‑domain search without business‑level changes.

Cross‑domain search

3.3 Business‑Nurturing Middle Platform

Index Rebuild Productization

To address shard misconfiguration, oversized indexes, or cluster migrations, the platform provides a rapid rebuild capability:

Millions of documents in seconds.

Tens of millions in minutes.

Hundreds of millions in hours.

Billions in half a day.

Data consistency is guaranteed after rebuild.

The rebuild project is named “spam” as a homage to the lunch‑meat folding analogy.

Rebuild Without Business Code Changes

For custom sync implementations, a configuration‑driven double‑write routing layer allows index rebuilding without touching business code.
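A double-write routing layer of this kind could be sketched as follows. The `DoubleWriteRouter` class, the config shape, and the three-phase cutover are assumptions for illustration; dictionaries stand in for the old and new physical indexes.

```python
class DoubleWriteRouter:
    """Routes a logical index to physical indexes according to configuration."""

    def __init__(self, stores, config):
        self.stores = stores  # physical index name -> {doc_id: body}
        self.config = config  # logical index name -> {"write": [...], "read": ...}

    def write(self, logical_index, doc_id, body):
        for target in self.config[logical_index]["write"]:
            self.stores[target][doc_id] = body

    def read(self, logical_index, doc_id):
        target = self.config[logical_index]["read"]
        return self.stores[target].get(doc_id)


stores = {"orders_v1": {}, "orders_v2": {}}
config = {"orders": {"write": ["orders_v1"], "read": "orders_v1"}}
router = DoubleWriteRouter(stores, config)

# Business code only ever uses the logical name "orders".
router.write("orders", 1, {"status": "PAID"})

# Phase 1: start double-writing to the new index (configuration change only).
config["orders"]["write"] = ["orders_v1", "orders_v2"]
router.write("orders", 2, {"status": "NEW"})

# Phase 2: backfill history; setdefault keeps documents that were already
# double-written, so the backfill never overwrites fresher data.
for doc_id, body in stores["orders_v1"].items():
    stores["orders_v2"].setdefault(doc_id, body)

# Phase 3: cut reads and writes over to the rebuilt index.
config["orders"] = {"write": ["orders_v2"], "read": "orders_v2"}
after_cutover = router.read("orders", 1)
```

The business-side calls to `router.write` and `router.read` are identical before, during, and after the rebuild; only the configuration moves.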

VIP Index Configurable Migration

During high‑traffic events, VIP merchant traffic is routed to a dedicated cluster via configurable routing, then migrated back after the event, improving system stability.

3.4 Platform Empowering Business

The platform continuously gathers new pain points, such as scaling order search from single stores to chain‑wide management, or integrating marketing coupon distribution across stores. It also provides a universal, configuration‑driven solution for large‑scale data archiving search, integrating archival storage with searchable indexes.

Insights

The middle platform is an architectural mindset that requires coordination across teams, not a single technology.

Focusing on high‑leverage reuse capabilities yields disproportionate business value.

Avoid over‑design; build capabilities tightly aligned with business needs for rapid delivery.

Outlook

The search middle platform is less than a year old; many scenarios are still early, leaving ample room for further business empowerment and feature expansion.

