How We Scaled a Company‑Wide Search Platform from 1.0 to 4.0

This article recounts the evolution of Youzan's internal search platform, from a simple Elasticsearch cluster in 2015 through several architectural revisions that added advanced query middleware, big-data integration, and a proxy-based service layer. It highlights the technical challenges and solutions that enabled scalable, reliable search for billions of records.

Programmer DD

Overview

Youzan Search Platform is an internal PaaS that supports search and multi‑dimensional filtering for over a hundred services and nearly ten billion records, handling product management, order retrieval, fan segmentation, and more.

Architecture 1.0

In 2015 the system consisted of a few high-spec VMs running an Elasticsearch cluster that indexed products and fans. Data was synchronized from the DB to Elasticsearch via Canal. This simple setup made it quick to create indexes, but it coupled the sync programs tightly to the business DB. That coupling caused issues during database migrations, and performance degraded when multiple Canal instances subscribed to the same database.

Indexes were also not physically isolated from one another: a large promotional event once exhausted the Elasticsearch heap, and the resulting OOM brought every index down.

Architecture 2.0

The 2.0 redesign introduced a data bus that publishes change events to a message queue (MQ). Sync applications consume MQ messages, decoupling them from the business DB and eliminating duplicate Canal listeners.
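The consumer side of this design can be sketched as follows. This is a minimal illustration, not the article's implementation: the ChangeEvent shape, the InMemoryIndex stand-in for Elasticsearch, and the per-row version check are all assumptions made so the example runs without a message queue or a cluster.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event published by the data bus."""
    table: str
    op: str                      # "upsert" or "delete"
    doc_id: str
    version: int                 # assumed to increase with every change to a row
    doc: Optional[dict] = None   # row contents for upserts


class InMemoryIndex:
    """Stand-in for an Elasticsearch index so the sketch runs locally."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.versions: Dict[str, int] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply an event unless an equal or newer version was already applied."""
        if event.version <= self.versions.get(event.doc_id, -1):
            return False  # stale or duplicate message, skip it
        self.versions[event.doc_id] = event.version
        if event.op == "delete":
            self.docs.pop(event.doc_id, None)
        else:
            self.docs[event.doc_id] = dict(event.doc or {})
        return True


def consume(events: Iterable[ChangeEvent], index: InMemoryIndex) -> int:
    """Drain an iterable of events (the MQ stand-in); return how many were applied."""
    return sum(1 for event in events if index.apply(event))
```

The sync application only sees events, never the business DB, which is the decoupling the redesign was after. The version check is one possible way to tolerate out-of-order or repeated delivery.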

Advanced Search Middleware

To avoid pushing complex function_score queries to business developers, a middleware intercepts queries, rewrites them into advanced Elasticsearch DSL, and optionally caches results for 15–30 minutes, reducing repeated computation.
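A rough sketch of the rewrite-and-cache idea is shown below. The field names ("title", "sales"), the scoring function, and the TTLCache class are illustrative assumptions. The article does not describe the middleware's actual rewrite rules, only that it produces function_score DSL and caches results for 15 to 30 minutes.

```python
import json
import time
from typing import Callable, Dict, Tuple


def rewrite_to_function_score(keyword: str, boost_field: str = "sales") -> dict:
    """Wrap a plain keyword search in an Elasticsearch function_score query.

    Field names are hypothetical; a real middleware would pick them per index.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": keyword}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": boost_field,
                            "modifier": "log1p",
                            "missing": 0,
                        }
                    }
                ],
                "boost_mode": "multiply",
            }
        }
    }


class TTLCache:
    """Cache search results keyed by the serialized DSL for a fixed time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get_or_compute(self, dsl: dict, search: Callable[[dict], dict]) -> dict:
        key = json.dumps(dsl, sort_keys=True)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # still fresh, skip the cluster
        result = search(dsl)         # `search` stands in for the real ES call
        self._store[key] = (now, result)
        return result
```

A caller would build the DSL once with rewrite_to_function_score("phone") and pass it to TTLCache(ttl_seconds=15 * 60).get_or_compute together with whatever function actually queries the cluster.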

Big Data Integration

Using the open-source es-hadoop connector, a channel between Hive and Elasticsearch was built. Search logs are collected into HDFS via Flume, enabling offline scoring and suggestion generation.

Problems Identified

Rising maintenance cost due to tightly coupled sync code and message ordering issues.

Unpredictable traffic patterns caused occasional CPU saturation in the Elasticsearch cluster.

Architecture 3.0

The 3.0 redesign addressed these problems with three changes:

Expose open APIs for data ingestion, fully decoupling business code.

Introduce a proxy layer that handles flow control, caching, and request validation.

Provide a management console for index changes and cluster administration.

Proxy Layer

The proxy offers a unified Elasticsearch interface (via ESLoader), request validation, result caching, and template-based query generation. A local cache lets the proxy degrade gracefully under traffic spikes, and search templates spare callers from writing complex DSL by hand.
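The sketch below shows how validation, templates, and a local cache might fit together in such a proxy. Everything named here (SearchProxy, goods_by_shop, MAX_SIZE, BackendOverloaded) is hypothetical, and "degrade" is read as "serve the last cached result when the cluster rejects the request", which is an assumption rather than something the article spells out.

```python
import json
from typing import Callable, Dict

MAX_SIZE = 100  # assumed upper bound on page size enforced by the proxy


class ValidationError(ValueError):
    """Raised when a request is rejected before reaching Elasticsearch."""


class BackendOverloaded(RuntimeError):
    """Raised by the backend callable when the cluster cannot take the request."""


def goods_by_shop(shop_id: int, size: int = 10) -> dict:
    """Hypothetical template: callers pass parameters, never raw DSL."""
    return {
        "query": {"bool": {"filter": [{"term": {"shop_id": shop_id}}]}},
        "size": size,
    }


TEMPLATES: Dict[str, Callable[..., dict]] = {"goods_by_shop": goods_by_shop}


class SearchProxy:
    def __init__(self, backend: Callable[[dict], dict]) -> None:
        self.backend = backend                 # stand-in for the real ES client
        self.local_cache: Dict[str, dict] = {}

    def search(self, template: str, **params) -> dict:
        if template not in TEMPLATES:
            raise ValidationError(f"unknown template: {template}")
        size = params.get("size", 10)
        if isinstance(size, bool) or not isinstance(size, int) or not 1 <= size <= MAX_SIZE:
            raise ValidationError(f"size must be an integer in 1..{MAX_SIZE}")
        try:
            dsl = TEMPLATES[template](**params)
        except TypeError as exc:               # missing or unexpected parameter
            raise ValidationError(str(exc)) from exc

        key = json.dumps([template, params], sort_keys=True)
        try:
            result = self.backend(dsl)
        except BackendOverloaded:
            if key in self.local_cache:
                return self.local_cache[key]   # degrade: serve the last good result
            raise
        self.local_cache[key] = result
        return result
```

Rejecting oversized or malformed requests before they reach the cluster is what keeps one caller's mistake from affecting everyone who shares it.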

Management Platform

Built on Django, the console visualizes index metadata, supports approval workflows for index changes, and offers a custom query UI that avoids heavy fielddata loads.

ESWriter

To gain fine‑grained control over offline write traffic, a DataX‑based ESWriter plugin was created, allowing per‑second throttling of record count or data volume.
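Throttling on both record count and data volume can be sketched with two token buckets, as below. This is not the DataX plugin API (ESWriter itself would be a Java plugin). TokenBucket and WriteThrottle are hypothetical names, and the one-second burst allowance is an assumption.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket that may go into debt and reports how long to wait."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = rate_per_sec   # allow at most one second of burst
        self.tokens = rate_per_sec
        self.clock = clock
        self.last = clock()

    def wait_time(self, cost: float) -> float:
        """Consume `cost` tokens; return seconds the caller should sleep first."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= cost
        return max(0.0, -self.tokens / self.rate)


class WriteThrottle:
    """Limit a bulk writer by records per second and bytes per second."""

    def __init__(self, records_per_sec: float, bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.records = TokenBucket(records_per_sec, clock)
        self.bytes = TokenBucket(bytes_per_sec, clock)
        self.sleep = sleep

    def acquire(self, record_count: int, byte_count: int) -> float:
        """Block until a batch of this size may be written; return the delay."""
        delay = max(self.records.wait_time(record_count),
                    self.bytes.wait_time(byte_count))
        if delay > 0:
            self.sleep(delay)
        return delay
```

A writer would call acquire(len(batch), batch_bytes) before each bulk request, so whichever limit is tighter for that batch determines the pause.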

Challenges

Even with platformization, shared Elasticsearch clusters still suffer from divergent production standards and limited horizontal scaling, as adding nodes requires lengthy provisioning cycles.

Future Architecture 4.0

The next step is to integrate with an internal Data Transport Service (DTS) for automated, configuration‑driven synchronization between databases and Elasticsearch, and to move toward cloud‑native, Kubernetes‑based Elasticsearch services with isolated physical clusters for core workloads.
