Design and Architecture of the 58 Cloud Search Platform Using Kubernetes and Docker
This article describes how 58's search technology team standardized and modularized its vertical search services into a cloud-native platform. It covers the overall architecture, the Service and Deployment design on Kubernetes, the challenges of running a stateful index, and the mechanisms used to achieve high availability and scalability.
Introduction
58 is the largest classified information service platform in China, offering real-estate, recruitment, used-car, and directory services. The 58 Search Technology Department develops the main-site search and also builds vertical search services for individual business lines. As data volume, request traffic, and personalization demands grew rapidly, building each vertical search service in isolation showed clear limits: high development cost, resource contention, and slow iteration.
To address these issues, the department standardized, abstracted, and modularized its existing capabilities and built the 58 Cloud Search (云搜) system on top of its self-developed search kernel and framework.
Overall Architecture of Cloud Search
The platform consists of four major parts: the search system, management platform, container cloud (Docker + Kubernetes), and auxiliary systems (image registry, configuration center, logging, monitoring, and alerting).
The search system consists of three modules (the index and query proxies, the search kernel, and a summary service) that implement the core search logic. The management platform gives users a portal for configuring and operating search instances, the container cloud hosts all search instances, and the auxiliary systems cover operational needs. The individual components are:
query : receives front‑end queries, performs intent recognition and query rewriting, forwards requests to the merger, and optionally fetches summaries.
index : accepts document write requests and writes the documents to the document queue and the summary store.
merger : distributes requests to multiple searchers using consistent hashing, merges results, and performs final ranking adjustments.
searcher : holds index data and ranking models, performs real‑time indexing, recall, scoring, and ranking.
builder : pulls documents from the queue, processes them into index-ready form, and forwards them to the searchers.
Management Platform : entry point for users to create and customize search instances and for administrators to perform operations.
Container Cloud (Docker + Kubernetes) : containerizes each search module and deploys them on a Kubernetes cluster.
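The article does not say exactly how the merger applies consistent hashing. One plausible reading, sketched here as an assumption, is that each searcher shard has several replicas: the merger hashes a request key onto a ring of that shard's replicas to choose one replica per shard (scatter), then merges the per-shard hits into a global top-k (gather). `HashRing`, `route`, `merge`, the MD5-based hash, and the virtual-node count are illustrative choices, not 58's implementation.

```python
import bisect
import hashlib
import heapq
from typing import Dict, List, Tuple


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring over the replicas of one searcher shard."""

    def __init__(self, replicas: List[str], vnodes: int = 100) -> None:
        if not replicas:
            raise ValueError("a shard needs at least one replica")
        # Each replica occupies `vnodes` points on the ring for a smoother spread.
        self._ring = sorted(
            (stable_hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._ring[idx][1]


def route(key: str, shards: Dict[str, HashRing]) -> Dict[str, str]:
    """Scatter step: choose one replica per shard for this request key."""
    return {shard: ring.lookup(key) for shard, ring in shards.items()}


def merge(partials: List[List[Tuple[float, str]]], k: int) -> List[Tuple[float, str]]:
    """Gather step: global top-k (score, doc_id) hits across all shard results."""
    return heapq.nlargest(k, (hit for hits in partials for hit in hits))
```

Virtual nodes spread each replica over many ring positions, so removing one replica only remaps the keys that were routed to it; every other key keeps hitting the same replica, which keeps per-replica caches warm.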
Search in the Cloud
The cloud search solution combines the self-developed search kernel and framework with Kubernetes and Docker to move search onto the cloud at low cost: every module is containerized and orchestrated by Kubernetes.
Service Layer
Each search module is exposed as a Kubernetes Service. Because Kubernetes has no native concept of a shard, the shards of a searcher are represented by separate Services, so sharding is expressed at the Service level.
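To make the Service-per-shard idea concrete, the following sketch generates plain Kubernetes Service manifests as Python dicts, one per searcher shard. The naming scheme (`<instance>-searcher-shard-<n>`), the `app`/`shard` labels, and port 8080 are hypothetical; the only point illustrated is that each Service's label selector matches the Pods of exactly one shard.

```python
from typing import Any, Dict, List


def shard_services(instance: str, num_shards: int, port: int = 8080) -> List[Dict[str, Any]]:
    """Build one Kubernetes Service manifest per searcher shard.

    Names, labels, and the port are illustrative assumptions.
    """
    services = []
    for shard in range(num_shards):
        # The selector ties this Service to the Pods of a single shard only.
        selector = {"app": f"{instance}-searcher", "shard": str(shard)}
        services.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {
                "name": f"{instance}-searcher-shard-{shard}",
                "labels": dict(selector),
            },
            "spec": {
                "selector": selector,
                "ports": [{"name": "search", "port": port, "targetPort": port}],
            },
        })
    return services
```

Each dict can be serialized with `json.dumps` and applied with `kubectl apply -f`, since kubectl accepts JSON manifests. A merger could then reach shard n through that Service's cluster DNS name without tracking individual Pod IPs.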
Deployment/Pod Layer
Deployments define the Pods that actually serve traffic. Searcher Pods follow a sidecar pattern with three containers:
init container : checks whether other replicas already exist; if one does, it copies the index data from that replica, otherwise it performs a full index build.
builder container : pulls incremental data from the document queue; the document-processing logic itself runs as a separate service, so it can be scaled independently.
searcher container : handles indexing of real‑time data and query processing, returning results to the merger.
Merger Pods use the same sidecar pattern with init, watcher, and merger containers, which respectively prepare the configuration, monitor the searcher Services, and forward queries.
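A hedged sketch of how these Pod layouts could be expressed as `apps/v1` Deployments, again as Python dicts. It relies on standard Kubernetes semantics: containers listed under `initContainers` run to completion, in order, before the regular containers start, which is what lets the searcher's init step finish copying or rebuilding the index before the searcher serves traffic. The image registry, container names, mount paths, and the use of `emptyDir` volumes to share the index (and, for the merger, configuration) between containers are assumptions, not details from the article.

```python
from typing import Any, Dict, List

REGISTRY = "registry.example.com/cloudsearch"  # hypothetical registry path


def container(name: str, volume: str = "", mount_path: str = "") -> Dict[str, Any]:
    """One container spec; optionally mounts a volume shared within the Pod."""
    spec: Dict[str, Any] = {"name": name, "image": f"{REGISTRY}/{name}:latest"}
    if volume:
        spec["volumeMounts"] = [{"name": volume, "mountPath": mount_path}]
    return spec


def deployment(name: str, labels: Dict[str, str], replicas: int,
               init: List[Dict[str, Any]], main: List[Dict[str, Any]],
               volume: str) -> Dict[str, Any]:
    """Wrap a Pod spec in a Deployment whose selector matches the Pod labels."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Init containers run to completion before "containers" start.
                    "initContainers": init,
                    "containers": main,
                    # An emptyDir volume lives as long as the Pod.
                    "volumes": [{"name": volume, "emptyDir": {}}],
                },
            },
        },
    }


def searcher_deployment(instance: str, shard: int, replicas: int) -> Dict[str, Any]:
    labels = {"app": f"{instance}-searcher", "shard": str(shard)}
    return deployment(
        f"{instance}-searcher-shard-{shard}", labels, replicas,
        init=[container("index-init", "index-data", "/data/index")],
        main=[container("builder"), container("searcher", "index-data", "/data/index")],
        volume="index-data",
    )


def merger_deployment(instance: str, replicas: int) -> Dict[str, Any]:
    labels = {"app": f"{instance}-merger"}
    return deployment(
        f"{instance}-merger", labels, replicas,
        init=[container("merger-init", "merger-conf", "/etc/merger")],
        main=[container("watcher", "merger-conf", "/etc/merger"),
              container("merger", "merger-conf", "/etc/merger")],
        volume="merger-conf",
    )
```

An `emptyDir` volume is deleted together with its Pod, so under this assumption a rescheduled searcher Pod always starts without an index, which is exactly the case the init container's copy-or-rebuild logic has to handle.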
Index Data Integrity
Stateful services such as search are hard to schedule on Kubernetes: replicas must be kept in sync, the full and incremental indexing pipelines must be coordinated, and scaling must not interrupt service. The platform addresses these challenges by:
Detecting whether a Pod is the first replica (full index build) or a subsequent one (copy existing index).
Copying index data from a running replica while handling ongoing incremental updates.
Selecting the most up-to-date running replica as the copy source, skipping replicas that are not running or are already busy.
Limiting bandwidth during index copy and compressing data to reduce impact on live traffic.
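The replica checks in the list above amount to a small decision procedure. This sketch is one way to write it, under assumptions the article does not state: each replica reports whether it is running, whether it is already serving a copy, and the last document-queue offset applied to its index; the source can produce a consistent point-in-time snapshot; and a new replica catches up on updates that arrive during the copy by resuming the queue from the snapshot's offset. `Replica`, `Plan`, and `plan_bootstrap` are hypothetical names, and waiting (rather than rebuilding) when every running replica is busy is also a guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    running: bool      # Pod is Running and its searcher is serving
    busy: bool         # already acting as the source of another index copy
    queue_offset: int  # last document-queue offset applied to its index


@dataclass
class Plan:
    action: str                          # "full_build", "copy", or "wait"
    source: Optional[str] = None         # replica to copy from
    resume_offset: Optional[int] = None  # queue offset to resume consuming from


def plan_bootstrap(self_name: str, peers: List[Replica]) -> Plan:
    """Decide how a starting searcher Pod obtains its index."""
    running = [r for r in peers if r.name != self_name and r.running]
    if not running:
        # First replica of the shard (or none survive): full rebuild. The
        # full build is assumed to record its own queue checkpoint.
        return Plan("full_build")
    idle = [r for r in running if not r.busy]
    if not idle:
        # Every running replica is already serving a copy; retry later.
        return Plan("wait")
    # Prefer the most up-to-date source: the highest applied queue offset.
    source = max(idle, key=lambda r: r.queue_offset)
    # The snapshot covers everything up to source.queue_offset; updates that
    # arrive during the copy are replayed by resuming the queue there.
    return Plan("copy", source=source.name, resume_offset=source.queue_offset)
```

In this reading, the init container would call `plan_bootstrap`, then either run a full build, fetch a snapshot from `source` and start the builder at `resume_offset`, or back off and try again.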
The same mechanisms also cover Pod restarts, replica migrations, and scaling, so all of these cases are handled fully automatically.
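Bandwidth limiting and compression during an index copy could look like the following sketch: a token-bucket limiter caps the bytes per second sent over the network, and `zlib` compresses the index files as a stream so that only compressed bytes count against the budget. The limiter allows a temporary deficit, so a chunk larger than the bucket still goes through after a proportional pause. The class and function names, chunk size, and compression level are illustrative; the article does not say which compression algorithm or rate-limiting scheme 58 uses.

```python
import io
import time
import zlib
from typing import BinaryIO, Callable


class TokenBucket:
    """Caps throughput at `rate` bytes/s with bursts of up to `burst` bytes.

    A request may push the bucket into deficit; the caller then sleeps until
    the deficit is repaid, so chunks larger than `burst` still get through.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, nbytes: int) -> None:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens < 0:
            # Wait long enough for the refill to cover the deficit.
            self.sleep(-self.tokens / self.rate)


def copy_compressed(src: BinaryIO, dst: BinaryIO, bucket: TokenBucket,
                    chunk_size: int = 64 * 1024, level: int = 6) -> int:
    """Stream src to dst through zlib, throttling the compressed bytes.

    Returns the number of compressed bytes written.
    """
    compressor = zlib.compressobj(level)
    written = 0
    while True:
        chunk = src.read(chunk_size)
        # At EOF, flush whatever the compressor is still buffering.
        out = compressor.compress(chunk) if chunk else compressor.flush()
        if out:
            bucket.acquire(len(out))  # only bytes that hit the wire are charged
            dst.write(out)
            written += len(out)
        if not chunk:
            return written


if __name__ == "__main__":
    data = b"".join(f"doc-{i}\t".encode() for i in range(5000))
    out = io.BytesIO()
    sent = copy_compressed(io.BytesIO(data), out, TokenBucket(rate=50e6, burst=1e6))
    assert zlib.decompress(out.getvalue()) == data
    print(f"{len(data)} bytes sent as {sent} compressed bytes")
```

Because the clock and sleep functions are injectable, the limiter can be tested without real waiting; in production the defaults (`time.monotonic`, `time.sleep`) apply. On the receiving side, `zlib.decompressobj()` can restore the stream incrementally.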
Conclusion
The article outlines the cloud search architecture and how the self‑developed search kernel and framework are integrated with Kubernetes + Docker to achieve a cloud‑native search platform. Since its launch, the platform has run stably for over a year and a half, serving hundreds of search instances across the group, and will continue to evolve in functionality, performance, stability, and openness.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
