Search Engineering Architecture: Lessons from Zhihu and 58 Group
This article summarizes the evolution and redesign of Zhihu's search engine and describes 58 Group's high‑performance uesearch architecture, its real‑time indexing mechanism, and its cloud‑native deployment on Kubernetes. It closes with the key technical topics discussed and future directions for large‑scale search systems.
Background
On January 21, 2019, the 58 Group Technical Salon (Session 8 – Search Engineering Architecture) was held at the Beijing headquarters, featuring speakers from Zhihu's search team and 58 Group's TEG search team who shared their practical experiences.
1. Zhihu Search Architecture Evolution
1.1 First‑generation Search
In 2016 Zhihu built its own search, moving from Sogou to an Elasticsearch‑based system to meet growing demands for freshness, ranking quality, and content diversity.
1.2 Current State
During a year‑long refactor in 2018, the system was rebuilt with clearer module boundaries, which improved maintainability. A Rust‑based, Lucene‑compatible search engine replaced Elasticsearch, and the monolithic service was split into focused microservices, improving stability and performance.
1.3 Ongoing Improvements
Future work focuses on improving ranking quality and recall.
2. 58 Search Architecture
2.1 System Overview
The self‑developed uesearch system serves a range of vertical search scenarios with strict timeliness and consistency requirements. Its architecture has three layers: a stateless proxy layer, a merger layer that merges and ranks results, and a searcher layer that stores the indexes and serves queries.
Horizontal sharding lets the index grow by adding shards, and replication lets query concurrency grow by adding replicas, so both data volume and throughput scale out.
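The merger and searcher layers described above follow a scatter‑gather pattern, which can be sketched roughly as follows. This is a minimal illustration, not uesearch's actual code: the class names `Searcher` and `Merger`, the CRC32 sharding rule, and the term‑frequency score are all assumptions made for the example, and replicas and the proxy layer are left out.

```python
import heapq
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Searcher:
    """Holds one shard's inverted index: term -> {doc_id: score}."""
    postings: dict = field(default_factory=dict)

    def add(self, doc_id, text):
        for term in text.lower().split():
            bucket = self.postings.setdefault(term, {})
            # Toy relevance score (assumption): raw term frequency.
            bucket[doc_id] = bucket.get(doc_id, 0) + 1

    def search(self, term, k):
        hits = self.postings.get(term.lower(), {})
        # Each shard returns only its local top-k as (score, doc_id) pairs.
        return heapq.nlargest(k, ((score, doc_id) for doc_id, score in hits.items()))


class Merger:
    """Fans a query out to every shard and merges the per-shard top-k lists."""

    def __init__(self, num_shards):
        self.shards = [Searcher() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Hypothetical routing rule: hash the document id onto a shard.
        return crc32(doc_id.encode()) % len(self.shards)

    def index(self, doc_id, text):
        self.shards[self.shard_for(doc_id)].add(doc_id, text)

    def search(self, term, k=10):
        partial = [hit for shard in self.shards for hit in shard.search(term, k)]
        return [doc_id for _score, doc_id in heapq.nlargest(k, partial)]


merger = Merger(num_shards=3)
merger.index("a", "rust search engine")
merger.index("b", "search search index")
merger.index("c", "kubernetes pods")
print(merger.search("search", k=2))
```

Because every shard returns at most k hits, the merger's work depends on the number of shards and k rather than on the total index size, which is what makes adding shards a practical way to grow.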
2.2 Real‑time Index Update Design
Real‑time indexing is achieved by building inverted indexes in memory inside the search process. Updates are batched every few seconds into small index segments, which are then merged progressively into larger ones (3 seconds → 15 minutes → 1 hour → permanent). New documents become searchable quickly, while the number of segments each query has to scan stays small.
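One way to picture the progressive merge is a chain of time windows in which each closed segment is folded into the next, larger tier. The sketch below is an assumption about how such a scheme could work, not the uesearch implementation: `TieredIndex`, `Segment`, and the rule "fold a segment forward once its window has closed" are hypothetical, and only the tier lengths come from the text.

```python
from collections import defaultdict

# Window length in seconds for each timed tier (from the text: 3 s, 15 min, 1 h).
# Segments leaving the last timed tier land in a permanent segment.
TIER_SPANS = (3, 15 * 60, 60 * 60)


class Segment:
    """A tiny in-memory inverted index: term -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def absorb(self, other):
        for term, doc_ids in other.postings.items():
            self.postings[term] |= doc_ids

    def lookup(self, term):
        return self.postings.get(term.lower(), set())


class TieredIndex:
    def __init__(self):
        # Each slot is None or a (window_start, Segment) pair.
        self.tiers = [None] * len(TIER_SPANS)
        self.permanent = Segment()

    def add(self, doc_id, text, now):
        self.roll(now)
        if self.tiers[0] is None:
            self.tiers[0] = (now - now % TIER_SPANS[0], Segment())
        self.tiers[0][1].add(doc_id, text)

    def roll(self, now):
        """Fold every segment whose window has closed into the next tier."""
        for i, span in enumerate(TIER_SPANS):
            slot = self.tiers[i]
            if slot is None or now < slot[0] + span:
                continue
            start, segment = slot
            self.tiers[i] = None
            if i + 1 == len(TIER_SPANS):
                self.permanent.absorb(segment)
                continue
            if self.tiers[i + 1] is None:
                next_span = TIER_SPANS[i + 1]
                self.tiers[i + 1] = (start - start % next_span, Segment())
            self.tiers[i + 1][1].absorb(segment)

    def search(self, term):
        # A query scans at most one segment per tier plus the permanent one.
        hits = set(self.permanent.lookup(term))
        for slot in self.tiers:
            if slot is not None:
                hits |= slot[1].lookup(term)
        return hits
```

In this sketch a query touches at most four segments regardless of how many 3‑second batches have been written, which illustrates why merging small segments forward keeps search cheap. A production system would also need to handle deletes, updates to existing documents, and persistence, none of which are modeled here.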
2.3 Cloud Search (云搜)
Built on uesearch and Kubernetes, Cloud Search provides a private‑cloud search service in which users define their own schemas and ingest documents. Kubernetes manages resources, schedules pods, and automatically recovers failed components.
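The article does not show what a Cloud Search schema looks like, so the following is only a rough sketch of the "define a schema, then ingest documents" flow. The `Field` and `Schema` classes, the field type names, and the validation rules are all hypothetical and are not taken from Cloud Search.

```python
from dataclasses import dataclass

# Hypothetical field types mapped to the Python types a value may have.
FIELD_TYPES = {"text": str, "keyword": str, "int": int, "float": (int, float)}


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    required: bool = False


class Schema:
    def __init__(self, fields):
        unknown = [f.type for f in fields if f.type not in FIELD_TYPES]
        if unknown:
            raise ValueError(f"unsupported field types: {unknown}")
        self.fields = {f.name: f for f in fields}

    def validate(self, doc):
        """Return a list of problems; an empty list means the document is accepted."""
        errors = []
        for name, spec in self.fields.items():
            if name not in doc:
                if spec.required:
                    errors.append(f"missing required field: {name}")
            elif not isinstance(doc[name], FIELD_TYPES[spec.type]):
                errors.append(f"wrong type for field: {name}")
        errors.extend(f"unknown field: {name}" for name in doc if name not in self.fields)
        return errors


listing_schema = Schema([
    Field("title", "text", required=True),
    Field("city", "keyword"),
    Field("price", "float"),
])
print(listing_schema.validate({"title": "2-bedroom flat", "price": 5200}))
print(listing_schema.validate({"city": "Beijing", "color": "red"}))
```

Rejecting malformed documents at ingestion time keeps bad data out of the index, where it would be much harder to find and remove later.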
3. Summary
Salon participants discussed search index organization, distributed index synchronization, query rewriting, multi‑replica consistency, relevance calculation, and search quality evaluation, and shared practical experience with Elasticsearch, Lucene, and Kubernetes. They also expressed a desire for continued collaboration on improving the stability, timeliness, and relevance of search systems.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.