
Optimizing Ctrip’s Vacation Search Engine: From Search 1.0 to 5.5

This article details the evolution and optimization of Ctrip’s vacation search engine, covering business challenges, indexing redesign, data collection pipelines, write‑path improvements, compression techniques, query performance enhancements, deployment strategies, and the resulting gains in storage, latency, and stability.

Ctrip Technology

Jun 29, 2020


Background: Ctrip’s vacation search engine is a vertical O2O search platform for travel products such as group tours, free-travel (independent) packages, cruises, and study tours. The article walks through its optimization process in the hope of helping teams working on similar projects.

Business characteristics and difficulties: The product catalog contains thousands of SKUs whose status changes frequently, which drives a high index update rate and complex query scenarios. Both user experience and query efficiency are critical.

Terminology: The article defines the departure city, the optimal departure city, and the Data Acquisition System (DAS), which supports full, incremental, and batch collection modes.

Search 1.0: A simple redundant index holding a separate copy of each product for every departure station (63 stations, about 100k records). Advantages: fast retrieval and simple state determination. Disadvantage: the same data is duplicated across stations.

Search 2.0: Departure coverage expanded to more than 2,000 cities, and data sources diversified (DB, Hive, HBase, Redis, MQ). The redesign had to deal with index explosion, data growth, and minute-level freshness.

Problems in Search 2.0: (1) how to index 2,000+ departure cities; (2) rapid data growth; (3) stricter freshness requirements.

Design and thinking for Search 2.0: The team weighed space-time trade-offs and evaluated parent-child documents before settling on a wide index that stores city-related numeric fields in a single column.
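To make the trade-off concrete, here is a minimal sketch contrasting a Search 1.0-style redundant layout (one document per product and departure city) with a wide layout (one document per product, with all per-city numbers in one multi-valued column). The field names `product_id` and `city_prices` and the `city_id << 32 | price` packing are hypothetical; the article does not say how the single column was encoded.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical input rows: (product_id, departure_city_id, min_price)


def redundant_docs(rows: list[tuple[int, int, int]]) -> list[dict]:
    """Search 1.0 style: one document per (product, departure city)."""
    return [{"product_id": p, "city_id": c, "price": v} for p, c, v in rows]


def pack(city_id: int, price: int) -> int:
    """Pack a city id (high bits) and a non-negative price below 2**32 (low bits)."""
    return (city_id << 32) | price


def wide_docs(rows: list[tuple[int, int, int]]) -> list[dict]:
    """Wide style: one document per product, per-city values in one column."""
    by_product: dict[int, list[int]] = defaultdict(list)
    for p, c, v in rows:
        by_product[p].append(pack(c, v))
    return [{"product_id": p, "city_prices": sorted(vals)}
            for p, vals in by_product.items()]


def price_for_city(doc: dict, city_id: int) -> Optional[int]:
    """A city filter is just a numeric range on the packed column."""
    lo, hi = city_id << 32, (city_id + 1) << 32
    for packed in doc["city_prices"]:
        if lo <= packed < hi:
            return packed - lo
    return None


rows = [(1, 10, 2999), (1, 20, 3199), (1, 30, 2899), (2, 10, 1599)]
print(len(redundant_docs(rows)), len(wide_docs(rows)))  # 4 2
```

Because the city id sits in the high bits, "this product departing from city X" turns into a range query over one numeric field instead of a lookup across many per-city documents or fields.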

Data collection and write: At first, data was written directly to the index, with a backup copy in HBase. Later, collection and indexing were decoupled through message queues, which smooth out write spikes, allow messages to be aggregated, and let important updates be prioritized.
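A minimal sketch of the decoupled path, assuming an in-process priority queue stands in for the real message broker: the collector publishes change messages tagged with a priority, and the indexer drains them so that important updates are written before routine ones, in arrival order within each priority. The message kinds and priority values (`status` ahead of `price` ahead of `score`) are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priorities: a lower number is consumed first.
PRIORITY = {"status": 0, "price": 1, "score": 2}


@dataclass(order=True)
class Message:
    priority: int
    seq: int                                  # tie-breaker: FIFO within a priority
    kind: str = field(compare=False)
    product_id: int = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class ChangeQueue:
    """Stand-in for the MQ that sits between data collection and index writing."""

    def __init__(self) -> None:
        self._heap: list[Message] = []
        self._seq = itertools.count()

    def publish(self, kind: str, product_id: int, payload: dict) -> None:
        msg = Message(PRIORITY[kind], next(self._seq), kind, product_id, payload)
        heapq.heappush(self._heap, msg)

    def drain(self, max_batch: int) -> list[Message]:
        """Pop up to max_batch messages, most important first."""
        batch = []
        while self._heap and len(batch) < max_batch:
            batch.append(heapq.heappop(self._heap))
        return batch


q = ChangeQueue()
q.publish("price", 1, {"price": 2999})
q.publish("status", 2, {"on_sale": False})
q.publish("price", 3, {"price": 1599})
print([(m.kind, m.product_id) for m in q.drain(10)])
# [('status', 2), ('price', 1), ('price', 3)]
```

Bounding `max_batch` is what lets the indexer write at a steady pace even when the collector produces a burst.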

Search 3.0: Data kept growing (8 × 10⁶, i.e. 8 million, schedule records), prompting several optimizations:

Index write optimization: The 2-hour full writes were replaced with a daily full compensation pass plus incremental updates every 5 minutes, with Spark used to parallelize the processing.
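The split between frequent incremental writes and a daily full compensation pass can be sketched as follows. This is a single-process illustration under stated assumptions: plain dicts stand in for the source of truth and the index, `updated_at` is a hypothetical change timestamp, and the Spark-based parallelism the article mentions is omitted.

```python
from typing import Any

Record = dict[str, Any]


def incremental_sync(source: dict[int, Record], index: dict[int, Record],
                     watermark: int) -> int:
    """Run every few minutes: copy records changed since the watermark.

    Returns the new watermark. A real job would query by updated_at
    instead of scanning every record.
    """
    new_mark = watermark
    for key, rec in source.items():
        if rec["updated_at"] > watermark:
            index[key] = dict(rec)
            new_mark = max(new_mark, rec["updated_at"])
    return new_mark


def full_compensation(source: dict[int, Record], index: dict[int, Record]) -> int:
    """Run once a day: repair whatever the incremental path missed.

    Returns the number of corrections made.
    """
    fixes = 0
    for key, rec in source.items():
        if index.get(key) != rec:
            index[key] = dict(rec)
            fixes += 1
    for key in list(index.keys() - source.keys()):  # deleted at the source
        del index[key]
        fixes += 1
    return fixes
```

The incremental pass alone never notices deletions or lost change events; the daily full pass is what bounds how long such drift can survive in the index.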

Message processing optimization: Per-city on/off-sale (up/down) status messages were separated from price-change messages, and price changes were aggregated, reducing update pressure on Elasticsearch (ES).
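One way to picture this step, assuming each message carries a kind, product, departure city, and value (all hypothetical field names): status messages pass through untouched and in order, while price changes for the same product and city collapse so that only the latest one reaches the index.

```python
from typing import NamedTuple


class Change(NamedTuple):
    kind: str          # "status" (city up/down) or "price"
    product_id: int
    city_id: int
    value: object      # e.g. True/False for status, an int for price


def split_and_aggregate(changes: list[Change]) -> tuple[list[Change], list[Change]]:
    """Route status changes separately; keep only the latest price per (product, city)."""
    status: list[Change] = []
    latest_price: dict[tuple[int, int], Change] = {}
    for c in changes:
        if c.kind == "status":
            status.append(c)
        else:
            # A later price for the same key overwrites the earlier one.
            latest_price[(c.product_id, c.city_id)] = c
    return status, list(latest_price.values())


batch = [
    Change("price", 1, 10, 3000),
    Change("status", 1, 20, False),
    Change("price", 1, 10, 2950),
    Change("price", 1, 10, 2900),
    Change("price", 2, 10, 1500),
]
status, prices = split_and_aggregate(batch)
print(len(status), len(prices))  # 1 2
```

Keeping status on its own path means an off-sale event is never delayed behind a backlog of price updates, while the price path only pays for the final value.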

Buffered write: Changes were buffered in Redis, duplicates were merged, and batch writes were issued every 5 minutes, cutting price-related write volume to about one third.
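The buffering step can be sketched as a small write-behind buffer. In this version, an in-memory dict plays the role of Redis (an assumption), the flush interval defaults to the article's 5 minutes, and the clock is injectable so the behavior can be tested. Repeated partial updates to the same document are merged field by field before a single bulk write.

```python
import time
from typing import Any, Callable


class BufferedWriter:
    """Merge updates per document and release them as one batch per interval."""

    def __init__(self, flush_interval_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._pending: dict[str, dict[str, Any]] = {}  # stand-in for a Redis hash
        self._interval = flush_interval_s
        self._clock = clock
        self._last_flush = clock()
        self.received = 0                              # raw updates seen

    def put(self, doc_id: str, fields: dict[str, Any]) -> None:
        """Record a partial update; later values for the same field win."""
        self.received += 1
        self._pending.setdefault(doc_id, {}).update(fields)

    def maybe_flush(self) -> list[tuple[str, dict[str, Any]]]:
        """Return the merged batch if the interval has elapsed, else an empty list."""
        if self._clock() - self._last_flush < self._interval:
            return []
        batch = list(self._pending.items())
        self._pending.clear()
        self._last_flush = self._clock()
        return batch  # hand this to a bulk-update call
```

For example, four updates touching two products (three of them on the same product) leave the buffer as two merged document writes.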

Index structure optimization: Schedule data was compressed into a single 64-bit long (31 bits for the days of the month, 4 for the month, 5 for the year, and 22 for the city, 62 bits in total), reducing storage to 1/31 of the original. Similar compression applied to scores and price data shrank the total index from over 100 GB to under 10 GB.
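A sketch of the packing, using the field widths from the article. The bit order (days in the low bits, city in the high bits) and the 5-bit year stored as an offset from a hypothetical `BASE_YEAR = 2000` are assumptions. Up to 31 per-day records for one city and month collapse into one value, which is consistent with the 1/31 figure.

```python
# Hypothetical layout, low to high: 31 day bits | 4 month bits | 5 year bits | 22 city bits
DAY_BITS, MONTH_BITS, YEAR_BITS, CITY_BITS = 31, 4, 5, 22
MONTH_SHIFT = DAY_BITS                   # 31
YEAR_SHIFT = MONTH_SHIFT + MONTH_BITS    # 35
CITY_SHIFT = YEAR_SHIFT + YEAR_BITS      # 40; city field ends at bit 61
BASE_YEAR = 2000                         # hypothetical epoch for the year offset


def pack_schedule(city_id: int, year: int, month: int, days: list[int]) -> int:
    """Encode one city's departure days for one month into a single 62-bit integer."""
    if not 0 <= city_id < (1 << CITY_BITS):
        raise ValueError("city_id out of range")
    if not 0 <= year - BASE_YEAR < (1 << YEAR_BITS):
        raise ValueError("year out of range")
    if not 1 <= month <= 12:
        raise ValueError("month out of range")
    day_mask = 0
    for d in days:
        if not 1 <= d <= 31:
            raise ValueError(f"bad day {d}")
        day_mask |= 1 << (d - 1)         # bit 0 = day 1, ..., bit 30 = day 31
    return ((city_id << CITY_SHIFT) | ((year - BASE_YEAR) << YEAR_SHIFT)
            | (month << MONTH_SHIFT) | day_mask)


def unpack_schedule(value: int) -> tuple[int, int, int, list[int]]:
    """Inverse of pack_schedule: (city_id, year, month, sorted departure days)."""
    city_id = value >> CITY_SHIFT
    year = BASE_YEAR + ((value >> YEAR_SHIFT) & ((1 << YEAR_BITS) - 1))
    month = (value >> MONTH_SHIFT) & ((1 << MONTH_BITS) - 1)
    mask = value & ((1 << DAY_BITS) - 1)
    days = [d + 1 for d in range(DAY_BITS) if (mask >> d) & 1]
    return city_id, year, month, days


def departs_on(value: int, day: int) -> bool:
    """Check a single day (1-31) with one shift and mask, without unpacking."""
    return bool((value >> (day - 1)) & 1)
```

Because the layout stops at bit 61, every packed value is non-negative when stored as a signed 64-bit long, and a "departs on day d" check is a single bit test.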

Field compression: Massive map fields were replaced with a hybrid map-plus-array layout, reducing the field count from over 7,000 to about 130 and speeding up queries.
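The article does not describe the hybrid layout in detail. One plausible reading, sketched here purely as an assumption, is to stop creating one field per map key (for example one field per departure city, which makes the field count grow with the key space) and instead hash keys into a fixed number of bucket fields, each holding an array of packed key/value entries. `NUM_BUCKETS = 128` and the `price_<key>` / `bucket_<n>` field names are hypothetical.

```python
from typing import Optional

NUM_BUCKETS = 128  # hypothetical; caps the number of index fields


def per_key_fields(values: dict[int, int]) -> dict[str, int]:
    """Plain map layout: one field per key, so field count grows with the keys."""
    return {f"price_{k}": v for k, v in values.items()}


def bucketed_fields(values: dict[int, int]) -> dict[str, list[int]]:
    """Hybrid layout: the key picks a bucket field (map part); entries are packed
    as (key << 32) | value into that field's array (array part)."""
    doc: dict[str, list[int]] = {}
    for k, v in values.items():
        doc.setdefault(f"bucket_{k % NUM_BUCKETS}", []).append((k << 32) | v)
    return doc


def lookup(doc: dict[str, list[int]], key: int) -> Optional[int]:
    """Find one key's value by scanning only its bucket."""
    for packed in doc.get(f"bucket_{key % NUM_BUCKETS}", []):
        if packed >> 32 == key:
            return packed & 0xFFFFFFFF
    return None


prices = {city: 1000 + city for city in range(2000)}
print(len(per_key_fields(prices)), len(bucketed_fields(prices)))  # 2000 128
```

The field count now depends on the bucket count rather than on how many distinct keys exist, at the cost of a short scan inside one bucket per lookup.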

Query performance optimization: Linear scans for the optimal departure city were replaced with hash-map lookups, and POI score lookup was restructured into a binary-tree-like structure, roughly halving query time.
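Both changes replace a linear scan with a faster structure. The sketch below assumes hypothetical inputs: a list of `(city_id, info)` pairs for departure cities, and a mapping from POI id to score. The optimal-city lookup uses a dict built once, and the POI score lookup uses binary search over a sorted array via Python's standard `bisect` module, which is one way to get the logarithmic behavior of a binary tree without building one; the article's actual structure may differ.

```python
from bisect import bisect_left
from typing import Optional


def optimal_city_linear(cities: list[tuple[int, str]], city_id: int) -> Optional[str]:
    """Before: an O(n) scan on every query."""
    for cid, info in cities:
        if cid == city_id:
            return info
    return None


def build_city_map(cities: list[tuple[int, str]]) -> dict[int, str]:
    """After: build once (city ids assumed unique), then O(1) average lookups."""
    return dict(cities)


class PoiScores:
    """POI scores held as parallel sorted arrays; each lookup is an O(log n) search."""

    def __init__(self, scores: dict[int, float]) -> None:
        self._ids = sorted(scores)
        self._scores = [scores[i] for i in self._ids]

    def get(self, poi_id: int) -> Optional[float]:
        i = bisect_left(self._ids, poi_id)
        if i < len(self._ids) and self._ids[i] == poi_id:
            return self._scores[i]
        return None
```

Sorted arrays keep the memory footprint close to the raw data, which matters when the structure is rebuilt for many documents per query.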

Deployment and flow control: Multiple clusters sit behind GSLB (global server load balancing), combined with proximity routing, static routing, and traffic shaping to absorb peak loads.
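Traffic shaping is commonly implemented with a token bucket. The article does not say which algorithm Ctrip used, so the following is a generic sketch with hypothetical `rate` and `capacity` parameters and an injectable clock for testing.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit up to `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller can queue, degrade, or route the request elsewhere
```

A rejected request is where the routing layer gets a say: it can fall back to a degraded response or send the request to another cluster.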

Optimization results: The index shrank to 7% of its original size, the full schedule update time fell from 4 h to 1 h, the incremental update interval dropped from 2 h to 5 min (with 60% less data), query latency fell from about 120 ms to about 40 ms, and overall system stability improved.

Conclusion: A well-designed index dramatically improves the performance and stability of a high-traffic search service, and awareness of algorithmic complexity becomes essential as data scales.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
