
Reconstructing Ctrip's SEO Project: Architecture, Technical Choices, and Design Solutions

This article details the late-stage refactoring of Ctrip's SEO system, explaining why the overhaul was needed, the backend technology stack choices, the modular architecture—including data collection, processing, and service layers—and the performance optimizations implemented to support large‑scale search‑engine‑driven traffic.

Ctrip Technology

Author Bio: Xiong Pin, leader of the Public R&D team in Ctrip's International Business Unit, focuses on internationalization components and market-related projects, enjoys open source, and keeps a close eye on new technologies.

What is an SEO project? Search Engine Optimization (SEO) improves a website’s natural ranking on search engines (mainly Google) and drives traffic to landing pages, which then convert visitors into orders. Ctrip’s SEO project builds landing pages for hotels and flights, handling large volumes of data.

Why refactor? The original system suffered from tight coupling between front-end and back-end code, databases shared across teams and cluttered with irrelevant fields, slow and manual data updates (2-3 days per full refresh), and link-generation computations so heavy that a full run would have taken over 2,800 hours. Rapid requirement iteration, a small developer team, and complex, distributed data sources made these problems worse.

Technical selection (backend): The language was unified to Java (replacing a mixed Java/PHP codebase). MySQL became the primary datastore, replacing most Elasticsearch usage. RPC communication uses Baiji contracts rather than the newer CDubbo, a choice driven by stability and team familiarity.

Design scheme: The system is divided into five main modules:

Vampire: Data collector that pulls incremental or full data from MQ, DB, or APIs, transforms it into a normalized format, and writes to the DB via Faba.
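The collection step above boils down to mapping heterogeneous raw payloads into one normalized shape before handing them to the write layer. A minimal sketch of that transform step, with entirely hypothetical field and class names (the article does not publish Vampire's actual schema):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a Vampire-style collection step: raw records from an
// upstream source (MQ message, DB row, or API response) are mapped into one
// normalized shape before being handed to the write layer (Faba).
public class HotelCollector {

    // Normalized record the downstream write layer would accept.
    public record NormalizedHotel(long hotelId, String name, String cityCode) {}

    // Transform one raw key-value payload into the normalized form.
    // Field names ("id", "hotelName", "city") are illustrative only.
    public static NormalizedHotel normalize(Map<String, String> raw) {
        return new NormalizedHotel(
                Long.parseLong(raw.get("id")),
                raw.get("hotelName").trim(),
                raw.get("city").toUpperCase());
    }

    // An incremental pull would normalize each record in the fetched batch.
    public static List<NormalizedHotel> normalizeBatch(List<Map<String, String>> batch) {
        return batch.stream().map(HotelCollector::normalize).collect(Collectors.toList());
    }
}
```

Keeping the transform pure (raw map in, normalized record out) is what lets a collector like this be driven equally by MQ consumers, DB cursors, or API pollers.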

Faba: Provides asynchronous Write (via message queue) and Read interfaces; Write supports idempotency, batch deduplication, and high QPS; Read is optimized with vertical table splitting, indexing, and simple request/response contracts.
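The idempotency and batch-deduplication properties described for Faba's write path can be sketched as follows. This is an in-memory illustration under assumed semantics (dedupe by record key, write only if the version is newer), not Faba's actual implementation, which writes to MySQL behind a message queue:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an idempotent, deduplicating write path: a batch is
// deduplicated by record key, and a record is written only if its version is
// newer than what the store already holds, so replaying the same MQ batch
// twice has no additional effect.
public class IdempotentWriter {

    public record WriteRecord(String key, long version, String payload) {}

    // Stands in for the real DB table: key -> last written version.
    private final Map<String, Long> store = new HashMap<>();

    // Returns the records actually written; in-batch duplicates and stale
    // versions are dropped.
    public List<WriteRecord> writeBatch(List<WriteRecord> batch) {
        // Keep only the newest version per key within the batch.
        Map<String, WriteRecord> deduped = new HashMap<>();
        for (WriteRecord r : batch) {
            deduped.merge(r.key(), r, (a, b) -> a.version() >= b.version() ? a : b);
        }
        List<WriteRecord> written = new ArrayList<>();
        for (WriteRecord r : deduped.values()) {
            long current = store.getOrDefault(r.key(), -1L);
            if (r.version() > current) {          // idempotency check
                store.put(r.key(), r.version());  // the real system writes to the DB here
                written.add(r);
            }
        }
        return written;
    }
}
```

Version-gated writes like this are what make an asynchronous, queue-fed write interface safe to retry, which in turn is what allows high-QPS batch ingestion.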

Service: Exposes business‑level APIs; each SEO page consists of several modules, each backed by a single Service interface, enabling reuse across languages, currencies, and cities.
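The one-module, one-interface idea can be made concrete with a small sketch: each Service call takes locale, currency, and city as parameters, so a single implementation serves every market. All names here are invented for illustration; the article does not publish the actual Service contracts:

```java
import java.util.Locale;

// Hypothetical sketch of a per-module Service contract: one interface per
// page module, parameterized by locale, currency, and city so the same
// implementation is reused across languages, currencies, and cities.
public class NearbyHotelsService {

    public record ModuleRequest(Locale locale, String currency, String cityName) {}

    // One page module == one Service call; the response feeds the Page layer.
    public static String title(ModuleRequest req) {
        // The real system would look up localized copy from configuration;
        // the inline templates here are placeholders.
        String template = req.locale().getLanguage().equals("fr")
                ? "Hôtels à %s (%s)"
                : "Hotels in %s (%s)";
        return String.format(template, req.cityName(), req.currency());
    }
}
```

Because the market dimensions travel in the request rather than in separate per-market services, adding a new language or currency touches configuration instead of code.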

Page: Front‑end representation of SEO landing pages (implementation handled by the front‑end team).

Portal: Configuration, logging, A/B testing, and statistics modules that control Service behavior, record update progress, compare configurations, and monitor cache hit rates.

Performance considerations: Vampire runs on four 8‑core VMs, achieving >10K ops/sec and processing 10 million records in ~30 minutes. Write throughput is limited by DB connection pool size (≈100) and I/O capacity; caching (local + Redis) is planned but low priority because direct DB access already meets QPS requirements.
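The planned local-plus-Redis read path is a classic two-level cache: check an in-process map first, then the remote cache, and hit the DB only on a double miss. A minimal sketch under those assumptions, with the remote tier simulated by a plain map (a real deployment would use a Redis client such as Jedis):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a two-level read-through cache: L1 is an in-process
// map, L2 stands in for Redis, and the DB loader runs only on a double miss.
// Eviction and TTLs are omitted for brevity.
public class TwoLevelCache {

    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> remote = new HashMap<>(); // stands in for Redis
    private final Function<String, String> dbLoader;            // the DB fallback

    public TwoLevelCache(Function<String, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    public String get(String key) {
        String v = local.get(key);
        if (v != null) return v;             // L1 hit
        v = remote.get(key);
        if (v == null) {
            v = dbLoader.apply(key);         // double miss: fall through to the DB
            remote.put(key, v);              // populate L2
        }
        local.put(key, v);                   // populate L1
        return v;
    }
}
```

This also shows why the article can defer caching: if direct DB reads already meet the QPS target, the loader path alone suffices, and the two cache tiers can be layered in later without changing callers.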

Conclusion: Data quality, efficient collection, and simple, well‑designed interfaces are the core of the SEO project’s performance. The refactor simplifies code coupling, isolates data storage, and introduces scalable components to handle massive traffic and rapid business changes.

Tags: Backend, Java, Performance Optimization, Data Pipeline, Microservices, SEO
Written by Ctrip Technology — the official Ctrip Technology account, sharing and discussing growth.