How Ctrip Scaled Its Ticket Booking System for Flash‑Sale Events

This article analyzes the challenges Ctrip faced when handling massive traffic during ticket flash‑sale events and details the architectural upgrades, caching strategies, database optimizations, supplier integration safeguards, and traffic‑control mechanisms that enabled stable, fast, and consistent booking experiences.

Architect

Background

As the travel industry recovered rapidly after the pandemic, high‑traffic flash‑sale promotions became frequent. Ctrip’s ticket reservation system must handle billions of requests while guaranteeing a smooth booking experience for domestic and international users.

Flash‑Sale Characteristics

Flash‑sale scenarios (e.g., Double‑11, 618, train‑ticket rushes, concert tickets) share three core traits: massive concurrent traffic, strict time sensitivity, and the need for strong consistency with multi‑dimensional purchase limits.

2020‑08‑08 to 2020‑09‑01: "HuiYou Hubei" event, traffic 45× normal (hundreds of thousands of QPS).

2021‑09‑14: Beijing Universal Studios opening, highest sales among competitors.

2023‑09‑15: Wuhan Zoo opening, stable ordering despite supplier failures.

2024‑04‑10: IU global concert, tickets sold out in 10 seconds.

System Goals

Stability: uninterrupted service under peak load.

Accuracy: strong transactional consistency.

Speed: fluid booking experience with rapid confirmation.

Stability Challenges

Redis overload & cache hot‑key

Horizontal scaling alone cannot relieve hot keys, because every request for a given key still lands on the same Redis node and concentrates CPU usage there. The solution is a multi‑level cache with automatic hot‑key detection.

Hot‑key detection promotes any key accessed more than 10 times per second on a single node to a higher‑level cache or local memory, reducing Redis load and latency.
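The promotion rule can be sketched as follows. This is a minimal illustration, not Ctrip's implementation: the class name `HotKeyCache`, the plain dict standing in for Redis, and the injected `now` timestamp are all assumptions made to keep the example self‑contained.

```python
from collections import defaultdict, deque


class HotKeyCache:
    """Two-level read path: a local dict in front of a slower remote store."""

    def __init__(self, remote, threshold=10, window_s=1.0):
        self.remote = remote            # stand-in for Redis (any dict-like store)
        self.threshold = threshold      # reads per window above which a key is hot
        self.window_s = window_s
        self.hits = defaultdict(deque)  # key -> timestamps of recent remote reads
        self.local = {}                 # promoted hot keys live here
        self.remote_reads = 0

    def get(self, key, now):
        # Promoted keys are served from local memory without touching the remote.
        if key in self.local:
            return self.local[key]
        stamps = self.hits[key]
        stamps.append(now)
        # Drop reads that have slid out of the detection window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        self.remote_reads += 1
        value = self.remote[key]
        # Promote once the key is read more than `threshold` times in the window.
        if len(stamps) > self.threshold:
            self.local[key] = value
            del self.hits[key]
        return value
```

The sketch omits expiry and invalidation of promoted entries; a real local cache would need a short TTL or an update channel so that promoted values do not go stale.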

Large cache keys

Oversized keys cause memory pressure, network blockage, and slower queries.

Trim redundant fields.

Apply higher‑ratio compression.

Split large keys into smaller ones (evaluate I/O impact).

Establish a weekly scan to clean up big keys.

After optimization, query latency dropped from ~300 µs to ~100 µs.
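The first three measures (trimming, compression, splitting) can be illustrated with a short sketch. The helper names, the JSON encoding, and the `<key>:<index>` chunk naming are assumptions for illustration; the source does not say which serialization or compression codec was used, and `zlib` is chosen here only because it ships with Python.

```python
import json
import zlib


def trim(record, keep):
    """Drop fields that readers of this cache entry never use."""
    return {k: v for k, v in record.items() if k in keep}


def pack(record):
    """Serialize without extra whitespace, then compress at the highest level."""
    raw = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def split_value(key, blob, chunk_size):
    """Split one large value into several small entries plus a chunk count."""
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    entries = {f"{key}:{i}": chunk for i, chunk in enumerate(chunks)}
    entries[f"{key}:n"] = str(len(chunks)).encode("utf-8")
    return entries


def join_value(key, store):
    """Reassemble a value written by split_value (one read per chunk)."""
    count = int(store[f"{key}:n"].decode("utf-8"))
    return b"".join(store[f"{key}:{i}"] for i in range(count))
```

Splitting trades one large read for several small ones, which is the I/O impact the list above asks to evaluate before adopting it.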

Database overload

Cache‑miss storms during flash sales put pressure on the database. The original cache‑eviction listener deleted keys on change events, so subsequent reads for those keys missed the cache and fell through to the database, overloading it.

Cache‑cover (overwrite) update: write the new value over the cached entry instead of deleting it.

Message aggregation: batch rapid successive change events into a single update.

Asynchronous cache refresh: queue update tasks for background processing.
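Taken together, the three measures above might look like the sketch below. The class `CacheRefresher`, the `load_from_db` callback, and the explicit `flush()` call are hypothetical; in a real system the flush would run on a background worker or timer, and the cache would be Redis rather than a dict.

```python
class CacheRefresher:
    """Coalesces change events per key and overwrites cache entries in batches."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # stand-in for Redis
        self.load_from_db = load_from_db  # callable: key -> fresh value
        self.pending = set()              # keys with unprocessed change events
        self.db_reads = 0

    def on_change(self, key):
        # Aggregation: many events for one key collapse into one pending entry.
        self.pending.add(key)

    def flush(self):
        # Asynchronous refresh: called off the request path.
        keys, self.pending = self.pending, set()
        for key in keys:
            # Overwrite in place; the key is never deleted, so readers never miss.
            self.cache[key] = self.load_from_db(key)
            self.db_reads += 1
        return len(keys)
```

With this shape, a burst of 100 change events on one product costs a single database read, and readers keep seeing the previous value until the overwrite lands.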

Supplier system instability

Supplier APIs may become slow or rate‑limited under load, jeopardizing order flow.

Peak‑shaving buffer pool: use a message queue to decouple order intake from supplier calls, so that orders reach the supplier at a rate it can sustain.

Automatic disable‑sale: monitor supplier health and temporarily take an unhealthy supplier's products off sale.

Retry mechanism: periodically retry failed orders with adaptive intervals.
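A minimal sketch of the last two safeguards follows. The source does not describe how supplier health is measured or how retry intervals adapt, so the failure‑rate window, the cooldown, and the exponential backoff used here are assumptions, and `SupplierGuard` and `retry_delay` are hypothetical names.

```python
from collections import deque


class SupplierGuard:
    """Takes a supplier off sale when its recent failure rate is too high."""

    def __init__(self, window=20, max_failure_rate=0.5, cooldown_s=30.0):
        self.results = deque(maxlen=window)  # recent call outcomes, True = success
        self.max_failure_rate = max_failure_rate
        self.cooldown_s = cooldown_s
        self.disabled_until = 0.0

    def record(self, ok, now):
        self.results.append(ok)
        # Judge only on a full window so a single early failure cannot trip it.
        if len(self.results) == self.results.maxlen:
            rate = self.results.count(False) / len(self.results)
            if rate > self.max_failure_rate:
                self.disabled_until = now + self.cooldown_s
                self.results.clear()

    def on_sale(self, now):
        return now >= self.disabled_until


def retry_delay(attempt, base_s=1.0, cap_s=60.0):
    """Exponential backoff (assumed policy): 1 s, 2 s, 4 s, ... capped at cap_s."""
    return min(cap_s, base_s * 2 ** attempt)
```

Clearing the window after a trip means the supplier is judged on fresh calls once the cooldown ends, rather than on the failures that caused the trip.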

Traffic‑Control Strategy

Fine‑grained rate limiting per page and per product prevents a single hot item from overwhelming the system.

SOA‑level interface throttling.

Custom product‑level limits using sliding windows (e.g., a one‑second window divided into ten 100 ms sub‑windows).

Automatic hotspot detection similar to Redis hot‑key logic.
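The product‑level sliding window can be sketched as below, using the ten 100 ms sub‑windows from the example. The class and function names, the default limit of 100, and the millisecond timestamps passed in by the caller are assumptions for illustration.

```python
class SlidingWindowLimiter:
    """Admits at most `limit` requests across the last `buckets` sub-windows."""

    def __init__(self, limit, buckets=10, bucket_ms=100):
        self.limit = limit
        self.buckets = buckets
        self.bucket_ms = bucket_ms
        self.counts = {}  # sub-window index -> requests admitted in it

    def allow(self, now_ms):
        current = now_ms // self.bucket_ms
        oldest = current - self.buckets + 1
        # Discard sub-windows that have slid out of the overall window.
        self.counts = {b: c for b, c in self.counts.items() if b >= oldest}
        if sum(self.counts.values()) >= self.limit:
            return False
        self.counts[current] = self.counts.get(current, 0) + 1
        return True


limiters = {}  # product id -> limiter, so one hot product cannot starve others


def allow_product(product_id, now_ms, limit=100):
    if product_id not in limiters:
        limiters[product_id] = SlidingWindowLimiter(limit)
    return limiters[product_id].allow(now_ms)
```

Compared with a fixed one‑second counter, the sub‑windows prevent a burst at the end of one second and the start of the next from doubling the effective limit.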

Data Consistency

Accurate stock deduction is critical. In a traditional relational database, row‑level locks on the stock row become a bottleneck when many orders deduct from it concurrently.

Solution: asynchronous stock deduction workflow.

Initialize: sync flash‑sale inventory to Redis.

Deduct in Redis at purchase time, then publish a message to asynchronously update the DB.

Return stock: on cancellation, restore the stock in the DB first, then in Redis.

This workflow eliminates row‑level lock contention and supports tens of thousands of orders per minute.
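The three steps can be walked through with an in‑memory sketch. Everything here is a stand‑in: dicts replace Redis and the database, a `deque` replaces the message queue, and a lock models the requirement that the check‑and‑decrement be atomic (in Redis this is typically achieved with a Lua script, which runs atomically). The class and method names are hypothetical.

```python
import threading
from collections import deque


class FlashSaleStock:
    def __init__(self):
        self.cache = {}               # stand-in for Redis stock counters
        self.db = {}                  # stand-in for the relational stock table
        self.outbox = deque()         # stand-in for the message queue
        self.lock = threading.Lock()  # models an atomic check-and-decrement

    def initialize(self, sku, qty):
        # Step 1: sync flash-sale inventory into the cache.
        self.db[sku] = qty
        self.cache[sku] = qty

    def deduct(self, sku, qty=1):
        # Step 2: deduct in the cache, then publish a message for the DB.
        with self.lock:
            if self.cache.get(sku, 0) < qty:
                return False          # sold out; no message is published
            self.cache[sku] -= qty
        self.outbox.append((sku, -qty))
        return True

    def drain(self):
        # Consumer side: apply queued deductions to the DB off the request path.
        applied = 0
        while self.outbox:
            sku, delta = self.outbox.popleft()
            self.db[sku] += delta
            applied += 1
        return applied

    def give_back(self, sku, qty=1):
        # Step 3: on cancellation, restore the DB first, then the cache.
        self.db[sku] += qty
        with self.lock:
            self.cache[sku] += qty
```

One plausible reason for restoring the DB before the cache is that stock only becomes purchasable again once the durable record already reflects it; the source states the order but not the rationale.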

High Availability & Sustainability

Continuous architectural health governance and dedicated large‑event safeguard plans are essential.

Health metrics cover:

System runtime stability.

Architectural complexity (service count, dependency depth).

Engineering quality.

For major events and holidays, pre‑emptive stress testing and disaster‑recovery plans ensure the system remains operational under extreme load.

Conclusion

The ticket reservation system addresses flash‑sale challenges through multi‑level caching, cache‑cover updates, asynchronous stock handling, supplier‑side safeguards, and fine‑grained traffic control, while maintaining continuous health monitoring and high‑availability planning to sustain performance under massive concurrent traffic.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
