Big Data 12 min read

Ctrip’s Scalable Real‑Time User Behavior System with Kafka, Storm, Redis

This article details Ctrip’s redesign of its real‑time user behavior service, covering the new architecture, data flow, use of Java, Kafka, Storm, Redis, and MySQL, and how it achieves high real‑time performance, availability, scalability, and fault‑tolerance to support massive travel‑industry traffic.

21CTO
21CTO
21CTO
Ctrip’s Scalable Real‑Time User Behavior System with Kafka, Storm, Redis

1. Architecture

Ctrip’s real‑time user behavior service is a foundational service used in scenarios such as recommendation, dynamic ads, user profiling, and browsing history. To meet growing data volume and stricter real‑time, stability, and performance requirements, the system was redesigned with a two‑stream architecture: a processing stream and an output stream.

In the processing stream, client logs (App/Online/H5) are uploaded to a Collector Service, which forwards messages to a distributed queue. A stream‑processing framework (Storm) reads from the queue, processes the data, and writes it to a data layer composed of distributed cache (Redis) and a MySQL cluster.

The output stream is simpler: a Web Service pulls data from the data layer and serves it to callers, either internal services (e.g., recommendation) or front‑end components (e.g., browsing history). The technology stack includes Java, Kafka, Storm, Redis, MySQL, Tomcat, and Spring.

Java – mature big‑data components and strong internal adoption.

Kafka/Storm – proven distributed messaging and stream‑processing with solid operational support.

Redis – HA, SortedSet, and expiration features meet system needs.

MySQL – stable and performant at billion‑row scale, with horizontal scalability.

The system processes roughly 2 billion events per day with a latency of about 300 ms from ingestion to availability, and handles around 80 million query requests daily with an average latency of 6 ms.

2. Real‑Time Performance

Storm addresses traffic spikes by scaling out workers; it processes messages at millisecond granularity and supports at‑least‑once delivery with idempotent handling. A dual‑queue design separates fresh data (Queue1) from exception data (Queue2), allowing normal processing to continue while retries are managed independently.

Compensation strategies include retry workers that consume from Queue2, applying back‑off policies, and mechanisms to reset consumer cursors for backlog clearance.

3. Availability

The system eliminates single points of failure through full‑stack clustering: Kafka and Storm are natively clustered, Web services are stateless behind load balancers, and Redis/MySQL use primary‑backup deployments. When components become unavailable, graceful degradation routes data to Redis or MySQL fallback paths, and a circuit‑breaker (Hystrix) prevents cascading failures.

Rate limiting is applied at multiple levels (AppId/IP, service, interface) to protect the system from abusive calls.

4. Performance & Scalability

To support ten‑fold capacity growth, the data layer employs consistent‑hashing for Redis expansion and horizontal sharding for MySQL. Sharding follows powers‑of‑two for easier scaling, and MySQL master‑slave promotion is automated via DNS switching, reducing migration time to minutes.

5. Deployment

Storm deployments are straightforward: uploading a new JAR and restarting the topology. Kafka offsets allow the system to resume processing from the correct position after restarts. For versioned behavior records, a backup job runs older versions alongside the current one.

Real‑time user behavior system logical view
Real‑time user behavior system logical view
Storm features
Storm features
Storm architecture
Storm architecture
Dual‑queue design
Dual‑queue design
Compensation retry strategy
Compensation retry strategy
Backlog data resolution
Backlog data resolution
Normal data flow
Normal data flow
DB degradation
DB degradation
Circuit breaker pattern
Circuit breaker pattern
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Real-TimeredisSystem DesignStreamingKafkamysqlStorm
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.