Big Data 12 min read

Ctrip’s Scalable Real‑Time User Behavior System with Kafka, Storm, Redis

This article details Ctrip’s redesign of its real‑time user behavior service, covering the new architecture, data flow, use of Java, Kafka, Storm, Redis, and MySQL, and how it achieves high real‑time performance, availability, scalability, and fault‑tolerance to support massive travel‑industry traffic.

21CTO

Jul 8, 2017

Ctrip’s Scalable Real‑Time User Behavior System with Kafka, Storm, Redis

1. Architecture

Ctrip’s real‑time user behavior service is a foundational service used in scenarios such as recommendation, dynamic ads, user profiling, and browsing history. To meet growing data volume and stricter real‑time, stability, and performance requirements, the system was redesigned with a two‑stream architecture: a processing stream and an output stream.

In the processing stream, client logs (App/Online/H5) are uploaded to a Collector Service, which forwards messages to a distributed queue. A stream‑processing framework (Storm) reads from the queue, processes the data, and writes it to a data layer composed of distributed cache (Redis) and a MySQL cluster.

The output stream is simpler: a Web Service pulls data from the data layer and serves it to callers, either internal services (e.g., recommendation) or front‑end components (e.g., browsing history). The technology stack includes Java, Kafka, Storm, Redis, MySQL, Tomcat, and Spring.

Java – mature big‑data components and strong internal adoption.

Kafka/Storm – proven distributed messaging and stream‑processing with solid operational support.

Redis – HA, SortedSet, and expiration features meet system needs.

MySQL – stable and performant at billion‑row scale, with horizontal scalability.

The system processes roughly 2 billion events per day with a latency of about 300 ms from ingestion to availability, and handles around 80 million query requests daily with an average latency of 6 ms.

2. Real‑Time Performance

Storm addresses traffic spikes by scaling out workers; it processes messages at millisecond granularity and supports at‑least‑once delivery with idempotent handling. A dual‑queue design separates fresh data (Queue1) from exception data (Queue2), allowing normal processing to continue while retries are managed independently.

Compensation strategies include retry workers that consume from Queue2, applying back‑off policies, and mechanisms to reset consumer cursors for backlog clearance.

3. Availability

The system eliminates single points of failure through full‑stack clustering: Kafka and Storm are natively clustered, Web services are stateless behind load balancers, and Redis/MySQL use primary‑backup deployments. When components become unavailable, graceful degradation routes data to Redis or MySQL fallback paths, and a circuit‑breaker (Hystrix) prevents cascading failures.

Rate limiting is applied at multiple levels (AppId/IP, service, interface) to protect the system from abusive calls.

4. Performance & Scalability

To support ten‑fold capacity growth, the data layer employs consistent‑hashing for Redis expansion and horizontal sharding for MySQL. Sharding follows powers‑of‑two for easier scaling, and MySQL master‑slave promotion is automated via DNS switching, reducing migration time to minutes.

5. Deployment

Storm deployments are straightforward: uploading a new JAR and restarting the topology. Kafka offsets allow the system to resume processing from the correct position after restarts. For versioned behavior records, a backup job runs older versions alongside the current one.

Real‑time user behavior system logical view

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Real-time Redis System Design Streaming Kafka MySQL Storm

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.