Improving Ctrip's AB Experiment Splitter: Design, Performance Optimization, and Backend Architecture

The article details Ctrip's challenges with multiple AB testing splitters, presents performance gains after migrating to a new splitter, and explains the comprehensive redesign covering overall architecture, interface consolidation, SDK slimming, and a custom distributed cache backend to achieve higher throughput and lower latency.

Ctrip Technology

Background: Ctrip has been an early adopter of AB testing, using multiple AB splitter interfaces across apps, mini‑programs, and online pages, leading to issues such as interface confusion, degraded response efficiency under high traffic, and tight coupling with experiment configuration tables.

Improvement results: After most traffic was migrated to the new AB splitter via the slbportal tool, QPS rose from 200.7 to 290.2 and P99.9 latency fell from 363.1 ms to 5.2 ms.
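To put those two figures in relative terms, the arithmetic can be checked with a few lines of Python. This only restates the numbers quoted above; the helper name `relative_change` is made up for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change; 0.25 means +25%, -0.5 means -50%."""
    return (after - before) / before


# Figures quoted in the article.
qps_gain = relative_change(200.7, 290.2)
latency_change = relative_change(363.1, 5.2)
latency_ratio = 363.1 / 5.2

print(f"QPS: {qps_gain:+.1%}")                       # QPS: +44.6%
print(f"P99.9 latency: {latency_change:+.1%}")       # P99.9 latency: -98.6%
print(f"Latency ratio: about {latency_ratio:.0f}x")  # Latency ratio: about 70x
```

In other words, throughput grew by a little under half while tail latency shrank by roughly a factor of 70.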

The improvement plan has four parts:

Overall design: A service‑based architecture was considered, but a resident SDK was ultimately adopted so that the splitting logic is distributed across departments and runs as efficiently as possible.

Consolidation (收口): Reducing dozens of department‑specific splitter endpoints to one or two unified interfaces, simplifying development and maintenance.

SDK redesign: Transforming the “fat” SDK into a “thin” SDK that only holds a minimal wide‑table of essential experiment fields, moving heavy queries to the backend and introducing a CopyOnWrite cache for rapid rule updates.

Backend selection: Rejecting qconfig due to scalability limits, evaluating Redis, and finally implementing a custom distributed cache based on Apache Ignite with a snapshot service that provides read‑write separation, real‑time updates, and high availability.
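A minimal sketch of the thin-SDK idea described above, assuming a copy-on-write rule cache and hash-based bucketing. The article does not specify the SDK's field names or its assignment algorithm, so `ExperimentRule`, `CopyOnWriteRuleCache`, `assign_version`, and the MD5 bucketing scheme are all hypothetical stand-ins rather than Ctrip's implementation.

```python
import hashlib
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExperimentRule:
    """One minimal 'wide-table' row; field names are hypothetical."""
    experiment_code: str
    versions: tuple  # e.g. ("A", "B")
    weights: tuple   # traffic share per version, summing to 100


class CopyOnWriteRuleCache:
    """Readers see an immutable snapshot; writers swap in a fresh copy."""

    def __init__(self) -> None:
        self._rules: Mapping[str, ExperimentRule] = MappingProxyType({})
        self._write_lock = threading.Lock()

    def get(self, experiment_code: str) -> Optional[ExperimentRule]:
        # Reads take no lock: they use whichever snapshot is current.
        return self._rules.get(experiment_code)

    def update(self, rule: ExperimentRule) -> None:
        # Writers copy the whole map, modify the copy, then publish it.
        with self._write_lock:
            copy = dict(self._rules)
            copy[rule.experiment_code] = rule
            self._rules = MappingProxyType(copy)


def assign_version(rule: ExperimentRule, client_id: str) -> str:
    """Map a client to a version deterministically (assumed hashing scheme)."""
    key = f"{rule.experiment_code}:{client_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    upper = 0
    for version, weight in zip(rule.versions, rule.weights):
        upper += weight
        if bucket < upper:
            return version
    return rule.versions[-1]


cache = CopyOnWriteRuleCache()
cache.update(ExperimentRule("exp_demo", ("A", "B"), (50, 50)))
rule = cache.get("exp_demo")
print(assign_version(rule, "client-123"))  # same client always gets the same version
```

The trade-off in this sketch is that each rule update copies the whole map, which is acceptable when updates are rare compared with reads, as is typical for experiment rules.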

The backend architecture consists of an SOA service layer, the Ignite‑based distributed cache, and the experiment configuration database, with snapshot and real‑time update services ensuring consistency and low latency.
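The interplay between a full snapshot and real-time updates can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the article does not describe the snapshot service's internals, and the code below does not use the Apache Ignite API. `ConfigDatabase`, `CacheNode`, and the sequence-numbered change log are hypothetical constructs chosen to show read-write separation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ConfigDatabase:
    """Stand-in for the experiment configuration database (write side)."""
    rows: Dict[str, dict] = field(default_factory=dict)
    change_log: List[Tuple[int, str, dict]] = field(default_factory=list)
    sequence: int = 0

    def save(self, experiment_code: str, config: dict) -> int:
        # Every write gets a monotonically increasing sequence number.
        self.sequence += 1
        self.rows[experiment_code] = dict(config)
        self.change_log.append((self.sequence, experiment_code, dict(config)))
        return self.sequence


@dataclass
class CacheNode:
    """Stand-in for one cache node (read side)."""
    data: Dict[str, dict] = field(default_factory=dict)
    applied_sequence: int = 0

    def load_snapshot(self, db: ConfigDatabase) -> None:
        # Full rebuild, e.g. when a node starts or recovers.
        self.data = {code: dict(cfg) for code, cfg in db.rows.items()}
        self.applied_sequence = db.sequence

    def apply_updates(self, db: ConfigDatabase) -> int:
        # Incremental catch-up: replay only changes newer than what we hold.
        applied = 0
        for seq, code, cfg in db.change_log:
            if seq > self.applied_sequence:
                self.data[code] = dict(cfg)
                self.applied_sequence = seq
                applied += 1
        return applied

    def read(self, experiment_code: str) -> Optional[dict]:
        # Reads never touch the database.
        return self.data.get(experiment_code)


db = ConfigDatabase()
db.save("exp_demo", {"A": 50, "B": 50})

node = CacheNode()
node.load_snapshot(db)                   # bootstrap from a full snapshot
db.save("exp_demo", {"A": 10, "B": 90})  # a later configuration change
node.apply_updates(db)                   # real-time update brings the node current
print(node.read("exp_demo"))
```

Splitting requests are served only from `CacheNode.read`, so read load stays off the configuration database, and a restarted node can recover by loading a snapshot and then replaying newer changes.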

Post‑implementation observations: performance stayed stable during disaster‑recovery drills, and issues with unique identifier control and snapshot service deployment were resolved. Remaining work includes real‑time cache monitoring and alerting.
