How a Billion‑Scale Flight Search Engine Scales with Multi‑Level Caching and Smart Load Balancing
This article explains how a high‑throughput, low‑latency flight search system handles billions of daily requests by employing multi‑level caching, distributed Redis and MongoDB stores, AI‑driven TTL optimization, and advanced load‑balancing techniques such as pooling and overload protection.
Background
The flight search platform handles billions of queries daily, requiring sub‑second latency, high throughput, and high success rates. Core challenges are cache efficiency and long‑tail latency in real‑time computation.
Service Overview
Business Characteristics
Massive traffic with sub‑second response expectations.
Aggregation of multiple pricing engines (internal and external GDS) with differing SLAs.
Mixed compute‑intensive (price calculation, seat availability) and I/O‑intensive (external API) workloads.
Scenario‑specific result sets (e.g., student discounts).
Infrastructure Foundations
Three independent IDC sites for disaster recovery.
Technology stack: Spring Cloud, Kubernetes, cloud services (including overseas providers).
Open‑source DevOps toolchain.
Storage: MySQL, Redis, MongoDB.
Network reliability: circuit breaking, rate limiting, SRE practices.
System Architecture
Requests flow through a Gateway to front‑end services, which invoke a backend aggregation service. The aggregation service calls various engine services and pushes results to an AI data platform via Kafka for analytics and traffic replay. A cloud‑based filtering service reduces data transmission by ~90%.
Caching Architecture
Challenges & Strategies
High traffic necessitates caches to protect databases and external partners, reduce bandwidth costs, and lower latency.
Local vs Distributed Cache: Local caches suffered cold-start delays and hit rates below 5%, so the system migrated to a Redis-based distributed cache, with rate limiting and circuit breaking to handle cache failures.
TTL Management: The TTL trades hit rate against freshness. Because flight prices change rapidly, TTLs are kept short (often under 5 minutes) and are adjusted dynamically by ML models.
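The circuit breaking mentioned above for the distributed cache can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: `CircuitBreaker`, `guarded_get`, and the thresholds are hypothetical names and values, and `cache_get` stands in for whatever client call reads from Redis.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; closes again on a success."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, calls are let through again; a failure re-opens.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def guarded_get(cache_get, key, breaker):
    """Return the cached value, or None (treated as a miss) if the cache is unhealthy."""
    if not breaker.allow():
        return None  # skip the cache entirely while the breaker is open
    try:
        value = cache_get(key)
    except Exception:
        breaker.record(False)
        return None
    breaker.record(True)
    return value
```

Treating a cache failure as a miss keeps the request path alive, but it shifts load to the engines, which is why the article pairs circuit breaking with rate limiting.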
Cache Evolution
Multi‑Level Cache
Engine‑level cache.
L1 distributed aggregation cache (final query result).
L2 secondary cache (intermediate engine results).
Aggregation checks L1 first; on miss it falls back to L2, enabling fast composition.
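The lookup order described above (L1 final result, then L2 per-engine results, then the engines themselves) might look something like the sketch below. `TTLCache` is an in-memory stand-in for the Redis caches, and `search`, the merge-and-sort aggregation rule, and the 300-second default TTL are assumptions made for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a distributed cache with a per-entry TTL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def search(query, engines, l1, l2, ttl=300):
    """Check L1 (final result), then L2 (per-engine results), then the engines."""
    cached = l1.get(query)
    if cached is not None:
        return cached

    partials = []
    for name, engine in engines.items():
        key = (name, query)
        partial = l2.get(key)
        if partial is None:
            partial = engine(query)  # expensive call, only on an L2 miss
            l2.set(key, partial, ttl)
        partials.append(partial)

    # Hypothetical aggregation rule: merge all fares and sort by price.
    result = sorted(fare for partial in partials for fare in partial)
    l1.set(query, result, ttl)
    return result
```

On an L1 miss with warm L2 entries, the result is recomposed without calling any engine, which is the "fast composition" the article refers to.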
Engine Cache Types
Result cache.
Intermediate product cache.
Base data cache.
Redis‑Based L1 Cache
High read/write performance, horizontal scalability.
Fixed TTL; trade‑off between hit rate and freshness.
Metrics: hit rate <20%, TTL <5 min, read/write latency <3 ms.
Redis‑Based L2 Cache (Upgrade)
Further performance boost, higher reliability, 90% cost reduction.
Added architectural complexity.
Result: 30% read/write performance increase.
MongoDB‑Based L2 Cache (Initial)
High read/write performance, easy TTL configuration via ML.
Operational overhead and high licensing cost.
Result: 3× throughput increase, 27% hit‑rate improvement, 20% latency reduction.
Load‑Balancing Evolution
Goals: high availability, support massive traffic, limit incident impact, improve resource utilization, and reduce long‑tail latency.
The architecture includes a gateway, a load balancer, direct IP connections, and DC routing rules. A technique the team calls "Pooling" places sub-tasks in a queue from which workers pull as they become free, which improves CPU utilization for compute-heavy engine calls.
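The pull-based dispatch behind Pooling can be illustrated with a small sketch. It uses an in-process `queue.Queue` and threads purely to show the pattern; the function name `run_pool` and the worker count are hypothetical, and a production system would use a shared queue across machines rather than Python threads, which do not run CPU-bound work in parallel.

```python
import queue
import threading


def run_pool(tasks, handler, num_workers=4):
    """Run `handler` over `tasks` with workers that pull from one shared queue."""
    tasks = list(tasks)
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            # Each index is written by exactly one worker, so no lock is needed.
            results[index] = handler(task)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a worker takes the next sub-task only when it is free, a slow sub-task delays one worker instead of a whole pre-assigned batch, which is how pull-based dispatch evens out load.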
Overload Protection
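Simple approach: drop requests only once they have waited longer than the timeout T. Under sustained overload, queued requests then wait close to T, so the system is effectively unavailable.
Faster approach: drop requests that have waited longer than a lower threshold X (X < T), which keeps the average response time bounded at roughly X + m, where m presumably denotes the mean processing time.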
Pooling combined with overload protection reduces queue times dramatically (e.g., 10× lower at 80% load).
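To see why a wait threshold bounds response time at about X + m, consider the toy simulation below. It assumes a single worker, FIFO order, and a fixed service time, none of which come from the article; `simulate` and its parameters are illustrative names.

```python
def simulate(arrivals, service_time, max_wait):
    """Single-worker FIFO queue that drops any request that waited longer than `max_wait`.

    Returns (response_times_of_served_requests, dropped_count).
    """
    free_at = 0.0  # time at which the worker next becomes idle
    responses = []
    dropped = 0
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)
        wait = start - arrival
        if wait > max_wait:
            dropped += 1  # discarded at dequeue time, so it costs no service time
            continue
        free_at = start + service_time
        responses.append(wait + service_time)
    return responses, dropped


# Ten requests arrive at once; each takes 1 time unit; X = 3.
served, dropped = simulate([0] * 10, service_time=1, max_wait=3)
```

In this run no served request takes longer than `max_wait + service_time`, and the requests that would have waited longer are rejected early instead of consuming capacity.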
AI‑Enabled Optimizations
Use Cases
Smart anti‑scraping (blocks ~9% of traffic).
Query filtering to route high‑value requests to engines, low‑value to cache.
ML‑driven TTL setting.
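The query-filtering use case above amounts to a routing decision per request. The sketch below shows only the shape of that decision: `route`, the 0.5 threshold, and `toy_score` (which stands in for a trained model and uses a made-up rule) are all hypothetical.

```python
from collections import Counter


def route(query, score, threshold=0.5):
    """Send high-value queries to the engines and the rest to the cache."""
    return "engine" if score(query) >= threshold else "cache"


def toy_score(query):
    """Hypothetical stand-in for a trained model; the rule here is purely illustrative."""
    return 1.0 if query["days_to_departure"] <= 7 else 0.1


queries = [{"days_to_departure": d} for d in (1, 3, 30, 60, 90)]
tally = Counter(route(q, toy_score) for q in queries)
```

Keeping the scorer as an injected function means the routing logic stays unchanged when the model behind it is retrained or replaced.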
ML Stack
Models predict optimal TTL and identify queries that benefit from multi‑ticket engines, achieving >80% request filtering and sub‑1 ms inference time.
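One simple way a model output could drive the TTL is sketched below. The article does not describe how its models produce a TTL, so this is an assumption: the model is taken to emit a probability that the price changes soon, and `choose_ttl` maps it linearly onto a range capped at 300 seconds, in line with the "under 5 minutes" figure mentioned earlier.

```python
def choose_ttl(change_probability, min_ttl=30, max_ttl=300):
    """Map a predicted price-change probability (0..1) to a TTL in seconds.

    A high probability of change yields a short TTL, a low one a long TTL.
    """
    p = min(max(change_probability, 0.0), 1.0)  # clamp out-of-range model output
    return round(max_ttl - p * (max_ttl - min_ttl))
```

Volatile routes therefore expire quickly and stay fresh, while stable routes keep their entries longer and raise the hit rate.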
Summary
Multi‑level flexible caching dramatically improves throughput and latency under peak traffic.
Robust scheduling and load‑balancing ensure high availability and mitigate long‑tail delays.
Targeted AI/ML techniques provide ROI gains, peak‑shaving, and better resource utilization.
Key Q&A Highlights
Cache is essential for all scenarios; each layer intercepts a portion of traffic.
Cache evolution: local → L1 Redis → L2 Redis (replacing MongoDB) driven by scalability and cost.
Distributed cache design hinges on the key-value strategy; in some cases the key embeds an IP address to support pooling.
Redis read latency stays under 3 ms; write latency comparable.
Pooling uses Redis queues for high‑availability task dispatch.
Cache consistency managed via proactive refreshes and selective invalidation.
Overload protection discards stale requests to prevent cascading failures.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
