
How a Billion‑Scale Flight Search Engine Scales with Multi‑Level Caching and Smart Load Balancing

This article explains how a high‑throughput, low‑latency flight search system handles billions of daily requests by employing multi‑level caching, distributed Redis and MongoDB stores, AI‑driven TTL optimization, and advanced load‑balancing techniques such as pooling and overload protection.

JavaEdge

Mar 4, 2024


Background

The flight search platform handles billions of queries daily, requiring sub‑second latency, high throughput, and high success rates. Core challenges are cache efficiency and long‑tail latency in real‑time computation.

Service Overview

Business Characteristics

Massive traffic with sub‑second response expectations.

Aggregation of multiple pricing engines (internal and external GDS) with differing SLAs.

Mixed compute‑intensive (price calculation, seat availability) and I/O‑intensive (external API) workloads.

Scenario‑specific result sets (e.g., student discounts).

Infrastructure Foundations

Three independent data centers (IDCs) for disaster recovery.

Technology stack: Spring Cloud, Kubernetes, cloud services (including overseas providers).

Open‑source DevOps toolchain.

Storage: MySQL, Redis, MongoDB.

Network reliability: circuit breaking, rate limiting, SRE practices.

System Architecture

Requests flow through a Gateway to front‑end services, which invoke a backend aggregation service. The aggregation service calls various engine services and pushes results to an AI data platform via Kafka for analytics and traffic replay. A cloud‑based filtering service reduces data transmission by ~90%.
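The aggregation step can be illustrated with a minimal Python sketch, assuming an asyncio-based aggregator that queries all engines concurrently and gives each engine its own timeout, since the engines have differing SLAs. The engine names, latencies, timeouts, and the `aggregate` and `simulated_engine` helpers are hypothetical; the article does not show the actual fan-out code.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# An engine call takes a query and returns a list of fares (prices only, for brevity).
EngineCall = Callable[[str], Awaitable[list]]

async def call_engine(name: str, call: EngineCall, query: str,
                      timeout_s: float) -> tuple[str, Optional[list]]:
    """Call one engine; a timeout yields None instead of stalling the whole response."""
    try:
        return name, await asyncio.wait_for(call(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        return name, None

async def aggregate(query: str, engines: dict) -> dict:
    """Fan out to all engines concurrently and keep only results that arrived in time."""
    results = await asyncio.gather(
        *(call_engine(name, call, query, timeout)
          for name, (call, timeout) in engines.items())
    )
    return {name: fares for name, fares in results if fares is not None}

def simulated_engine(delay_s: float, fares: list) -> EngineCall:
    """Hypothetical engine that answers after a fixed delay."""
    async def engine(query: str) -> list:
        await asyncio.sleep(delay_s)  # simulated engine latency
        return fares
    return engine

if __name__ == "__main__":
    engines = {
        "internal": (simulated_engine(0.01, [420.0, 515.0]), 0.2),
        "external_gds": (simulated_engine(0.5, [399.0]), 0.1),  # slower than its timeout
    }
    print(asyncio.run(aggregate("SHA-PEK-2024-03-10", engines)))
```

Here the simulated external engine exceeds its 0.1 s budget, so only the internal engine's fares are returned. Dropping late engines instead of waiting for them is one way to keep long-tail latency bounded, at the cost of occasionally incomplete result sets.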

Caching Architecture

Challenges & Strategies

High traffic necessitates caches to protect databases and external partners, reduce bandwidth costs, and lower latency.

Local vs. Distributed Cache: Local caches suffered from cold‑start delays and hit rates below 5%. The system therefore migrated to Redis‑based distributed caching with failure handling (rate limiting, circuit breaking).

TTL Management: TTL trades hit rate against freshness. Because flight prices change rapidly, TTLs are kept short (often under 5 minutes) and adjusted dynamically by ML models.
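A minimal sketch of per-entry TTLs, assuming an in-memory dictionary as a stand-in for Redis and a hypothetical `predict_ttl` heuristic in place of the ML model (the article does not publish the model's features). The predicted TTL is clamped to an assumed range, so a bad prediction cannot keep a price cached for too long.

```python
import time
from typing import Any, Callable, Optional

MIN_TTL_S = 30.0    # assumed floor so volatile routes still get some cache benefit
MAX_TTL_S = 300.0   # assumed ceiling of 5 minutes, in line with "often under 5 minutes"

def predict_ttl(days_to_departure: int) -> float:
    """Hypothetical stand-in for the ML model: fares closer to departure change faster."""
    return 20.0 * days_to_departure

class TtlCache:
    """In-memory stand-in for Redis with a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: dict[str, tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        ttl_s = max(MIN_TTL_S, min(MAX_TTL_S, ttl_s))  # clamp the predicted TTL
        self._data[key] = (value, self._clock() + ttl_s)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired entries count as misses
            del self._data[key]
            return None
        return value

if __name__ == "__main__":
    now = [0.0]
    cache = TtlCache(clock=lambda: now[0])  # fake clock for a deterministic demo
    # predict_ttl(1) gives 20 s, which the clamp raises to 30 s.
    cache.set("SHA-PEK-2024-03-10", [420.0], predict_ttl(days_to_departure=1))
    now[0] = 31.0
    print(cache.get("SHA-PEK-2024-03-10"))  # expired, so this prints None
```

In Redis the same per-key expiry is expressed with the `EX` option of `SET`; the clamping and the prediction would happen in the application before the write.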

Cache Evolution

Multi‑Level Cache

Engine‑level cache.

L1 distributed aggregation cache (final query result).

L2 secondary cache (intermediate engine results).

Aggregation checks L1 first; on a miss it falls back to L2, so the final result can be composed quickly from cached intermediate engine results.
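The L1-then-L2 lookup can be sketched as follows, assuming plain dictionaries for both levels and a hypothetical `compute_engine_result` callback standing in for a real engine call. On an L1 miss, cached per-engine results are reused, only the missing engines are called, and both levels are backfilled. TTL handling is omitted for brevity.

```python
from typing import Callable

L1: dict[str, list[float]] = {}               # final aggregated result per query
L2: dict[tuple[str, str], list[float]] = {}   # intermediate result per (engine, query)

def lookup(query: str, engines: list[str],
           compute_engine_result: Callable[[str, str], list[float]]) -> list[float]:
    # 1. L1: the complete answer for this query.
    if query in L1:
        return L1[query]
    # 2. L2: reuse cached per-engine results and compute only the missing ones.
    fares: list[float] = []
    for engine in engines:
        key = (engine, query)
        if key not in L2:
            L2[key] = compute_engine_result(engine, query)  # slow path: real engine call
        fares.extend(L2[key])
    # 3. Compose the final result and backfill L1.
    result = sorted(fares)
    L1[query] = result
    return result
```

Keeping intermediate results in L2 means a partial hit still saves work: only the engines without a cached answer are called.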

Engine Cache Types

Result cache.

Intermediate product cache.

Base data cache.

Redis‑Based L1 Cache

High read/write performance, horizontal scalability.

Fixed TTL; trade‑off between hit rate and freshness.

Metrics: hit rate <20%, TTL <5 min, read/write latency <3 ms.
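Because L1 stores final query results, its hit rate depends on equivalent queries producing the same key. A minimal sketch of a deterministic key builder, assuming hypothetical query fields (`origin`, `destination`, `date`, `passengers`, `cabin`) and a hypothetical `flight:l1:` prefix; the article does not describe its actual key schema.

```python
import hashlib
import json

def l1_cache_key(query: dict, prefix: str = "flight:l1:") -> str:
    """Map equivalent queries to the same key: normalize, serialize canonically, hash."""
    normalized = {
        "origin": query["origin"].strip().upper(),          # "sha " -> "SHA"
        "destination": query["destination"].strip().upper(),
        "date": query["date"],                              # assumed ISO date, e.g. "2024-03-10"
        "passengers": int(query.get("passengers", 1)),
        "cabin": str(query.get("cabin", "Y")).upper(),      # assumed default: economy
    }
    # sort_keys plus fixed separators give one canonical string per logical query.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With redis-py, such a key would be written with a fixed expiry via `set(key, value, ex=ttl_seconds)`, which matches the fixed-TTL behavior described for L1.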

Redis‑Based L2 Cache (Upgrade)

Further performance boost, higher reliability, 90% cost reduction.

Added architectural complexity.

Result: 30% read/write performance increase.

MongoDB‑Based L2 Cache (Initial)

High read/write performance, easy TTL configuration via ML.

Operational overhead and high licensing cost.

Result: 3× throughput increase, 27% hit‑rate improvement, 20% latency reduction.
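MongoDB supports per-entry TTLs through a TTL index with `expireAfterSeconds=0` on a date field: each document expires at the time stored in that field, so an ML-predicted TTL can be written per entry. The sketch below assumes a pymongo-style `collection` object and a hypothetical `expireAt` field and `_id` scheme. MongoDB's background TTL task runs roughly once a minute, so expiry is not instantaneous.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def ensure_ttl_index(collection) -> None:
    """Create a TTL index that expires each document at its own 'expireAt' time."""
    # With expireAfterSeconds=0, a document is removed once the date in the
    # indexed field has passed (checked by a periodic background task).
    collection.create_index("expireAt", expireAfterSeconds=0)

def build_l2_document(engine: str, query_key: str, fares: list, ttl_s: float,
                      now: Optional[datetime] = None) -> dict:
    """Build an L2 entry whose expiry comes from a (possibly ML-predicted) TTL."""
    now = now or datetime.now(timezone.utc)
    return {
        "_id": f"{engine}:{query_key}",  # hypothetical id: one document per (engine, query)
        "fares": fares,
        "expireAt": now + timedelta(seconds=ttl_s),
    }

def store_l2(collection, doc: dict) -> None:
    """Upsert so a refresh replaces both the cached fares and the expiry time."""
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

With pymongo, `collection` could be obtained as `MongoClient()["flight"]["l2_cache"]`; the database and collection names are hypothetical.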

Load‑Balancing Evolution

Goals: high availability, support massive traffic, limit incident impact, improve resource utilization, and reduce long‑tail latency.

The architecture includes a Gateway, load balancers, direct IP connections, and data‑center (DC) routing rules. A novel "Pooling" technique places sub‑tasks in a shared queue from which workers fetch tasks dynamically, improving CPU utilization for compute‑heavy engine calls.
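Pooling can be sketched with one shared queue: sub-tasks go into the queue and any idle worker pulls the next one, so no worker sits idle while another has a backlog. This minimal Python version uses `queue.Queue` and threads as stand-ins for the Redis-backed queue and the engine workers mentioned in the Q&A; `run_pool` and its parameters are hypothetical. Python threads do not run CPU-bound code in parallel, so the sketch shows the scheduling pattern rather than the CPU gains.

```python
import queue
import threading
from typing import Callable

def run_pool(subtasks: list, handle: Callable[[object], object], num_workers: int = 4) -> list:
    """Workers pull sub-tasks from one shared queue until it is drained."""
    tasks: queue.Queue = queue.Queue()
    for i, task in enumerate(subtasks):
        tasks.put((i, task))            # remember the position to reassemble results
    results: list = [None] * len(subtasks)

    def worker() -> None:
        while True:
            try:
                i, task = tasks.get_nowait()  # an idle worker takes the next sub-task
            except queue.Empty:
                return                        # queue drained: the worker exits
            results[i] = handle(task)         # each index is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The design choice is pull instead of push: a fast worker simply takes more tasks, so work is balanced by actual capacity rather than by a static assignment.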

Overload Protection

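Naive approach: drop only requests that have waited longer than the timeout T. Under sustained overload every queued request waits close to T, so most of them time out and the system becomes effectively unavailable.

Improved approach: drop requests whose wait exceeds a lower threshold X (X < T). The average response time then stays bounded at roughly X + m, where m is the processing time.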
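A minimal sketch of the lower-threshold rule, assuming each queued sub-task records its enqueue time: before doing any work, a worker checks how long the next task has waited and drops it if the wait already exceeds X. `MAX_WAIT_X_S`, `QueuedTask`, and `next_runnable` are hypothetical names for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

MAX_WAIT_X_S = 0.05   # hypothetical threshold X, set well below the client timeout T

@dataclass
class QueuedTask:
    payload: object
    enqueued_at: float = field(default_factory=time.monotonic)

def next_runnable(q: deque, now: Callable[[], float] = time.monotonic,
                  max_wait_s: float = MAX_WAIT_X_S) -> Tuple[Optional[QueuedTask], int]:
    """Pop tasks until one has waited at most X; older tasks are dropped unprocessed."""
    dropped = 0
    while q:
        task = q.popleft()
        if now() - task.enqueued_at > max_wait_s:
            dropped += 1  # waited too long: drop it so response time stays near X + m
            continue
        return task, dropped
    return None, dropped
```

A worker would call `next_runnable` whenever it becomes idle, process the returned task, and report `dropped` to metrics; dropped requests fail fast instead of occupying worker time.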

Pooling combined with overload protection reduces queue times dramatically (e.g., 10× lower at 80% load).
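Standard queueing theory gives some intuition for why pooling cuts queue time. Assuming Poisson arrivals and exponential service times (an idealized M/M/c model, not the article's measurement), the sketch below compares separate single-worker queues with one pooled queue feeding 10 workers at the same 80% utilization. It illustrates the direction of the effect and is not meant to reproduce the article's 10× figure.

```python
def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request must wait in an M/M/c queue (Erlang C)."""
    # Erlang B via its numerically stable recursion, then converted to Erlang C.
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return servers * b / (servers - offered_load * (1.0 - b))

def mean_wait(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time spent waiting in the queue (Wq) for an M/M/c system."""
    load = arrival_rate / service_rate
    if load >= servers:
        raise ValueError("unstable: utilization must be below 100%")
    return erlang_c(servers, load) / (servers * service_rate - arrival_rate)

if __name__ == "__main__":
    c, mu, rho = 10, 1.0, 0.8                   # 10 workers, 1 task/s each, 80% utilization
    separate = mean_wait(1, rho * mu, mu)       # each worker has its own queue
    pooled = mean_wait(c, c * rho * mu, mu)     # one shared queue for all workers
    print(f"separate: {separate:.2f} s, pooled: {pooled:.2f} s, "
          f"ratio: {separate / pooled:.1f}x")
```

Under these assumptions the pooled mean wait comes out roughly 20 times shorter than with separate queues. Real traffic is burstier and service times are not exponential, so this shows only the rough size of the effect.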

AI‑Enabled Optimizations

Use Cases

Smart anti‑scraping (blocks ~9% of traffic).

Query filtering to route high‑value requests to engines, low‑value to cache.

ML‑driven TTL setting.
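The query-filtering idea can be sketched as a threshold on a model score: queries scored as high-value go to the live engines, and the rest are answered from cache when possible. The `value_score` function is a hypothetical logistic model with made-up features and weights, not the article's model, and the fallback for low-value cache misses is an assumption.

```python
import math
from typing import Optional

def value_score(features: dict) -> float:
    """Hypothetical value model: logistic regression with hand-picked weights."""
    z = (-1.0
         + 2.0 * features.get("is_logged_in", 0)
         - 0.05 * features.get("days_to_departure", 0))
    return 1.0 / (1.0 + math.exp(-z))  # squashes the score into (0, 1)

def route(features: dict, cached_result: Optional[list], threshold: float = 0.5) -> str:
    """Send high-value queries to the engines; serve the rest from cache when possible."""
    if value_score(features) >= threshold:
        return "engine"
    if cached_result is not None:
        return "cache"
    return "engine"  # assumed fallback: a low-value query with no cached answer
```

A linear model like this costs only a handful of arithmetic operations per request, which is one reason sub-millisecond inference is plausible; the real model's type and features are not described.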

ML Stack

Models predict the optimal TTL and identify queries that benefit from multi‑ticket engines, filtering out more than 80% of requests with inference times under 1 ms.

Summary

Multi‑level flexible caching dramatically improves throughput and latency under peak traffic.

Robust scheduling and load‑balancing ensure high availability and mitigate long‑tail delays.

Targeted AI/ML techniques provide ROI gains, peak‑shaving, and better resource utilization.

Key Q&A Highlights

Cache is essential for all scenarios; each layer intercepts a portion of traffic.

Cache evolution: local → L1 Redis → L2 Redis (replacing MongoDB) driven by scalability and cost.

Distributed cache design hinges on the key‑value (KV) strategy; keys sometimes embed an IP address to support pooling.

Redis read latency stays under 3 ms; write latency is comparable.

Pooling uses Redis queues for high‑availability task dispatch.

Cache consistency managed via proactive refreshes and selective invalidation.

Overload protection discards stale requests to prevent cascading failures.
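The consistency approach from the Q&A (proactive refreshes plus selective invalidation) can be sketched as a refresh-ahead cache: a read close to expiry schedules a reload while still serving the cached value, and invalidation removes only keys matching a predicate. The class, its parameters, and the `refresh_pending` step standing in for a background worker are assumptions, not the article's implementation.

```python
import time
from typing import Any, Callable

class RefreshingCache:
    """Refresh-ahead cache with selective invalidation (an illustrative sketch)."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float,
                 refresh_window_s: float, clock: Callable[[], float] = time.monotonic):
        self._loader = loader            # hypothetical function that recomputes a value
        self._ttl = ttl_s
        self._window = refresh_window_s
        self._clock = clock
        self._data: dict[str, tuple[Any, float]] = {}
        self._pending: set[str] = set()  # keys scheduled for proactive refresh

    def _load(self, key: str) -> Any:
        value = self._loader(key)
        self._data[key] = (value, self._clock() + self._ttl)
        return value

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[1]:
            return self._load(key)       # miss or expired: the caller waits for a load
        value, expires_at = entry
        if expires_at - self._clock() <= self._window:
            self._pending.add(key)       # near expiry: schedule a refresh, serve cached value
        return value

    def refresh_pending(self) -> int:
        """Reload scheduled keys; a background worker would call this periodically."""
        keys, self._pending = self._pending, set()
        for key in keys:
            self._load(key)
        return len(keys)

    def invalidate(self, predicate: Callable[[str], bool]) -> int:
        """Selective invalidation: remove only the keys that match the predicate."""
        doomed = [k for k in self._data if predicate(k)]
        for k in doomed:
            del self._data[k]
            self._pending.discard(k)
        return len(doomed)
```

Refreshing ahead of expiry keeps hot keys warm so readers rarely pay for a synchronous reload, while predicate-based invalidation (for example, every key for one route) avoids flushing the whole cache.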


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

JavaEdge

First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
