
Optimizing the Pricing Engine for High‑Volume Flight Search: Hash Sharding, Caching, and Response‑Time Improvements

This article explains how Qunar's international ticket pricing engine was optimized by redesigning hash rules for sharding, enhancing local cache replication, employing multi‑stage result delivery, and applying common distributed‑system patterns to boost computational capacity and reduce response latency for massive flight‑search workloads.

Qunar Tech Salon

Zhu Shizhi

Zhu Shizhi is a senior architect at Qunar.com and technical director of international ticketing. A post‑90s engineer, he joined Qunar in 2013, contributing to public services and the international ticket search system. He built a real‑time search and pricing system with high availability, high performance, massive data volume, and high scalability, gaining extensive experience in system and program design.

In the previous article we detailed the first and second versions of the pricing engine’s overall architecture and the optimization process for data storage and dumping. This article continues with optimizations on computational load and response time, and reviews the technical points and common patterns used.

Pricing Engine Optimization

Computation Volume

Because the computation module used a local cache, we initially hashed requests by departure‑arrival‑agency dimensions to improve cache sharding and utilization. However, hotspots emerged: certain agencies handled many routes, generating many pricing‑rule lookups and inventory data, which overwhelmed a single VM’s compute capacity.

We therefore redesigned the hash rule: requests are still grouped by departure-arrival-agency, but each group is now distributed across several nodes (e.g., for Beijing-Hong Kong-Agency A, query 1 goes to node 2 and query 2 to node 3). This spreads a hotspot's calculations over multiple machines.
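As a sketch, this redesigned routing can be expressed as a two-level hash: the group key still determines a base node, but a per-group query sequence fans requests out over a small window of nodes. The node names, the fan-out factor, and the use of MD5 here are illustrative assumptions, not details from the original system.

```python
import hashlib

# Hypothetical node pool and fan-out factor (illustrative values only).
NODES = ["node-1", "node-2", "node-3", "node-4", "node-5", "node-6"]
FANOUT = 3  # each hot group is spread across this many nodes

def route(departure: str, arrival: str, agency: str, query_seq: int) -> str:
    """Pick a node: hash the group key to a base node, then rotate
    within a FANOUT-sized window using the per-group query sequence."""
    group_key = f"{departure}|{arrival}|{agency}"
    digest = hashlib.md5(group_key.encode()).hexdigest()
    base = int(digest, 16) % len(NODES)
    offset = query_seq % FANOUT  # successive queries fan out over the window
    return NODES[(base + offset) % len(NODES)]

# Successive queries for the same hot group land on different nodes,
# while that group's cached data only needs FANOUT replicas.
assignments = {route("PEK", "HKG", "agencyA", i) for i in range(10)}
```

The key property is that a group's data still lives on a bounded set of nodes (so cache sharding is preserved), while its compute load no longer concentrates on one machine.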

When a single machine's resources are insufficient, we form a cluster: this is the basic idea behind distributed systems. Partitioning stateful data is called sharding; routing stateless requests is load balancing. In our case, insufficient memory forced us to shard the local cache, and insufficient compute forced us to hash requests across nodes.

Note that each shard’s local cache has multiple replicas; therefore the shard size must balance compute capacity against cache‑replica overhead. If a shard becomes too large, it degrades to round‑robin, forcing each node to hold the full dataset.
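To make the trade-off concrete, here is a back-of-envelope sketch. All figures below (dataset size, replica count, node memory) are made-up assumptions for illustration, not Qunar's production numbers.

```python
# Back-of-envelope sketch of the shard-size trade-off; all numbers
# below are illustrative assumptions, not production figures.
TOTAL_DATA_GB = 120    # full pricing-rule and inventory dataset
REPLICAS = 3           # local-cache replicas kept per shard
NODE_MEMORY_GB = 32    # memory budget of one VM

def memory_per_node(num_shards: int) -> float:
    """Each node holds one replica of one shard's slice of the data."""
    return TOTAL_DATA_GB / num_shards

def total_cache_footprint() -> int:
    """Replication multiplies the aggregate cache footprint."""
    return TOTAL_DATA_GB * REPLICAS

# Too few shards and a slice no longer fits on one node; in the limit
# of one giant shard, every node holds the full dataset (pure round-robin).
assert memory_per_node(2) > NODE_MEMORY_GB    # 60 GB slice: too big
assert memory_per_node(8) < NODE_MEMORY_GB    # 15 GB slice: fits
```

Picking the shard count is therefore a balance: more shards shrink each node's slice, but each shard still pays the fixed replica multiplier for availability.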

Response Time

We had already maximized parallel computation and tuned performance, yet some calculations remain slow or depend on external data, so the overall request ends up waiting for the slowest sub-computation.

To deliver results as early as possible, we adopt a multi‑batch return mechanism combined with front‑end polling of the latest results.

Multiple batch returns can be applied along several dimensions:

Separate batches per agency

Separate batches per business type within the same agency

Separate batches per calculation within the same business type (e.g., direct flights vs. transit itineraries)
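The batching-plus-polling mechanism above can be sketched as follows. The search id, batch labels, and delays are illustrative assumptions; in a real deployment the result store would be a shared cache that the front end polls over HTTP.

```python
import threading
import time

# In-memory stand-in for the shared result store the front end polls.
results_store = {}
store_lock = threading.Lock()

def compute_batch(search_id, fares, delay):
    """One independent sub-computation; publishes its batch when done."""
    time.sleep(delay)  # simulate a slow calculation or external dependency
    with store_lock:
        results_store.setdefault(search_id, []).extend(fares)

def poll(search_id):
    """Front-end polling: return whatever batches have finished so far."""
    with store_lock:
        return list(results_store.get(search_id, []))

# A fast batch (e.g. direct flights) and a slow one (e.g. transit
# itineraries waiting on external data) run in parallel.
threads = [
    threading.Thread(target=compute_batch, args=("s1", ["A-direct"], 0.0)),
    threading.Thread(target=compute_batch, args=("s1", ["B-transit"], 0.4)),
]
for t in threads:
    t.start()
time.sleep(0.2)
early = poll("s1")   # the fast batch is already visible to the user
for t in threads:
    t.join()
final = poll("s1")   # complete results once the slowest batch lands
```

The point is that the user sees the fast batches immediately instead of waiting for the slowest sub-computation to finish the whole response.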

Review

Design Points of the Pricing System

On‑demand real‑time computation with result caching to balance cost and latency.

Horizontal layering and vertical channel separation to improve scalability.

Data closed‑loop to ensure data validity.

Appropriate data organization (CQRS) to reduce computation and accelerate responses.

Local‑cache design with suitable update strategies.

Custom hash rules to break single‑machine compute limits.

Multiple batch returns to deliver partial results early.

Common Patterns

Replication: backing up stateful data for high availability (e.g., MySQL MMM, Redis cold backups, Canal-based sync to standby nodes).

Sharding: stateful data partitioning for horizontal scaling (e.g., local cache, pricing‑rule database).

CQRS: adjusting data organization.

Cache: a partial silver bullet for read-heavy workloads; choose update policies carefully.

Service: modular service‑oriented design for governance, decoupling, failover, rate‑limiting, degradation, etc.

Async: increases throughput, but the programming model is more complex and loses some advantages of synchronous code, such as natural flow control.

MQ: asynchronous decoupling, scalability, peak‑shaving.

NoSQL: better fit for certain scenarios—Redis for high‑concurrency I/O, Solr/Elasticsearch for fuzzy matching, HBase for write‑heavy, read‑light workloads.

This concludes our flight-search architecture practice. Additional topics such as serialization choices, JVM tuning, and async request deduplication are omitted here but open for discussion.


Tags: distributed systems, performance optimization, sharding, caching, flight search, pricing engine
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
