
A Simple Gray Release Solution for High‑Concurrency Flight Ticket Systems

This article presents a lightweight gray release approach for complex flight ticket services, comparing traditional hardware and soft‑routing isolation methods, describing the authors' traffic‑based gray identification, business‑focused monitoring, implementation details, and automated safeguards to enable safe incremental deployments.

Qunar Tech Salon

1 Background

Gray release is a common industry practice for reducing release risk. Typical approaches require an isolated gray environment, built with either hardware or software isolation, and both are costly to implement and maintain. Because the flight ticket business is complex, release‑induced failures are frequent, so we urgently needed a simple, feasible gray release solution.

2 Common Industry Solutions

For the effect of a gray release to be measurable, the gray environment must be isolated. The industry mainly achieves this in two ways.

Gray Machine Isolation

Implementation: physically isolate a complete gray environment and let it serve online traffic.

Figure 1

Advantages: minimal application changes; supports monitoring isolation, traffic routing, and manual verification.

Disadvantages: high implementation and maintenance cost.

Soft Routing Traffic Isolation

Implementation: use soft load‑balancing logic to isolate the gray environment.

Figure 2

Advantages: no extra cost; supports monitoring isolation, traffic routing, and manual verification.

Disadvantages: large application changes, high risk.

3 Our Solution

Because hardware isolation is impractical to maintain in a complex system, we based our design on soft routing isolation, although it differs from the common approach in several ways.

3.1 Thought Process

Goal: expose release risk with minimal traffic.

"Small traffic" can be defined by user, by route, or by a random proportion such as 1%, 5%, or 10%.

This led us to think about routing traffic to a designated environment. Our answer is to treat whatever traffic flows through the gray machines as the small traffic.
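The idea of treating traffic through gray machines as the small traffic can be sketched as follows. This is a minimal illustration, not the authors' implementation: the host names, the `GRAY_HOSTS` set, and the function names are hypothetical, and the share estimate assumes the load balancer spreads requests evenly across machines.

```python
from typing import Dict, Set

# Hypothetical host names; in practice the list would come from the release system.
GRAY_HOSTS: Set[str] = {"flight-order-01"}


def is_gray_host(hostname: str, gray_hosts: Set[str] = GRAY_HOSTS) -> bool:
    """Return True if this machine currently runs the gray (new) version."""
    return hostname in gray_hosts


def mark_request(context: Dict[str, object], hostname: str) -> Dict[str, object]:
    """Return a copy of the request context, flagged gray once it touches a gray machine.

    The flag is sticky: a request that is already gray stays gray.
    """
    marked = dict(context)
    marked["gray"] = bool(context.get("gray")) or is_gray_host(hostname)
    return marked


def expected_gray_share(gray_machines: int, total_machines: int) -> float:
    """Approximate share of traffic that becomes gray, assuming even load balancing."""
    return gray_machines / total_machines
```

Under the even‑balancing assumption, one gray machine in a pool of 20 would see roughly 5% of traffic, so the proportion is controlled by how many machines receive the gray version rather than by routing rules.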

Figure 3

We then use business monitoring to judge whether the business status of gray traffic is normal. With gray traffic monitored separately, fluctuations in its metrics indicate the health of the release.

Figure 4

Figure 5

Core monitoring covers business volume (Figure 4) and business results (Figure 5); for gray release, the focus should be on business results.

Two theoretical foundations:

1. Gray traffic is identified by flow through specific machines.

2. Monitoring emphasizes business result metrics.
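A business result check of this kind can be sketched by comparing the success rate of gray traffic with that of the remaining traffic in the same time window. Everything here is an assumption made for illustration: the `ResultCounts` type, the 2‑point tolerance (`max_drop`), and the minimum sample size are not from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultCounts:
    """Business outcomes (for example, order submissions) seen in one time window."""
    success: int
    failure: int

    @property
    def total(self) -> int:
        return self.success + self.failure

    @property
    def success_rate(self) -> float:
        return self.success / self.total


def gray_is_healthy(
    gray: ResultCounts,
    baseline: ResultCounts,
    max_drop: float = 0.02,   # assumed tolerance: 2 percentage points
    min_samples: int = 100,   # assumed minimum gray sample size
) -> Optional[bool]:
    """Compare gray and baseline success rates.

    Returns None when there is too little data to judge, True when the gray
    success rate is within max_drop of the baseline, and False otherwise.
    """
    if gray.total < max(min_samples, 1) or baseline.total == 0:
        return None
    return baseline.success_rate - gray.success_rate <= max_drop
```

Comparing rates rather than volumes is what makes the check usable on a small slice of traffic: the gray volume is low by design, but its success rate should track the baseline.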

3.2 Overall Solution Formed

Figure 6

The solution requires only about 0.5 person‑day to enable gray release capability.

Gray release process (Figure 7): every release must go to the gray machines first, and the full release proceeds only if gray monitoring is normal.

Figure 7

An automated check monitors the health of the gray release; if the metrics fail, it issues a warning and blocks the full release.
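The gate described here (gray machines first, then a full release only when gray monitoring is normal) can be sketched as a small orchestration function. The callback names and return values are hypothetical; a real pipeline would also wait for an observation window before checking health.

```python
from typing import Callable, Sequence


def run_gray_release(
    deploy: Callable[[Sequence[str]], None],
    check_gray_health: Callable[[], bool],
    warn: Callable[[str], None],
    gray_hosts: Sequence[str],
    other_hosts: Sequence[str],
) -> str:
    """Deploy to gray machines first; continue only if the gray check passes."""
    deploy(gray_hosts)
    if not check_gray_health():
        # Unhealthy gray metrics: raise a warning and stop before the full release.
        warn("gray metrics abnormal, full release blocked")
        return "blocked"
    deploy(other_hosts)
    return "released"
```

Keeping the health check as an injected callback means the same gate works with whatever business result metric a team chooses.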

3.3 Principle Introduction

How is the gray‑traffic identifier transmitted between systems?

Downward transmission uses a global trace component.

Upward transmission (protocol specific): Dubbo – attachment; HTTP – header; MQ – not needed.

Within a system, the identifier is kept in memory by the global trace component.
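The following sketch shows one way the identifier could travel over HTTP in both directions. It assumes that "downward" means caller to callee in the request and "upward" means callee back to caller in the response; the header name `X-Gray-Flag` and both function names are hypothetical, and headers are modelled as plain dictionaries rather than a real HTTP client.

```python
from typing import Callable, Dict

GRAY_HEADER = "X-Gray-Flag"  # hypothetical header name

Headers = Dict[str, str]


def handle_downstream(request_headers: Headers, local_is_gray: bool) -> Headers:
    """Callee side: combine the incoming flag with local state and report it back."""
    gray = request_headers.get(GRAY_HEADER) == "1" or local_is_gray
    return {GRAY_HEADER: "1"} if gray else {}


def call_downstream(trace: Dict[str, bool], handler: Callable[[Headers], Headers]) -> Dict[str, bool]:
    """Caller side: send the flag down, then merge the returned flag into the trace."""
    request_headers = {GRAY_HEADER: "1"} if trace.get("gray") else {}
    response_headers = handler(request_headers)
    if response_headers.get(GRAY_HEADER) == "1":
        trace["gray"] = True
    return trace
```

With this shape, a request that only touches a gray machine deep in the call chain is still reported as gray at the entry point. A Dubbo call would carry the same flag in an attachment instead of a header.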

Figure 9

To avoid memory leaks or OOM, we enforce a cap on the total number of tracked requests (default maximum concurrency: 5124) and timeout cleanup (default 60 s, which leaves a margin over the 30 s RPC tolerance).

We cannot manage the identifier's lifecycle by request start and end, because a request may be synchronous, asynchronous, callback‑based, delivered over MQ, and so on.
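A store with both safeguards, a size cap and timeout cleanup, can be sketched as below. The class and method names are hypothetical, the defaults mirror the figures quoted above, and evicting the oldest entry when the cap is reached is an assumption: the article does not say whether new entries are rejected or old ones dropped.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class TraceStore:
    """In-memory gray flags keyed by trace id, bounded in size and in lifetime."""

    def __init__(
        self,
        max_entries: int = 5124,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # Insertion order equals expiry order because every entry has the same TTL.
        self._entries: "OrderedDict[str, Tuple[float, bool]]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def put(self, trace_id: str, gray: bool) -> None:
        now = self._clock()
        self._evict_expired(now)
        if trace_id in self._entries:
            del self._entries[trace_id]          # re-insert at the end
        elif len(self._entries) >= self._max_entries:
            self._entries.popitem(last=False)    # assumed policy: drop the oldest
        self._entries[trace_id] = (now + self._ttl, gray)

    def get(self, trace_id: str) -> bool:
        """Return the stored flag, or False if the entry is unknown or expired."""
        entry = self._entries.get(trace_id)
        if entry is None:
            return False
        expires_at, gray = entry
        if expires_at <= self._clock():
            del self._entries[trace_id]
            return False
        return gray

    def _evict_expired(self, now: float) -> None:
        while self._entries:
            oldest_id = next(iter(self._entries))
            if self._entries[oldest_id][0] > now:
                break
            del self._entries[oldest_id]
```

Because cleanup depends only on age and size, it works the same for synchronous calls, asynchronous callbacks, and message consumers, none of which offer a single reliable "request finished" hook.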

Monitoring isolation for gray traffic relies on the monitoring system's tag feature: metrics produced by gray traffic are tagged as gray.
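Tag‑based isolation can be illustrated with a toy counter that keeps one series per metric name and tag. The `TaggedCounter` class, the tag values, and the metric name are hypothetical stand‑ins for the real monitoring system's client.

```python
from collections import Counter
from typing import Tuple


class TaggedCounter:
    """Counts events per (metric name, tag) so gray and normal traffic stay separate."""

    def __init__(self) -> None:
        self._counts: "Counter[Tuple[str, str]]" = Counter()

    def incr(self, name: str, gray: bool) -> None:
        tag = "gray" if gray else "normal"
        self._counts[(name, tag)] += 1

    def value(self, name: str, tag: str) -> int:
        return self._counts[(name, tag)]


metrics = TaggedCounter()
metrics.incr("order.success", gray=True)
metrics.incr("order.success", gray=False)
metrics.incr("order.success", gray=False)
```

The same metric name then yields two series, one per tag, which can be charted and alerted on independently.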

Figure 10

4 Solution Summary

End

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
