
Design and Optimization of Testing Environment 3.0 at Qunar Travel

This article describes how Qunar Travel evolved its test-environment governance from a fixed set of ten machines to a template-driven, soft-routing architecture (Environment 3.0). Automated synchronization, intelligent recommendations, and continuous business checks let the new design speed up delivery, improve reliability and business connectivity, and lower operational costs.

Qunar Tech Salon

Aug 17, 2022


1. Background

Qunar Travel has placed great emphasis on testing‑environment governance to improve developer and tester efficiency. Since 2018, three rounds of environment governance and optimization have been carried out.

1.1 Test Environment 1.0

Environment 1.0 consisted of ten fixed machines, each hosting a full-stack environment. As the number of microservices grew, manually maintaining the beta machines became difficult, and contention for the shared machines reduced efficiency.

1.2 Test Environment 2.0

Environment 2.0 introduced Noah, an internally developed environment-management system that uses templates to manage microservice applications. With Noah, creating a new hotel environment of more than 200 services took about 30 minutes, which greatly improved delivery speed and reduced resource contention.

1.3 Test Environment 3.0

A 2021 efficiency assessment identified three main pain points: long debugging time, frequent service crashes, and environment creation that was still slow.

These were grouped into three areas for improvement: delivery efficiency, basic reliability, and business connectivity.

2. Environment 3.0 Design

2.1 Improving Delivery Efficiency

2.1.1 Analysis

In Environment 2.0, creating a hotel-scale environment was already fast by industry standards (about 200 modules in 30 minutes). Two approaches to further improvement were considered: reducing the build time of each module, or reducing the number of modules in each environment.

The team chose to reduce the number of modules. A project environment pulls only the modules the project needs, and requests to any other module are served by the shared baseline environment. This shortens creation time and raises resource utilization.

2.1.2 Soft‑Routing Mechanism

The soft‑routing solution consists of two core functions: environment binding and traffic distribution.

2.1.2.1 Environment Binding

With the Noah binding tool, a user binds a UID to an environment, and the binding is stored. When a request arrives, the gateway looks up the binding for the request's UID, injects the corresponding environment identifier into the HTTP headers, and forwards the request.
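The binding lookup and header injection can be sketched as follows. This is a minimal illustration, not Noah's implementation: an in-memory dictionary stands in for the binding store, and the header name `X-Env-Id`, the class `BindingStore`, and the function `inject_env_header` are hypothetical, because the article does not name them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical header name; the article does not say which header carries the identifier.
ENV_HEADER = "X-Env-Id"


@dataclass
class BindingStore:
    """In-memory stand-in for the stored UID -> environment bindings (hypothetical)."""

    bindings: dict[str, str] = field(default_factory=dict)

    def bind(self, uid: str, env_id: str) -> None:
        # Rebinding a UID overwrites its previous environment.
        self.bindings[uid] = env_id

    def lookup(self, uid: str) -> str | None:
        return self.bindings.get(uid)


def inject_env_header(store: BindingStore, uid: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the request headers with the bound environment injected.

    Requests from unbound UIDs are forwarded without the header, so downstream
    routing treats them as baseline traffic.
    """
    forwarded = dict(headers)  # never mutate the caller's headers
    env_id = store.lookup(uid)
    if env_id is not None:
        forwarded[ENV_HEADER] = env_id
    return forwarded
```

Copying the headers instead of mutating them keeps the gateway step side-effect free, which makes the injection easy to test in isolation.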

2.1.2.2 Traffic Distribution

Traffic distribution spans three kinds of middleware: OR, Dubbo, and QMQ. It works in two stages. In service perception, each service instance registers its environment information. In service selection, requests are routed by environment identifier and fall back to the baseline environment when no instance in the target environment matches.
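The two stages can be sketched with a toy registry. It is illustrative only: it does not reflect the actual OR, Dubbo, or QMQ code, and the class name `EnvAwareRegistry` and the environment name `baseline` are assumptions.

```python
from __future__ import annotations

import random
from collections import defaultdict

BASELINE = "baseline"  # hypothetical identifier of the baseline environment


class EnvAwareRegistry:
    """Toy registry illustrating service perception and service selection."""

    def __init__(self) -> None:
        # service name -> environment id -> list of instance addresses
        self._instances: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))

    def register(self, service: str, env_id: str, address: str) -> None:
        """Service perception: an instance registers together with its environment."""
        self._instances[service][env_id].append(address)

    def select(self, service: str, env_id: str | None) -> str:
        """Service selection: prefer the caller's environment, else fall back to baseline."""
        by_env = self._instances.get(service, {})
        candidates = by_env.get(env_id) if env_id else None
        if not candidates:
            candidates = by_env.get(BASELINE)
        if not candidates:
            raise LookupError(f"no instance of {service} in {env_id!r} or {BASELINE!r}")
        return random.choice(candidates)  # naive load balancing among matching instances
```

Because the fallback happens at selection time, a project environment only needs to contain the modules it changes. Every other call resolves to the baseline instances.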

2.1.2.3 Middleware Refactoring

Gateway: provides logical isolation using routing identifiers; it parses each route and forwards the request to the matching environment.

Dubbo: providers register with a routerId parameter; consumers select providers based on the environment identifier.

QMQ: consumers register and publish with environment identifiers; the server filters messages accordingly.
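The server-side QMQ filtering described above can be sketched for a single consumer group. A message tagged with an environment that has its own consumer is delivered there, and everything else goes to the baseline consumer. This fallback rule mirrors the service-selection behaviour and is an assumption. The `Message` type and the helper functions are also hypothetical and do not reflect QMQ's API.

```python
from __future__ import annotations

from dataclasses import dataclass

BASELINE = "baseline"  # hypothetical identifier of the baseline environment


@dataclass(frozen=True)
class Message:
    subject: str
    body: str
    env_id: str | None  # environment identifier attached by the producer


def route_message(msg: Message, consumer_envs: set[str]) -> str | None:
    """Choose which environment's consumer (within one consumer group) gets msg.

    consumer_envs holds the environments in which this group has a registered
    consumer for msg.subject.
    """
    if msg.env_id is not None and msg.env_id in consumer_envs:
        return msg.env_id
    if BASELINE in consumer_envs:
        return BASELINE
    return None  # no eligible consumer; a real broker would retain or retry


def should_deliver(msg: Message, consumer_env: str, consumer_envs: set[str]) -> bool:
    """Server-side filter: deliver only to the consumer chosen by route_message."""
    return route_message(msg, consumer_envs) == consumer_env
```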

2.1.2.4 Storage Isolation

Databases and Redis are physically isolated. An intelligent recommendation engine uses the module dependencies of an environment to select the data stores it needs, keeping test data isolated and test call chains stable.
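The article does not describe the recommendation rule itself. One plausible rule is sketched below as an assumption. Suppose a selected module uses an isolated store and a module left in baseline uses the same store. The baseline module would then read different data, so it is pulled into the environment too, and this repeats until nothing else shares the environment's stores. The function name and input shape are hypothetical.

```python
from __future__ import annotations


def recommend_isolation(
    selected: set[str], store_usage: dict[str, set[str]]
) -> tuple[set[str], set[str]]:
    """Expand a module selection until its data stores are not shared with baseline.

    store_usage maps each module to the databases / Redis instances it uses.
    Returns (modules to deploy, data stores to provision in the environment).
    """
    modules = set(selected)
    while True:
        stores = set().union(*(store_usage.get(m, set()) for m in modules))
        # Modules left in baseline that touch one of these stores would see different data.
        sharing = {m for m, used in store_usage.items() if m not in modules and used & stores}
        if not sharing:
            return modules, stores
        modules |= sharing  # grows monotonically, so the loop terminates
```

For example, with `{"order": {"db_order"}, "report": {"db_order"}}`, selecting `order` also recommends `report`, because both rely on `db_order`.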

3. Test Environment Stability

3.1 Basic Reliability

Two reliability aspects are addressed: baseline environment reliability and soft‑routing environment reliability.

3.1.1 Baseline Environment

The baseline environment runs the latest stable code and targets 99 % availability. To meet that target, it uses multi-machine deployment, keeps debugging disabled, and follows a “baseline environment guarantee plan” that covers code, configuration, and database synchronization.

Code sync: online releases are synced to the baseline environment every hour.

Configuration sync: the baseline environment syncs test-environment configuration rather than online configuration, which was judged the safer of the two strategies.

Database sync: table structures are synced every hour, with security-masked data.
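The hourly table-structure sync can be sketched as a schema diff that emits DDL. This sketch makes several assumptions: the databases are MySQL, schemas are given as plain dictionaries, and the helper name `plan_schema_sync` is hypothetical. Data masking is a separate step and is not shown.

```python
from __future__ import annotations

Schema = dict[str, dict[str, str]]  # table -> column -> column type (Python 3.9+)


def plan_schema_sync(online: Schema, baseline: Schema) -> list[str]:
    """Return MySQL DDL that brings baseline table structures in line with online.

    Handles new tables, new columns, and changed column types. Tables or
    columns that exist only in baseline are intentionally left alone.
    """
    statements: list[str] = []
    for table, columns in sorted(online.items()):
        current = baseline.get(table)
        if current is None:
            cols = ", ".join(f"`{c}` {t}" for c, t in columns.items())
            statements.append(f"CREATE TABLE `{table}` ({cols});")
            continue
        for column, col_type in columns.items():
            if column not in current:
                statements.append(f"ALTER TABLE `{table}` ADD COLUMN `{column}` {col_type};")
            elif current[column] != col_type:
                statements.append(f"ALTER TABLE `{table}` MODIFY COLUMN `{column}` {col_type};")
    return statements
```

Leaving drops out of the automatic path is a conservative choice. The sync then never destroys data that testers may still be using in the baseline environment.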

3.1.2 Soft‑Routing Environment

Since soft‑routing environments may contain unstable code, the strategy focuses on proactive detection and rapid localization. Noah provides environment health probes, container auto‑healing, and VM restart mechanisms, notifying administrators when automatic recovery fails.
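The probe-recover-notify flow can be sketched as a small loop. The `recover` callback stands in for container auto-healing or a VM restart, depending on where the service runs. The function name and the retry count are assumptions, not Noah's actual behaviour.

```python
from __future__ import annotations

from typing import Callable


def heal(
    probe: Callable[[], bool],
    recover: Callable[[], None],
    notify_admin: Callable[[str], None],
    max_attempts: int = 2,
) -> bool:
    """Probe a service and try automatic recovery before escalating.

    Returns True if the service ends up healthy; otherwise notifies an
    administrator and returns False.
    """
    if probe():
        return True
    for _ in range(max_attempts):
        recover()  # e.g. recreate the container or restart the VM
        if probe():
            return True
    notify_admin(f"service still unhealthy after {max_attempts} recovery attempts")
    return False
```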

3.2 Business Connectivity

Business connectivity means that developers and testers can use an environment out of the box. The scenarios cover interactions within a template, within a business line, and across business lines. In each case, soft routing ensures that requests reach the correct modules or fall back to the baseline.

The “Scout” system implements automated business checks (≈1800 checks per day) covering 300+ test cases. Checks are triggered by scheduled jobs, CI/CD pipelines, and environment creation/updates, reducing issue‑resolution time from ~30 minutes to ~30 seconds.
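A check runner in the spirit of Scout can be sketched as follows. Each case runs against one environment, and each failure is recorded with its reason so the broken step can be located without manual debugging. The names `Trigger`, `CheckReport`, and `run_checks`, and the convention that a case signals failure by raising, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Trigger(Enum):
    SCHEDULED = "scheduled"
    CI_CD = "ci_cd"
    ENV_CHANGE = "env_created_or_updated"


@dataclass
class CheckReport:
    env_id: str
    trigger: Trigger
    passed: list[str]
    failed: dict[str, str]  # case name -> failure reason

    @property
    def pass_rate(self) -> float:
        total = len(self.passed) + len(self.failed)
        return len(self.passed) / total if total else 1.0


def run_checks(env_id: str, trigger: Trigger, cases: dict[str, Callable[[str], None]]) -> CheckReport:
    """Run every business check case against one environment."""
    report = CheckReport(env_id, trigger, [], {})
    for name, case in cases.items():
        try:
            case(env_id)
        except Exception as exc:  # any exception counts as a failed check
            report.failed[name] = f"{type(exc).__name__}: {exc}"
        else:
            report.passed.append(name)
    return report
```

Recording the trigger with each report makes it possible to tell whether a failure came from a scheduled run, a CI/CD pipeline, or an environment change.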

4. Environment 3.0 Metrics

4.1 Delivery Efficiency

The average number of modules per environment dropped from about 200 to about 10, a reduction of more than 90 %, and environment creation time decreased by 70 %.

4.2 Business Check Pass Rate

The number of core environments that must be kept healthy dropped from more than 60 to a single baseline environment. Business-check failures are therefore faster to resolve, and overall reliability is higher.

4.3 Maintenance Cost Reduction

Maintaining fewer environments lowers operational overhead and resource consumption.

5. Future Outlook

5.1 Environment 4.0 (PROD → BETA)

Environment 4.0 will introduce richer synchronization from production (PROD) to the test environment (BETA) and more comprehensive business checks, making test environments easier to use and maintain.

5.1.1 Application Layer

Automatic application and machine configuration sync.

Automatic code sync.

Enhanced self‑healing.

5.1.2 Data Layer

Database and cache synchronization with security‑masked data.

5.1.3 Infrastructure Layer

Automatic configuration sync.

Automatic gateway sync.

5.1.4 Business Check Automation

Automatically record and replay cases to increase coverage, and use intelligent failure analysis to drive self-healing.

5.2 Environment 5.0 (Test in Production)

Environment 5.0 will explore “Test in Production”. It will extend the hotel simulation environment with cloud and soft-routing techniques to give each developer a logically isolated environment that mirrors production.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
