Design and Optimization of Testing Environment 3.0 at Qunar Travel
This article describes how Qunar Travel evolved its testing‑environment governance from a fixed ten‑machine setup to a template‑driven, soft‑routing architecture (Environment 3.0). The change improved delivery speed, reliability, and business connectivity, and reduced operational costs through automated sync, smart recommendations, and continuous business checks.
1. Background
Qunar Travel has placed great emphasis on testing‑environment governance to improve developer and tester efficiency. Since 2018, three rounds of environment governance and optimization have been carried out.
1.1 Test Environment 1.0
Environment 1.0 consisted of ten fixed machines, each hosting a full‑stack environment. As micro‑services grew, manual maintenance on beta machines became difficult and resource contention reduced efficiency.
1.2 Test Environment 2.0
Environment 2.0 introduced the internally developed Noah environment‑management system, which uses templates to manage micro‑service applications. Creating a hotel environment with more than 200 services took about 30 minutes, which dramatically improved delivery speed and reduced contention.
1.3 Test Environment 3.0
After a 2021 efficiency assessment, three main pain points were identified: long debugging time, frequent service crashes, and still‑slow environment creation.
These were categorized into delivery efficiency, basic reliability, and business connectivity.
2. Environment 3.0 Design
2.1 Improving Delivery Efficiency
2.1.1 Analysis
In Environment 2.0, hotel‑scale environment creation already reached industry‑leading speed (200 modules in 30 minutes). Two approaches were considered for further improvement: reducing per‑module build time or reducing module scale.
Reducing module scale was chosen: projects only pull the modules they need, shortening creation time and increasing resource utilization.
2.1.2 Soft‑Routing Mechanism
The soft‑routing solution consists of two core functions: environment binding and traffic distribution.
2.1.2.1 Environment Binding
Users bind a UID to an environment with the Noah binding tool, which stores the relationship. The gateway reads this binding, injects the environment identifier into the HTTP headers, and forwards the request.
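The binding step can be sketched roughly as follows. This is a minimal illustration, not Noah's actual implementation: the in-memory `BindingStore`, the header name `X-Env-Id`, and the sample UIDs and environment names are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical header name; the article does not say which header carries the identifier.
ENV_HEADER = "X-Env-Id"


class BindingStore:
    """In-memory stand-in for the UID -> environment bindings kept by Noah."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def bind(self, uid: str, env_id: str) -> None:
        self._bindings[uid] = env_id

    def lookup(self, uid: str) -> Optional[str]:
        return self._bindings.get(uid)


def inject_env_header(uid: str, headers: Dict[str, str], store: BindingStore) -> Dict[str, str]:
    """Return a copy of the headers, adding the environment identifier if the UID is bound."""
    forwarded = dict(headers)
    env_id = store.lookup(uid)
    if env_id is not None:
        forwarded[ENV_HEADER] = env_id
    return forwarded


store = BindingStore()
store.bind("uid-1001", "env-hotel-feature-a")

bound = inject_env_header("uid-1001", {"Accept": "application/json"}, store)
unbound = inject_env_header("uid-2002", {"Accept": "application/json"}, store)
```

In this sketch a request from an unbound UID is forwarded without the header, so downstream routing would treat it as baseline traffic.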
2.1.2.2 Traffic Distribution
Traffic distribution involves OR, Dubbo, and QMQ middleware. The process includes service perception (registering environment info) and service selection (routing based on environment identifiers, falling back to the baseline environment when no match is found).
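The service-selection rule with baseline fallback can be illustrated with a short sketch. The `Instance` type, the `"baseline"` identifier, and the sample registry are assumptions for the example; the real registry and middleware hooks are not described in the source.

```python
from dataclasses import dataclass
from typing import List, Optional

BASELINE = "baseline"  # assumed identifier for the baseline environment


@dataclass(frozen=True)
class Instance:
    service: str
    address: str
    env_id: str  # environment info recorded at registration (service perception)


def select_instances(registry: List[Instance], service: str, env_id: Optional[str]) -> List[Instance]:
    """Prefer instances in the request's environment; otherwise fall back to baseline."""
    candidates = [i for i in registry if i.service == service]
    if env_id is not None:
        matched = [i for i in candidates if i.env_id == env_id]
        if matched:
            return matched
    return [i for i in candidates if i.env_id == BASELINE]


registry = [
    Instance("hotel-price", "10.0.0.1:20880", BASELINE),
    Instance("hotel-price", "10.0.1.1:20880", "env-a"),
    Instance("hotel-order", "10.0.0.2:20880", BASELINE),
]

routed = select_instances(registry, "hotel-price", "env-a")      # env-a has its own copy
fallback = select_instances(registry, "hotel-order", "env-a")    # not deployed in env-a
untagged = select_instances(registry, "hotel-price", None)       # request without identifier
```

Because unmatched services resolve to the baseline, an environment only needs to deploy the modules a project actually changes.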
2.1.2.3 Middleware Refactoring
Gateway: Logical isolation using routing identifiers; parses routes and forwards to the appropriate environment.
Dubbo: Providers register with a routerId parameter; consumers select providers based on the environment identifier.
QMQ: Consumers register, and messages are published, with environment identifiers; the server filters messages accordingly.
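As an illustration of the Dubbo item, the sketch below shows one way a routerId could travel as a parameter on a provider's registration URL and be read back on the consumer side. Only the parameter name `routerId` comes from the text; the helper functions, host, port, and interface name are hypothetical, and the real change lives inside Dubbo's registry and router extensions rather than in string handling like this.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def provider_url(host: str, port: int, interface: str, router_id: Optional[str]) -> str:
    """Build a Dubbo-style provider URL, adding routerId for non-baseline providers."""
    query = f"?{urlencode({'routerId': router_id})}" if router_id else ""
    return f"dubbo://{host}:{port}/{interface}{query}"


def router_id_of(url: str) -> Optional[str]:
    """Read the routerId parameter back from a provider URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("routerId")
    return values[0] if values else None


env_url = provider_url("10.0.1.1", 20880, "com.example.HotelService", "env-a")
base_url = provider_url("10.0.0.1", 20880, "com.example.HotelService", None)
```

A consumer-side router could then compare `router_id_of(url)` with the identifier carried by the request and keep only matching providers.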
2.1.2.4 Storage Isolation
Physical isolation is used for databases and Redis. An intelligent recommendation engine selects required data stores based on module dependencies, ensuring data isolation and stable test chains.
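A minimal sketch of the recommendation idea, assuming the engine has a map from each module to the data stores it depends on. The module names, store names, and the `MODULE_STORES` table are invented for the example; the source does not describe how the real engine derives dependencies.

```python
from typing import Dict, List

# Hypothetical dependency map: module -> data stores it reads or writes.
MODULE_STORES: Dict[str, List[str]] = {
    "hotel-order": ["mysql:order_db", "redis:order_cache"],
    "hotel-price": ["mysql:price_db", "redis:price_cache"],
    "hotel-inventory": ["mysql:inventory_db", "redis:price_cache"],
}


def recommend_stores(modules: List[str]) -> List[str]:
    """Return the de-duplicated, sorted data stores needed by the selected modules."""
    stores = set()
    for module in modules:
        stores.update(MODULE_STORES.get(module, []))
    return sorted(stores)


recommended = recommend_stores(["hotel-price", "hotel-inventory"])
```

Each recommended store would then be provisioned as a physically separate instance for the environment, so test writes never touch baseline data.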
3. Test Environment Stability
3.1 Basic Reliability
Two reliability aspects are addressed: baseline environment reliability and soft‑routing environment reliability.
3.1.1 Baseline Environment
The baseline environment runs the latest stable code and targets 99 % availability. It uses multi‑machine deployment, disables debugging, and follows a “baseline environment guarantee plan” covering code, configuration, and database synchronization.
Code sync: Online releases are synced to the baseline environment hourly.
Configuration sync: The chosen strategy syncs test‑environment configuration rather than online configuration, which is the safer option.
Database sync: Table structures are synced hourly, with security‑masked data.
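The masking part of the database sync might look like the sketch below. The column names and the hashing scheme are assumptions for illustration; the source does not say which fields are masked or how.

```python
import hashlib
from typing import Dict

# Hypothetical set of columns treated as sensitive.
SENSITIVE_COLUMNS = {"phone", "email", "id_card"}


def mask_value(value: str) -> str:
    """Replace a sensitive value with a deterministic, non-reversible token."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"masked_{digest[:12]}"


def mask_row(row: Dict[str, str]) -> Dict[str, str]:
    """Mask sensitive columns and copy every other column unchanged."""
    return {
        column: mask_value(value) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


source_row = {"hotel_id": "H1001", "phone": "13800000000", "email": "guest@example.com"}
masked_row = mask_row(source_row)
```

Deterministic tokens keep joins between tables consistent after masking. A plain hash of a low-entropy value such as a phone number can be guessed by brute force, so a production scheme would more likely use a keyed hash or a format-preserving substitution.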
3.1.2 Soft‑Routing Environment
Since soft‑routing environments may contain unstable code, the strategy focuses on proactive detection and rapid localization. Noah provides environment health probes, container auto‑healing, and VM restart mechanisms, notifying administrators when automatic recovery fails.
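The probe, heal, and notify sequence can be sketched as an escalation loop. The function names and the simulated environment state are hypothetical; Noah's real probes and recovery actions are not detailed in the source.

```python
from typing import Callable, Dict, List


def check_and_heal(
    probe: Callable[[], bool],
    recovery_steps: List[Callable[[], None]],
    notify: Callable[[str], None],
) -> bool:
    """Run recovery steps in order until the probe passes; notify if none of them work."""
    if probe():
        return True
    for step in recovery_steps:
        step()
        if probe():
            return True
    notify("automatic recovery failed; manual intervention required")
    return False


# Simulated environment: the container restart does not help, the VM restart does.
state: Dict[str, int] = {"healthy": 0, "restarts": 0}
alerts: List[str] = []


def probe() -> bool:
    return state["healthy"] == 1


def restart_container() -> None:
    state["restarts"] += 1


def restart_vm() -> None:
    state["restarts"] += 1
    state["healthy"] = 1


recovered = check_and_heal(probe, [restart_container, restart_vm], alerts.append)
```

Ordering the steps from cheapest to most disruptive means a VM restart is attempted only after container-level healing has failed.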
3.2 Business Connectivity
Business connectivity verifies that developers and testers can use the environment out of the box. Scenarios include intra‑template, intra‑business‑line, and cross‑business‑line interactions, with soft‑routing ensuring requests reach the correct modules or fall back to the baseline.
The “Scout” system implements automated business checks (≈1800 checks per day) covering 300+ test cases. Checks are triggered by scheduled jobs, CI/CD pipelines, and environment creation/updates, reducing issue‑resolution time from ~30 minutes to ~30 seconds.
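A rough sketch of how such a check run could be structured, with the trigger recorded alongside the failing cases so that a failure points directly at the broken business flow. The `Trigger` values mirror the three trigger sources named above; everything else (class names, case names, the case signature) is hypothetical and not taken from Scout itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Trigger(Enum):
    SCHEDULED = "scheduled"
    PIPELINE = "pipeline"
    ENV_CHANGE = "env_change"


@dataclass
class CheckReport:
    trigger: Trigger
    env_id: str
    failed: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failed


def run_checks(trigger: Trigger, env_id: str, cases: Dict[str, Callable[[str], bool]]) -> CheckReport:
    """Run every case against the environment; a False result or an exception is a failure."""
    report = CheckReport(trigger, env_id)
    for name, case in cases.items():
        try:
            ok = case(env_id)
        except Exception:
            ok = False
        if not ok:
            report.failed.append(name)
    return report


def pay_order(env_id: str) -> bool:
    raise RuntimeError("payment service unreachable")  # simulated outage


cases = {
    "search_hotel": lambda env_id: True,
    "create_order": lambda env_id: env_id == "baseline",
    "pay_order": pay_order,
}
report = run_checks(Trigger.ENV_CHANGE, "env-a", cases)
```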
4. Environment 3.0 Metrics
4.1 Delivery Efficiency
Average module count dropped from ~200 to ~10 (a reduction of more than 90 %), and environment creation time decreased by 70 %.
4.2 Business Check Pass Rate
Core environment count reduced from 60+ to a single baseline, enabling faster issue resolution and higher reliability.
4.3 Maintenance Cost Reduction
Fewer environments lower operational overhead and resource consumption.
5. Future Outlook
5.1 Environment 4.0 (PROD → BETA)
Introduce richer synchronization mechanisms and more comprehensive business checks to make test environments easier to use and maintain.
5.1.1 Application Layer
Automatic application and machine configuration sync.
Automatic code sync.
Enhanced self‑healing.
5.1.2 Data Layer
Database and cache synchronization with security‑masked data.
5.1.3 Infrastructure Layer
Automatic configuration sync.
Automatic gateway sync.
5.1.4 Business Check Automation
Automated case recording and replay to increase coverage, plus intelligent failure analysis to support self‑healing.
5.2 Environment 5.0 (Test in Production)
Explore “Test in Production” by extending the hotel simulation environment with cloud and soft‑routing techniques, providing per‑developer isolated environments that mirror production while maintaining logical isolation.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.