System Slimming at Qunar Travel: Reducing Code and Service Footprint by 50% Using Observability and Automation
This article presents Qunar Travel's "system slimming" project, describing how observability techniques, a two‑stage strategy, and automated tooling were used to identify and remove unused services and code, achieving a 50% reduction in code size, a 26% cut in services, and measurable improvements in reliability and release efficiency.
Background: Over time, legacy systems accumulate technical debt such as abandoned modules and dead code, which bloats the system and hampers new feature development and maintenance.
Qunar Travel launched a "system slimming" initiative in early 2022 to cut down unused code and services without affecting production stability. The goal was to reduce both code and service counts by 50%, with code reduction being the primary metric.
Goal Setting and Strategy: The project ran in two stages, trimming services first and then shrinking code. The targets were ambitious, requiring the removal of tens of millions of lines of dead code across all business lines (flights, hotels, tickets, etc.).
Planning: The schedule allotted two months to service reduction, followed by code reduction. An internal "slim-support" team was created to provide common tools and technical assistance to each business line.
Selection Strategies: Two main strategies were defined – a two‑stage method ("findable" and "deletable") and a four‑step filtering model (feature extraction, measurement, data collection, matching). These guided the identification of low‑value services and code.
Service Slimming Automation: Low‑value services were identified by lack of traffic (north‑south, east‑west, internal) and absence of recent updates. A four‑phase deletion workflow (confirmation, pre‑recovery, observation, recovery) was established and automated via a service‑slimming platform.
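The selection rule and the four-phase workflow described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the `ServiceStats` record, its field names, the `is_low_value` function, and the 180-day idle window are all assumptions introduced here. Only the three traffic types and the phase names come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


@dataclass
class ServiceStats:
    """Hypothetical per-service record; every field name is an assumption."""
    name: str
    north_south_requests: int  # traffic entering from outside the system
    east_west_requests: int    # service-to-service traffic
    internal_requests: int     # internal traffic
    last_updated: date         # date of the most recent update


def is_low_value(stats: ServiceStats, today: date, idle_days: int = 180) -> bool:
    """A service qualifies only if it has no traffic of any kind and has
    not been updated within the (assumed) idle window."""
    no_traffic = (
        stats.north_south_requests == 0
        and stats.east_west_requests == 0
        and stats.internal_requests == 0
    )
    stale = (today - stats.last_updated) > timedelta(days=idle_days)
    return no_traffic and stale


class Phase(Enum):
    """The four deletion phases, in the order given in the text."""
    CONFIRMATION = 1
    PRE_RECOVERY = 2
    OBSERVATION = 3
    RECOVERY = 4


def next_phase(phase: Phase) -> Phase:
    """Advance one step through the workflow; the last phase is terminal."""
    if phase is Phase.RECOVERY:
        return phase
    return Phase(phase.value + 1)
```

Requiring both conditions keeps the rule conservative: a service with no traffic that was updated recently may simply not be live yet, so it is not flagged.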
Code Slimming Techniques: Three scenarios for dead code were addressed – unused methods detected by static analysis, runtime‑identified methods with no traffic, and refactoring to eliminate duplication. Measurement options included AOP logging, bytecode instrumentation via an Agent, and the Serviceability Agent (SA) tool.
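To make the first measurement option concrete, here is a small Python analogue of AOP-style invocation logging. It is a sketch under stated assumptions, not the article's Java tooling: the `counted` decorator, the `call_counts` registry, and the two sample functions are hypothetical names introduced for illustration.

```python
import functools
from collections import Counter

# Process-wide invocation counts, keyed by qualified function name.
call_counts = Counter()


def counted(func):
    """Wrap func so that every call increments its counter."""
    name = f"{func.__module__}.{func.__qualname__}"
    call_counts[name] = 0  # register at definition time so zero-call functions are visible

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[name] += 1
        return func(*args, **kwargs)

    return wrapper


@counted
def search_flights(origin, destination):
    return f"{origin}->{destination}"


@counted
def legacy_export():
    return "unused"


search_flights("PEK", "SHA")
search_flights("PEK", "CAN")

# Functions that were registered but never invoked are dead-code candidates.
never_called = [name for name, n in call_counts.items() if n == 0]
```

The wrapper runs on every call, which is the kind of per-invocation overhead that matters when weighing this approach against the alternatives.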
Tool Comparison: The three measurement approaches were evaluated on performance impact, failure risk, and implementation complexity. The SA tool was chosen for zero performance loss and zero failure risk after deep optimization.
Implementation Details: SA runs a short data-collection ("run-numbers") pass that records method invocation counts. Aggregating these counts over weeks or months identifies the methods with zero calls. To avoid production impact, the pass can be scheduled during service downtime or tied to release and restart events.
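The aggregation step can be sketched as below. This assumes each collection pass yields a mapping from method name to invocation count. The `zero_call_methods` function, the snapshot format, and the sample method names are hypothetical and are not taken from the SA tooling itself.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def zero_call_methods(
    known_methods: Iterable[str],
    snapshots: Iterable[Mapping[str, int]],
) -> List[str]:
    """Return the methods that recorded no calls in any snapshot."""
    totals = Counter()
    for snapshot in snapshots:
        totals.update(snapshot)  # adds each method's count to its running total
    # A method absent from every snapshot has a total of 0 as well.
    return sorted(m for m in known_methods if totals[m] == 0)


methods = [
    "OrderService.create",
    "OrderService.legacyRefund",
    "PriceService.quote",
]
weekly_snapshots = [
    {"OrderService.create": 120, "PriceService.quote": 0},
    {"OrderService.create": 98, "PriceService.quote": 3},
]
candidates = zero_call_methods(methods, weekly_snapshots)
```

In this example `PriceService.quote` shows no calls in the first snapshot but three in the second, so it is not flagged. This is why counts are aggregated over a long window before anything is treated as dead.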
Automation Levels: Initially a fully automated deletion pipeline was built, but teams hesitated due to lack of control. A semi‑automated approach using an IntelliJ plugin was later introduced, allowing developers to manually approve deletions while still benefiting from automated detection.
Results: The project cut code lines by 50% and services by 26%. Fault rates dropped below 0.3%, release efficiency improved by 9.5%, and the average time to process a requirement fell by 10.9%.
Conclusion: Even modest low-level technical improvements, such as systematic dead-code removal using observability data, can deliver substantial business value by improving system stability and developer productivity and by lowering overall operational cost.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.