How We Boosted CI Automation Efficiency by 50%: A Real‑World Case Study
This article details how a development team identified performance, stability, and metric shortcomings in their automated CI pipeline and implemented architectural, process, and tooling improvements—including GPU‑enabled Linux runners, dynamic thread pools, result noise reduction, and comprehensive dashboards—to dramatically increase test throughput, reliability, and overall automation value.
1. Background Challenges
Automated continuous integration (CI) is widely adopted, promising faster testing, reduced manual work, and confidence in releases, but the team faced several issues that caused the actual outcomes to diverge from expectations.
As business complexity grew, automation runs slowed, compute resources became scarce, queue times lengthened, and every release was delayed. Stability problems (script, platform, and environment failures) eroded trust in automation results, producing unreproducible failures and ignored CI checkpoints. Over-optimizing automation metrics led to investments whose cost outweighed their return, and the CI checkpoints themselves became ineffective: many automated cases ran daily yet still missed critical defects.
2. Solution Overview
To restore the true value of automated CI, the team tackled pain points across the CI workflow, infrastructure, and platform capabilities. The goals: keep automation stable, improve execution efficiency and CI checkpoint effectiveness, and reduce investigation costs, all measured against clear metrics.
2.1 Existing Architecture Overview
The CI architecture uses internal DevOps platforms (named Pub and Moon) for front-end and back-end builds and releases. Various automation types (API tests, UI tests, unit tests, mutation tests, and static scans) are scheduled via Jenkins, with most Jenkins agents managed by Kubernetes. The improvements described here focus on API and UI automation.
3. Implementation Details
3.1 Improving Execution Efficiency
CI events (code changes, deployments, scheduled jobs) trigger various test suites; UI automation is the slowest, followed by API automation and then unit tests. The target is to keep each task under 30 minutes.
3.1.1 UI Automation
Initially, UI tests ran on Windows machines with limited resources, causing average queue times of ~30 minutes and execution times over an hour per task. The solution was to migrate to GPU‑enabled Linux physical machines, run tests headlessly, and allocate resources (8 vcuda‑core, 3 vcuda‑memory per pod). Two machines now support up to 96 concurrent tasks. Concurrency was further increased by running tests in parallel at the file level.
After a half‑year migration, most UI cases run smoothly on Linux; a small subset still requires Windows due to rendering issues and is handled via test case tagging and mixed‑environment execution.
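As a concrete illustration of the headless setup, here is a minimal sketch assuming a Selenium WebDriver stack with Chrome; the article does not name the team's browser tooling, so treat the class names and flags as assumptions:

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class HeadlessDriverFactory {

    // Builds a WebDriver that can run on a Linux machine with no display attached.
    public static WebDriver create() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");          // no visible window needed
        options.addArguments("--no-sandbox");            // commonly required inside containers
        options.addArguments("--disable-dev-shm-usage"); // work around small /dev/shm in pods
        options.addArguments("--window-size=1920,1080"); // fixed viewport for stable rendering
        return new ChromeDriver(options);
    }
}
```

Removing the dependency on a desktop session is what makes dense packing of browser instances per machine, and file-level parallelism, practical.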
| Stage | Daily Runs | Queue Time | Execution Time |
|---|---|---|---|
| Before Optimization | 300 | 30 min | 60 min |
| After Optimization | 2500‑3000 | 30 s | 15 min |
3.1.2 API Automation
API tests require building test environments, which can be time-consuming. The team caches builds for identical branch/commit pairs, which covers ~70 % of tasks (sketched below). To balance load, they also switched from a single large thread pool to multiple smaller pools with dynamic scaling, compared in the table that follows, achieving better throughput without overloading downstream services.
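A minimal sketch of the caching idea, with illustrative names (the article does not show the team's implementation):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Identical branch+commit pairs reuse a previously built test environment
// instead of rebuilding it from scratch.
public class BuildCache {

    private final Map<String, String> builtEnvironments = new ConcurrentHashMap<>();

    private static String key(String branch, String commit) {
        return branch + "@" + commit;
    }

    // Returns the cached environment id for this branch/commit, if one exists.
    public Optional<String> lookup(String branch, String commit) {
        return Optional.ofNullable(builtEnvironments.get(key(branch, commit)));
    }

    // Records a freshly built environment so later CI tasks can reuse it.
    public void record(String branch, String commit, String environmentId) {
        builtEnvironments.put(key(branch, commit), environmentId);
    }
}
```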
| Strategy | Details | Pros/Cons |
|---|---|---|
| Single 12‑thread pool | All CI tasks share one pool; FIFO ordering; adjustable thread count | Fast execution per task; increases service load; fixed size requires manual intervention when a backlog occurs |
| Thread‑pool groups (3×6 threads, 1.5× elasticity) | Tasks routed to specific pools; FIFO ordering per pool; auto‑scaling per pool; adjustable group count | Fast execution per task; same load‑scaling trade‑offs as above |
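The thread-pool-group strategy maps naturally onto Java's ThreadPoolExecutor. The sketch below assumes a core size of 6 with a maximum of 9 threads per pool (the 1.5× elasticity) and hash-based routing; all names and sizes beyond those in the table are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGroupRouter {

    private static final int GROUPS = 3;       // number of pools
    private static final int CORE_THREADS = 6; // baseline threads per pool
    private static final int MAX_THREADS = 9;  // 1.5x elasticity over the core size

    private final List<ThreadPoolExecutor> pools = new ArrayList<>();

    public PoolGroupRouter() {
        for (int i = 0; i < GROUPS; i++) {
            // Extra threads beyond the core size are created only once the
            // bounded queue fills up, which gives the "scale under backlog" behavior.
            pools.add(new ThreadPoolExecutor(
                    CORE_THREADS, MAX_THREADS,
                    60, TimeUnit.SECONDS,            // idle time before extra threads retire
                    new ArrayBlockingQueue<>(100))); // bounded FIFO backlog per pool
        }
    }

    // Route each CI task to a fixed pool so tasks with the same key stay FIFO-ordered.
    public void submit(String taskKey, Runnable task) {
        int index = Math.floorMod(taskKey.hashCode(), GROUPS);
        pools.get(index).execute(task);
    }
}
```

Keeping each pool small bounds the load any single pool can put on downstream services, while the elasticity absorbs short bursts without manual intervention.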
3.2 CI Process Optimization
The team introduced left‑shift and right‑shift testing, traffic replay, and log inspection to increase automation value.
3.2.1 Left‑Shift Testing
Attempting to run tests at the merge-request (MR) stage revealed several issues: frequent MR runs blocked pipelines, developers were reluctant to investigate test failures, and MR-stage code was not yet stable enough. The solution was to shift automation to the "test-request" stage, where testers trigger smoke tests and can step in when automation fails.
3.2.2 Automated Issue Creation
Automation results now generate issues or actionable items. To reduce noise, the team added a result‑confirmation step and smart analysis before creating issues, supporting both issue‑based and item‑based workflows to accommodate different agile teams.
3.2.3 Result Notification & Dashboards
Automation outcomes are pushed daily to personal, team, and manager dashboards, providing visibility into failures, trends, and key metrics.
3.3 Automation Stability Governance
3.3.1 Result Stability
To improve result reliability, the team applies noise reduction and intelligent analysis. Failures that persist after retries are re-run on a stable baseline: if the baseline reproduces the same failure, the result is treated as noise (a test or environment problem rather than a regression); otherwise it is reported as a bug. Additional business-specific rules filter out further non-actionable failures.
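Expressed as code, the triage logic might look like the sketch below; the retry policy, environment names, and interfaces are assumptions rather than the team's actual implementation:

```java
// Cross-checks a persistent failure against a stable baseline environment.
public class FailureTriage {

    public enum Verdict { NOISE, BUG }

    // Abstraction over "run this test in that environment"; illustrative only.
    public interface TestRunner {
        boolean passes(String testId, String environment);
    }

    private final TestRunner runner;
    private final int maxRetries;

    public FailureTriage(TestRunner runner, int maxRetries) {
        this.runner = runner;
        this.maxRetries = maxRetries;
    }

    public Verdict triage(String testId) {
        // A pass on any retry means the failure was flaky, i.e. noise.
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (runner.passes(testId, "candidate")) {
                return Verdict.NOISE;
            }
        }
        // Persistent failure: re-run on the stable baseline. A matching failure
        // there means the problem is not caused by the change under test.
        boolean baselinePasses = runner.passes(testId, "stable-baseline");
        return baselinePasses ? Verdict.BUG : Verdict.NOISE;
    }
}
```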
3.3.2 Failure Investigation
For failed API tests, the system records service snapshots, warnings, health status, and logs, as well as code coverage via JaCoCo. For UI failures, screenshots, video recordings, console errors, and network logs are captured.
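On the UI side, a screenshot-on-failure hook is straightforward to express as a JUnit 4 rule using Selenium's TakesScreenshot; this sketch assumes that stack, which the article does not specify:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;

// Saves a PNG named after the failing test method whenever a UI test fails.
public class ScreenshotOnFailure extends TestWatcher {

    private final WebDriver driver;
    private final Path outputDir;

    public ScreenshotOnFailure(WebDriver driver, Path outputDir) {
        this.driver = driver;
        this.outputDir = outputDir;
    }

    @Override
    protected void failed(Throwable e, Description description) {
        try {
            byte[] png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
            Files.createDirectories(outputDir);
            Files.write(outputDir.resolve(description.getMethodName() + ".png"), png);
        } catch (Exception suppressed) {
            // Artifact capture must never fail the build itself.
        }
    }
}
```

A test class attaches this with @Rule and passes in its WebDriver; video, console-error, and network-log capture can hook into the same failure callback.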
3.3.3 Jenkins Stability
Jenkins, the core CI scheduler, began to strain as daily automation volume grew past 10,000 runs, hitting full GC pauses, disk-space exhaustion, and network bottlenecks. Fixes included limiting retained build history, offloading logs to object storage, and removing unnecessary plugins.
3.4 Metric Measurement
The team defined a three‑level metric hierarchy: primary result metrics (automation fault discovery rate and omission rate), secondary decomposition metrics (coverage, checkpoint interception, key‑scenario coverage), and tertiary improvement metrics (specific actions to boost the primary metrics). These metrics drive continuous improvement without becoming a punitive KPI.
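The article names the two primary metrics without defining them; a common formulation, stated here as an assumption, is:

```latex
% Assumed definitions; the source names these metrics but does not formalize them.
\text{fault discovery rate} =
  \frac{\text{defects caught by CI automation}}{\text{all defects found in the cycle}}
\qquad
\text{omission rate} =
  \frac{\text{defects that escaped CI automation}}{\text{all defects found in the cycle}}
```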
4. Outcomes & Future Plans
After implementing the improvements, the team achieved a 50 % reduction in overall execution time, lowered failure rates from 17 % to 2 %, and increased automation issue effectiveness from 40 % to 70 %. Future work includes integrating all internal regression platforms into CI, expanding left‑ and right‑shift testing, and exploring AI‑assisted test generation, intelligent result analysis, and automated remediation to further boost testing efficiency and software quality.
