
Microsoft Azure DevOps Testing Left‑Shift: Practices, Principles, and Metrics

This article explains how Microsoft’s Azure DevOps team transformed its testing approach by shifting tests left, introducing new quality principles, redefining test classifications, improving automation reliability, and measuring progress with DevOps metrics to achieve faster, more trustworthy continuous integration and delivery.

Continuous Delivery 2.0

Background

In the previous article we discussed Microsoft's "testing right-shift" practice, TiP (Testing in Production). Some colleagues argued that Windows 10's quality issues indicated problems with that practice.

However, this series focuses on the approach of the Azure DevOps (formerly Visual Studio Team Services, VSTS) team; Windows itself is not a cloud service product, so its quality issues say little about TiP.

Testing Left‑Shift Overview

We now describe Microsoft’s testing left‑shift. While some content overlaps with earlier posts, new practices and principles are introduced.

How We Used to Work

In September 2014, three years into the cloud era, we still followed pre‑cloud testing methods, trying to speed up tasks and optimize automation but constantly struggling.

Problems with Automation

Our automated test suite took too long. The Nightly Automation Run (NAR) required 22 hours, and the Full Automation Run (FAR) took two days. Tests frequently failed, producing large numbers of false failures that were too costly to triage, leading teams to ignore failures before sprint end.

We focused on keeping a small set of high‑priority (P0) tests reliable, achieving about 70% pass rate, but still faced failures from infrastructure, product issues, and test defects.

Feedback from master‑branch validation arrived 12 hours after a commit, making it hard to act on failures before the sprint closed, often delaying releases by weeks.

New Quality Vision

In February 2015 we published a new Azure DevOps quality vision, redesigning the test suite from the ground up around a layered model (L0/L1 unit tests, L2/L3 functional tests).

Testing Principles

Write tests at the lowest possible level: Prefer tests with minimal external dependencies that run as part of the build. If a unit test (L0) can provide the needed information, avoid functional tests (L2/L3).

Write once, run everywhere, including production: Avoid tests that depend on a custom test server (Object Model) or internal knowledge; functional tests should use only public APIs, never back doors.

Design for testability: Embed testability into product design so that most tests can be unit tests; treat test code with the same rigor as production code.
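A minimal sketch of what designing for testability can look like in practice: injecting a dependency (here, the system clock) so the logic under test runs entirely in memory as an L0-style unit test. The `Clock`, `Session`, and `is_expired` names are illustrative assumptions, not code from the Azure DevOps product.

```python
import time
from dataclasses import dataclass


class Clock:
    """Abstraction over the system clock so time-dependent logic can be unit tested."""

    def now(self) -> float:
        return time.time()


class FixedClock(Clock):
    """Test double returning a fixed timestamp, keeping tests deterministic and in-memory."""

    def __init__(self, fixed: float):
        self.fixed = fixed

    def now(self) -> float:
        return self.fixed


@dataclass
class Session:
    created_at: float
    ttl_seconds: float


def is_expired(session: Session, clock: Clock) -> bool:
    # Because the clock is injected, this check is testable without sleeping,
    # mocking globals, or deploying the service.
    return clock.now() - session.created_at > session.ttl_seconds
```

A test then passes `FixedClock(100.0)` instead of the real clock and asserts on the result; no functional environment is needed.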

Test code is production code: Test code must be reliable, reviewed, and maintained like product code; neglecting test code quality undermines confidence in the results.

Testing infrastructure is a shared service: Testing should be integrated into the build pipeline, run under Visual Studio Team Explorer, and be as reliable as product code.

Test ownership aligns with product ownership: Developers own the tests for their components; they should not rely on others to test their code.

Testing Left‑Shift in Practice

Quality signals are generated earlier, often before code merges to master. Most tests run before a change reaches the main branch.

Re‑classifying Tests

We introduced a new classification based on external dependencies:

L0/L1 – Unit Tests. L0 are fast, memory-only tests with no external dependencies (< 60 ms); L1 may depend on SQL or the file system (< 400 ms, max 2 s).

L2/L3 – Functional Tests. L2 runs against a testable service deployment with limited dependencies; L3 are full integration tests running against production-like environments.
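The classification above can be made enforceable by attaching a time budget to each level and flagging tests that drift over it in CI. The L0 and L1 budgets below use the limits quoted in this article; the L2/L3 budgets are illustrative assumptions, as the source gives no figures for them.

```python
from enum import Enum


class TestLevel(Enum):
    """Per-test wall-clock budgets in seconds. L0 and L1 match the limits
    quoted above (60 ms; 2 s hard cap); L2 and L3 are assumed values."""

    L0 = 0.06
    L1 = 2.0
    L2 = 60.0
    L3 = 300.0


def within_budget(level: TestLevel, elapsed_seconds: float) -> bool:
    """True when a test finished inside its level's time budget, so a CI
    gate can fail or flag tests that have grown too slow for their level."""
    return elapsed_seconds <= level.value
```

A test runner would call `within_budget` with each test's declared level and measured duration, catching an "L0" test that has quietly started touching the file system.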

Ensuring Isolation for Functional Tests

L2 tests must be isolated, controlling their environment fully to avoid cross‑test interference. We built a fake identity provider to replace external authentication services.
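As a sketch of the fake-identity-provider idea: an in-memory stand-in that issues and validates tokens without any network dependency, so each L2 run controls its own authentication state. The interface below is an assumption for illustration, not Microsoft's actual fake.

```python
import secrets
from typing import Dict, Optional


class FakeIdentityProvider:
    """In-memory replacement for the external authentication service.
    Each test run gets its own instance, so token state cannot leak
    between tests and no network calls are made."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def issue_token(self, user: str) -> str:
        # Opaque random token, mapped back to the user it was issued for.
        token = secrets.token_hex(16)
        self._tokens[token] = user
        return token

    def validate(self, token: str) -> Optional[str]:
        # Returns the owning user, or None for unknown/forged tokens.
        return self._tokens.get(token)
```

Because the fake lives inside the test process, two L2 runs executing in parallel can never interfere through shared authentication state.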

Metrics and Progress

We track a “North Star” metric per iteration, showing a reduction from 27 000 legacy tests to fewer than 14 000 by iteration 101, with a growing number of L0/L1 unit tests.

Key milestones:

PR to merge: ~30 minutes, running ~60 000 unit tests.

Merge to CI build: ~22 minutes.

First quality feedback from CI: ~1 hour.

Full test cycle to self‑hosted environment: < 2 hours.

DevOps Metrics Used

We maintain a team scorecard tracking two metric families:

MTT(x): mean time to detect a production issue, to mitigate it, and to ship a fix.

Project Health: the number of unresolved defects per engineer; if it exceeds 5, defect remediation is prioritized over new feature work.
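The Project Health rule is simple enough to express directly; a minimal sketch, assuming the scorecard only needs an over/under signal:

```python
def defect_debt_exceeded(open_defects: int, engineers: int, cap: int = 5) -> bool:
    """True when unresolved defects per engineer exceed the cap quoted above,
    signalling that defect remediation should take priority over new features."""
    if engineers <= 0:
        raise ValueError("engineers must be positive")
    return open_defects / engineers > cap
```

For example, 60 open defects across a team of 10 (6 per engineer) trips the rule, while 50 sits exactly at the cap and does not.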

We also monitor engineering speed by measuring CI/CD pipeline stages.
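One of the MTT(x) values could be computed as below. The `(detected, mitigated)` pair format is an assumption for illustration; the source does not describe the team's actual incident schema.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_mitigate(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average of (mitigated - detected) across incidents, i.e. the MTT-to-
    mitigate component of an MTT(x) scorecard."""
    if not incidents:
        return timedelta(0)
    total = sum((mitigated - detected for detected, mitigated in incidents), timedelta(0))
    return total / len(incidents)
```

Two incidents mitigated in one and three hours respectively would yield a mean of two hours, which the scorecard could then track per iteration.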

All content is derived from a Microsoft 2017 presentation.
