Google’s Project Health Metrics and Practices for Pre‑Release Code Quality

The article explains how Google measures and maintains software quality before release: it divides responsibilities between product teams and SRE and relies on a monorepo, trunk‑based development, daily release candidates, automated testing, performance monitoring, and a Project Health (pH) metric system that tracks productivity, release velocity, reliability, and quality.

Continuous Delivery 2.0

Google Code Basics

Unlike most companies, Google combines a monorepo with trunk‑based development. The repository receives more than 60,000 code submissions per day, from both humans and automation, and each submission is called a changelist (CL).

Google Cloud Platform Release Cadence

Google Cloud Platform (GCP) follows a bi‑weekly public release schedule, while internally it strives for a daily release candidate (Daily RC) process.

Daily Release Candidate Practice:

    Every day, a branch is automatically cut from trunk
    and verified to meet "production‑grade" quality.
    If verification fails, code quality has regressed and action is required.
    This allows a faster release cadence without extra effort.

Working Principles

Google balances speed and quality through three core principles:

Drive Test Health

Avoid Performance Regressions

Ensure High‑Quality Releases

These principles form the foundation of all measurement indicators.

Test Health

Continuous Delivery 2.0 stresses that no single technique can guarantee software quality; a comprehensive automated test safety net is required for fast, high‑frequency delivery.

Google’s automated quality guardrail resembles a deployment pipeline, and the rise of micro‑services makes testing more challenging.

To promote test health, Google focuses on pre‑submit fast tests (often called "gate checks"), closed‑loop integration tests, and eliminating flaky tests.

Test cases should be "write once, run everywhere" so they can be used on developers’ machines, in release candidate validation, and in production.

Pre‑submit fast tests aim to prevent bugs rather than just find them.
To that end, Google builds tools and platforms that let developers easily check code quality before committing.
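Eliminating flaky tests starts with finding them. The following is a minimal sketch, not Google's actual tooling: it assumes test results are available as (test name, code version, passed) records and treats a test as flaky when it both passed and failed at the same code version. The function name `find_flaky_tests` and the record layout are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the sorted names of tests that look flaky.

    `runs` is an iterable of (test_name, code_version, passed) tuples.
    A test is flagged when it has both a pass and a failure at the same
    code version, because the code did not change between those runs.
    """
    outcomes = defaultdict(set)
    for test_name, code_version, passed in runs:
        outcomes[(test_name, code_version)].add(bool(passed))
    flaky = {name for (name, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Illustrative data only; the CL identifiers are made up.
runs = [
    ("test_login", "cl/1001", True),
    ("test_login", "cl/1001", False),    # same code, different outcome
    ("test_checkout", "cl/1001", True),
    ("test_checkout", "cl/1002", False), # different code, so not flagged
]
print(find_flaky_tests(runs))
```

A failure that appears only after the code changes, as with `test_checkout` here, is treated as a possible real regression rather than as flakiness.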

Avoid Performance Regressions

Performance is treated as a first‑class citizen at GCP.

Google maintains an internal performance benchmark library that product teams can integrate to obtain rich performance baselines, and performance is continuously monitored as a service.
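One way to picture a check against a performance baseline is sketched below. The article does not describe how Google's benchmark library compares results, so the median comparison, the 5% tolerance, and the name `is_regression` are all assumptions made for illustration.

```python
from statistics import median


def is_regression(baseline_ms, candidate_ms, tolerance=0.05):
    """Report whether the candidate build is slower than the baseline.

    Compares median latencies (in milliseconds) and flags a regression
    when the candidate exceeds the baseline by more than `tolerance`,
    expressed as a fraction (0.05 means 5%). Both inputs must be
    non-empty sequences of numbers.
    """
    base = median(baseline_ms)
    cand = median(candidate_ms)
    return cand > base * (1 + tolerance)


# Illustrative samples only.
baseline = [100, 102, 98]
print(is_regression(baseline, [101, 103, 99]))   # within tolerance
print(is_regression(baseline, [110, 112, 108]))  # clearly slower
```

The median is used here because it is less sensitive to a single slow sample than the mean; a production system would likely need more samples and a statistical test.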

Ensure High‑Quality Releases

Balancing speed and quality means releasing features quickly while minimizing user impact.

Key practices include:

Separate deployment from release (binary deployment vs. configuration rollout) so unfinished features can be deployed but stay hidden until toggled on.

Frequent, short‑interval releases to keep everyone aware of trunk quality standards and reduce pressure on developers.

Roll back first, then fix: if a problem appears, revert to the last good version. Because CLs are atomic, a problematic CL can easily be backed out of trunk.
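The separation of deployment from release can be illustrated with a feature flag. This is a simplified sketch under stated assumptions: the `FeatureFlags` class, the flag name `new_checkout`, and the in-memory storage are hypothetical, and a real system would load flag state from a configuration service that rolls out independently of the binary.

```python
class FeatureFlags:
    """In-memory stand-in for a configuration rollout system."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_enabled(self, name):
        return name in self._enabled

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)


def render_checkout(flags):
    """The new code path ships in the binary but stays hidden until toggled on."""
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


flags = FeatureFlags()
print(render_checkout(flags))   # deployed, not yet released

flags.enable("new_checkout")    # release: a configuration change only
print(render_checkout(flags))

flags.disable("new_checkout")   # turning it off needs no new binary
print(render_checkout(flags))
```

Because the release is only a configuration change, hiding a misbehaving feature does not require redeploying the binary.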

Project Health (pH)

Project Health (pH) is a set of metrics that monitor the health of software projects during the development phase before release.

It reflects Google’s development philosophy of sustainable, small‑batch, high‑quality, rapid releases, and aligns with the shared‑responsibility and collaborative principles described earlier.

Higher pH scores indicate better balance of speed and quality, fewer bugs, faster bug fixes, and more time for new feature development.

pH Metrics and Levels

The pH system tracks four main dimensions: developer productivity, release velocity, reliability, and quality.

Specific indicators include:

TAP test pass rate

TAP test flakiness

Overall test coverage

Incremental test coverage

Pre‑submit test coverage

Pre‑submit test skip rate

Pre‑submit test runtime

Release interval

Number of cherry‑picks on release branch

Granularity of each release (number of CLs)
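As an illustration of one of these indicators, incremental test coverage can be read as the share of changed lines that tests exercise. The article does not give Google's exact definition, so the sketch below assumes that reading; the function name and the (file, line) representation are hypothetical.

```python
def incremental_coverage(changed_lines, covered_lines):
    """Fraction of changed lines that are exercised by tests.

    Both arguments are iterables of (file_path, line_number) pairs.
    A change that touches no lines is reported as fully covered.
    """
    changed = set(changed_lines)
    if not changed:
        return 1.0
    return len(changed & set(covered_lines)) / len(changed)


# Illustrative data: a CL that touches four lines, three of them tested.
changed = [("a.py", 10), ("a.py", 11), ("b.py", 3), ("b.py", 4)]
covered = [("a.py", 1), ("a.py", 10), ("a.py", 11), ("b.py", 3)]
print(incremental_coverage(changed, covered))
```

Unlike overall coverage, this number depends only on the lines a CL touches, so it can be evaluated per change regardless of how well the rest of the codebase is tested.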

Project rating uses the "lowest score" approach rather than an average, because the lowest score highlights the weakest area that needs attention.
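The difference between the two aggregation choices is easy to see in a small sketch. The metric names and the 1-to-5 scale below are assumptions for illustration; the article does not specify the scoring scale.

```python
from statistics import mean


def project_rating(scores):
    """Overall rating under the "lowest score" approach."""
    return min(scores.values())


def weakest_metric(scores):
    """Name of the metric that determines the rating."""
    return min(scores, key=scores.get)


# Hypothetical per-metric levels for one project.
scores = {
    "test_pass_rate": 4,
    "test_flakiness": 2,
    "incremental_coverage": 5,
    "release_interval": 4,
}

print(project_rating(scores))   # rating set by the weakest area
print(weakest_metric(scores))   # where help is needed
print(mean(scores.values()))    # an average would hide the weak spot
```

Here the average is 3.75, which looks healthy, while the minimum of 2 points directly at test flakiness as the area to fix.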

TC vs. pH

Google previously ran the Test Certified (TC) program for eight years (2008‑2016) to assess testing health. It was replaced by Project Health because TC relied on static standards, required manual evaluation, and left many projects stalled at a mid level.

TC metrics were hard to measure and required costly manual evidence.

pH can be calculated automatically, updated in real time, and helps the Engineering Productivity team target assistance.

Test Certified (TC) was an internal Google certification from 2008 to 2016.
It involved more than 1,700 registered projects, of which more than 1,200 were awarded a level between 1 and 5, and 578 mentors.
TC used five levels to define test health and encouraged developers to treat testing as part of development, especially unit testing.

References:

GCP: Move Fast and Don’t Break Things, 2019

Discussion on Google’s Test Certified, 2016
