Project Health Metrics and Practices in Google’s SRE and Development Process
The article explains how Google measures and improves software quality before release by separating development and operations responsibilities, using monorepo and trunk‑based development, daily release candidates, automated testing, performance benchmarks, and a comprehensive Project Health (pH) metric system that balances speed, reliability, and quality.
Reading the SRE Workbook reinforces that SRE defines SLOs from the user's perspective, so its metrics apply to services that are already running and serving users without interruption.
The author asks how to measure code that has not yet been released, and which principles should guide that measurement, drawing on the 2020 book Software Engineering at Google.
Google emphasizes shared responsibility and collaboration, and divides the software lifecycle into two stages:
Development stage (code before release): primarily the responsibility of the Product Team.
Operations stage (code after release): primarily the responsibility of SRE.
Google’s codebase uses a monorepo and trunk‑based development, with around 60,000 daily changelists (CLs) submitted by engineers and automation tools.
Google Cloud Platform (GCP) follows a bi‑weekly external release cadence while maintaining a daily Release Candidate internally.
Daily Release Candidate practice:
Every day, a branch is cut automatically from trunk.
The branch is verified against "formal release quality".
If it falls short, code quality has degraded and action is needed.
This enables faster releases without extra effort.
The core work principles are:
Drive Test Health
Avoid Performance Regressions
Ensure High Quality Releases
Test health requires a comprehensive automated test safety net, as described in "Continuous Delivery 2.0"; pre‑submit tests prevent bugs rather than just detect them.
Pre‑submit test purpose:
Prevent bugs, not just find them.
Build tools and platforms so developers can easily check code quality before committing.
Google maintains a performance benchmark library that integrates with services to provide continuous performance monitoring.
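As a rough illustration of what a performance regression check could look like, the sketch below compares current benchmark samples against a stored baseline. It is not Google's benchmark library: the function name `find_regressions`, the benchmark names, the sample values, and the 10% tolerance are all assumptions made up for this example.

```python
from statistics import median


def find_regressions(baseline, current, tolerance=0.10):
    """Return benchmarks whose median timing grew by more than `tolerance`.

    `baseline` and `current` map a benchmark name to a list of timings
    (milliseconds here, but any consistent unit works).
    """
    regressions = {}
    for name, base_samples in baseline.items():
        if name not in current:
            continue  # benchmark was not run this time; nothing to compare
        base = median(base_samples)
        now = median(current[name])
        change = (now - base) / base  # relative slowdown, e.g. 0.2 == 20% slower
        if change > tolerance:
            regressions[name] = round(change, 3)
    return regressions


# Hypothetical data: one benchmark is stable, the other is 20% slower.
baseline = {"parse_request": [10.0, 10.2, 9.9], "render_page": [50.0, 51.0, 49.5]}
current = {"parse_request": [10.1, 10.3, 10.0], "render_page": [60.0, 61.0, 59.0]}

print(find_regressions(baseline, current))  # {'render_page': 0.2}
```

Using the median rather than the mean keeps a single noisy sample from triggering a false alarm. A real system would likely need more samples and a statistical test.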
Ensuring high‑quality releases involves separating deployment from release, using short, frequent releases, and adopting a "rollback then fix" strategy that leverages atomic CLs for easy backout.
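One common way to separate deployment from release is a feature flag: the new code ships dark and is switched on later. The summary does not say which mechanism Google uses, so the sketch below is an assumption, and `FeatureFlags`, `checkout_total`, and the flag name are hypothetical.

```python
class FeatureFlags:
    """In-memory flag store (hypothetical); a real system would load flags
    from a configuration service so they can change without a redeploy."""

    def __init__(self):
        self._enabled = set()

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def is_enabled(self, name):
        return name in self._enabled


def checkout_total(prices, flags):
    """Both code paths are deployed; the flag decides which one is released."""
    subtotal = sum(prices)
    if flags.is_enabled("new_discount_engine"):
        return round(subtotal * 0.9, 2)  # new behavior: 10% discount
    return subtotal  # existing behavior


flags = FeatureFlags()
print(checkout_total([20.0, 30.0], flags))  # deployed but not released: 50.0

flags.enable("new_discount_engine")
print(checkout_total([20.0, 30.0], flags))  # released: 45.0

flags.disable("new_discount_engine")
print(checkout_total([20.0, 30.0], flags))  # turned off first, fixed later: 50.0
```

Turning the flag off plays the same role as backing out an atomic CL: service is restored first, and the fix follows without time pressure.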
Project Health (pH) is a metric set tracking four dimensions—developer productivity, release velocity, reliability, and quality—through indicators such as TAP test pass rate, test stability, coverage metrics, pre‑submit test coverage, skip rate, runtime, release interval, cherry‑pick count, and CL granularity.
The pH score uses the lowest sub‑score to highlight the weakest area, avoiding the misleading nature of average scores.
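A small sketch shows why the lowest sub-score is more informative than the average. The 0 to 100 scale, the dimension keys, and the sample values are assumptions for illustration; the summary does not state how Google scales or names its sub-scores.

```python
from statistics import mean


def ph_score(sub_scores):
    """Return (overall, weakest_dimension), where overall is the lowest sub-score."""
    weakest = min(sub_scores, key=sub_scores.get)
    return sub_scores[weakest], weakest


# Hypothetical project: strong everywhere except reliability.
scores = {
    "developer_productivity": 90,
    "release_velocity": 85,
    "reliability": 40,
    "quality": 95,
}

overall, weakest = ph_score(scores)
average = mean(scores.values())

print(overall, weakest)  # 40 reliability
print(average)           # 77.5
```

The average of 77.5 suggests a reasonably healthy project, while the minimum of 40 points directly at the dimension that needs attention.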
Before pH, Google used Test Certified (TC), an internal certification that ran from 2008 to 2016. More than 1,700 projects registered and over 1,200 achieved one of its levels 1 to 5. TC promoted testing as part of development, especially unit testing, using a five-level standard. Because it relied on manual evaluation and static standards, it was replaced by the automated pH system, which provides continuous, objective assessment.
References include GCP's "Move Fast and Don't Break Things" (2019) and a 2016 discussion on Google's Test Certified.
Continuous Delivery 2.0
Tech and case studies on organizational management, team management, and engineering efficiency