Applying the VALET Pattern Language for SRE Transformation at Home Depot (THD)
The article explains how Home Depot (THD) adopted the VALET pattern language—Volume, Availability, Latency, Error, and Ticket—to unify service‑level objectives, automate data collection, build dashboards, and improve SRE practices across its massive retail and e‑commerce infrastructure.
This article, adapted from Chapter 3 of the English edition of The Site Reliability Workbook, describes how Home Depot (THD), the world’s largest home‑improvement retailer, used the VALET pattern language to drive its SRE transformation.
VALET stands for Volume, Availability, Latency, Error, and Ticket. Each is an SLO dimension, phrased as a question that the teams depending on a service need answered:
Volume – how much traffic the service can handle
Availability – is the service up when it is needed
Latency – does the service respond quickly enough
Error – does the service raise errors
Ticket – does the service require manual intervention
Initially, THD’s monitoring tools and dashboards were fragmented, making incident diagnosis time‑consuming and causing miscommunication between development and operations teams.
To create a common language, THD introduced a unified SLO framework based on VALET, incorporated it into developers’ OKRs, and built an automated data‑collection pipeline called the “TPS report.” This pipeline captures VALET metrics from logs stored in BigQuery, integrates data from other monitoring systems (e.g., Stackdriver), and stores the results in a Cloud SQL database.
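As a rough illustration of what such a pipeline computes, the sketch below reduces a batch of request log records to one VALET summary. It is not THD's implementation: the record fields, the function names, and the choice to treat a 5xx response as both an error and lost availability are all assumptions made for the example, and the BigQuery and Cloud SQL steps are left out.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class RequestLog:
    """One request record; the field names are hypothetical."""
    service: str
    status: int          # HTTP status code
    latency_ms: float    # time taken to serve the request


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def valet_summary(logs, tickets=0):
    """Summarize one service's request logs for a reporting period.

    Assumption: availability is approximated as the share of requests
    that did not fail with a 5xx status. Ticket counts come from outside
    the logs, so they are passed in.
    """
    if not logs:
        raise ValueError("no log records for this period")
    volume = len(logs)
    errors = sum(1 for r in logs if r.status >= 500)
    return {
        "volume": volume,
        "availability_pct": 100.0 * (volume - errors) / volume,
        "latency_p99_ms": percentile([r.latency_ms for r in logs], 99),
        "errors": errors,
        "tickets": tickets,
    }
```

In a real pipeline the aggregation would more likely run as a scheduled query over the log tables, with only the summary rows written to the results database.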
The VALET dashboard visualizes these metrics, allowing users to register new services, set SLO targets for any VALET category, and add custom metric types (e.g., P99 latency, daily transaction volume). The dashboard also supports slicing and dicing data across services, generating weekly or monthly SLO reports, and feeding alerts to chat bots.
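One way to picture the target-setting side of such a dashboard is a small check that compares measured values against per-metric targets. The sketch below is an assumption-laden example, not the dashboard's actual logic: the `SloTarget` type, the metric names, and the `higher_is_better` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloTarget:
    """A target for one VALET metric; all names here are hypothetical."""
    metric: str
    threshold: float
    higher_is_better: bool   # True for availability, False for latency


def evaluate(targets, measured):
    """Return {metric: True, False, or None} for each target.

    True means the target was met, False means it was missed, and None
    means no measurement was available for that metric.
    """
    results = {}
    for target in targets:
        value = measured.get(target.metric)
        if value is None:
            results[target.metric] = None
        elif target.higher_is_better:
            results[target.metric] = value >= target.threshold
        else:
            results[target.metric] = value <= target.threshold
    return results


def missed(results):
    """List the metrics whose targets were missed, e.g. to feed an alert."""
    return [metric for metric, ok in results.items() if ok is False]
```

Keeping "no data" distinct from "missed" matters for reports: a service that has stopped emitting metrics should not silently count as meeting its SLO.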
THD extended VALET to batch workloads by redefining the five categories: Volume as the number of records processed, Availability as the percentage of successful runs, Latency as job runtime, Error as the number of failed records, and Ticket as the number of manual fixes. This adaptation enables SLO‑driven reliability for both real‑time services and batch jobs.
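The batch reinterpretation can be sketched the same way, with job runs in place of requests. The `JobRun` fields below are hypothetical, and reporting latency as the longest runtime in the period is one possible choice rather than something the source specifies.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One execution of a batch job; the field names are hypothetical."""
    records_processed: int
    records_failed: int
    runtime_s: float
    succeeded: bool
    manual_fix: bool     # an operator had to step in for this run


def batch_valet(runs):
    """Summarize a period of batch runs in the five VALET categories."""
    if not runs:
        raise ValueError("no job runs for this period")
    return {
        # Volume: total records handled across all runs
        "volume": sum(r.records_processed for r in runs),
        # Availability: percentage of runs that succeeded
        "availability_pct": 100.0 * sum(r.succeeded for r in runs) / len(runs),
        # Latency: longest job runtime (an assumed choice of statistic)
        "latency_max_s": max(r.runtime_s for r in runs),
        # Error: records that failed to process
        "errors": sum(r.records_failed for r in runs),
        # Ticket: runs that needed a manual fix
        "tickets": sum(r.manual_fix for r in runs),
    }
```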
The article concludes with an open organizational challenge: translating VALET metrics into business terms that product managers can readily understand, thereby aligning product and engineering goals around shared SLOs.