Applying the VALET Model for SRE Transformation at Home Depot (THD)

The article explains how Home Depot (THD) adopted the VALET model—a five‑dimensional SLO language covering Volume, Availability, Latency, Error, and Ticket—to unify communication, automate data collection, and improve reliability across its massive retail and e‑commerce infrastructure.

Continuous Delivery 2.0

The source material, originally from Chapter 3 of The Site Reliability Workbook, describes how Home Depot (THD), the world's largest home-improvement retailer, used the VALET model during its Site Reliability Engineering (SRE) transformation.

VALET Definition

Volume – how much traffic can the service handle?

Availability – is the service up when it is needed?

Latency – does the service respond quickly when used?

Error – does the service return errors during use?

Ticket – does a request need manual intervention to complete?

By answering these five questions for each dependent service, teams gain a transparent, consistent view of reliability expectations.

1. Original State

Home Depot's monitoring tools and dashboards were fragmented, making root-cause analysis time-consuming. Planned outages were often unknown to dependent services. Teams also set SLOs without visibility into what the services they depended on could deliver, so a service might aim for four nines (99.99%) while relying on a dependency that only targeted 99.9%.
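The mismatch between a service's SLO and its dependencies' SLOs can be checked with simple arithmetic. The sketch below is illustrative only and not taken from the source: it assumes a service needs every dependency to be up and that dependencies fail independently, in which case the achievable availability is bounded by the product of the dependency availabilities. The function name and the sample figures are hypothetical.

```python
from math import prod


def serial_availability(dependency_availabilities):
    """Upper bound on availability when every dependency must be up.

    Assumes independent failures, which is a simplification.
    """
    return prod(dependency_availabilities)


# Hypothetical dependency SLOs, expressed as fractions.
deps = [0.999, 0.999, 0.9995]
ceiling = serial_availability(deps)

# A four-nines target is out of reach with these dependencies.
target = 0.9999
achievable = ceiling >= target
print(f"ceiling={ceiling:.5f}, target achievable: {achievable}")
```

Under these assumptions the ceiling is roughly 99.75%, well below a four-nines target, which is the kind of gap a shared view of dependency SLOs makes visible.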

2. Establishing a Common Language

The company introduced a unified language, VALET, to standardize how teams describe service metrics (traffic, latency, errors, utilization) and to use support-ticket volume as a customer-facing reliability indicator.

3. Automatic VALET Data Collection

Home Depot built a framework called the “TPS Report” that automatically captures VALET data from services deployed in the cloud. Logs are streamed to BigQuery, where they are combined with other monitoring sources (e.g., Stackdriver probes) and transformed into hourly VALET metrics. The data are stored in a Cloud SQL database and can be queried, visualized, or accessed via a chat‑bot.
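The hourly roll-up step can be sketched in a few lines. This is a minimal illustration, not the TPS Report implementation: the `RequestLog` record, the field names, and the choice of a nearest-rank P95 are all assumptions made for the example. It covers only the dimensions that can be derived from request logs (Volume, Latency, Error); Availability and Ticket would come from other sources such as probes and a ticketing system.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestLog:
    # Hypothetical log record; real log schemas will differ.
    timestamp: datetime
    latency_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hourly_valet(logs):
    """Group request logs by hour and derive per-hour metrics."""
    buckets = defaultdict(list)
    for log in logs:
        hour = log.timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(log)

    report = {}
    for hour, entries in sorted(buckets.items()):
        errors = sum(1 for e in entries if e.status >= 500)
        report[hour] = {
            "volume": len(entries),
            "latency_p95_ms": percentile([e.latency_ms for e in entries], 95),
            "error_rate": errors / len(entries),
        }
    return report


logs = [
    RequestLog(datetime(2024, 1, 1, 9, 5), 120.0, 200),
    RequestLog(datetime(2024, 1, 1, 9, 40), 300.0, 500),
    RequestLog(datetime(2024, 1, 1, 10, 15), 80.0, 200),
]
report = hourly_valet(logs)
for hour, metrics in report.items():
    print(hour.isoformat(), metrics)
```

In a pipeline like the one described, the same grouping would more likely be expressed as a scheduled query over the log tables, with the result written to the reporting database.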

4. VALET Service

A dedicated VALET application stores and reports SLO data, aggregating alerts from various monitoring platforms for trend analysis. Although alert thresholds are not directly tied to SLOs, the service allows flexible adjustments.

VALET Dashboard

The dashboard visualizes VALET metrics and lets users register new services, set SLO targets for any of the five VALET categories, and add custom metric types (e.g., P99 latency, daily transaction volume). It supports slicing and dicing data, weekly and monthly SLO reviews, and generating operational action items.
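An SLO review of this kind boils down to comparing observed values against registered targets. The sketch below is one hypothetical way to model that, not the dashboard's actual data model: `SloTarget`, `review`, the metric names, and the sample numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTarget:
    category: str           # one of the five VALET categories
    metric: str             # hypothetical metric name
    objective: float
    higher_is_better: bool  # True for availability, False for latency or tickets

    def is_met(self, observed):
        if self.higher_is_better:
            return observed >= self.objective
        return observed <= self.objective


def review(targets, observed):
    """Return {metric: met?} for every target that has an observation."""
    return {
        t.metric: t.is_met(observed[t.metric])
        for t in targets
        if t.metric in observed
    }


targets = [
    SloTarget("Availability", "availability_pct", 99.9, True),
    SloTarget("Latency", "latency_p99_ms", 800.0, False),
    SloTarget("Ticket", "tickets_per_week", 5, False),
]
observed = {"availability_pct": 99.95, "latency_p99_ms": 910.0, "tickets_per_week": 2}

results = review(targets, observed)
# Missed targets become candidate action items for the review.
action_items = [metric for metric, met in results.items() if not met]
print(action_items)
```

Keeping the direction of comparison on the target itself lets custom metric types be added without changing the review logic.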

5. Applying VALET to Batch Processing

The same VALET dimensions are adapted for batch jobs: Volume (records processed), Availability (percentage of jobs completed on time), Latency (job runtime), Error (records that failed), and Ticket (manual interventions required).
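The batch mapping above can be made concrete with a small calculation over job runs. This is a sketch under stated assumptions: the `BatchRun` fields, the use of a per-run deadline to define "on time", and reporting the worst-case runtime for Latency are illustrative choices, not details from the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BatchRun:
    # Hypothetical record of one batch job execution.
    started: datetime
    finished: datetime
    deadline: datetime
    records_processed: int
    records_failed: int
    manual_fixes: int


def batch_valet(runs):
    """Summarize a non-empty list of batch runs in VALET terms."""
    on_time = sum(1 for r in runs if r.finished <= r.deadline)
    return {
        "volume": sum(r.records_processed for r in runs),
        "availability_pct": 100.0 * on_time / len(runs),
        "latency_max_minutes": max(
            (r.finished - r.started).total_seconds() / 60 for r in runs
        ),
        "errors": sum(r.records_failed for r in runs),
        "tickets": sum(r.manual_fixes for r in runs),
    }


runs = [
    BatchRun(datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 1, 45),
             datetime(2024, 1, 1, 2, 0), 1000, 3, 0),
    BatchRun(datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 2, 30),
             datetime(2024, 1, 2, 2, 0), 900, 10, 1),
]
summary = batch_valet(runs)
print(summary)
```

Here one of the two runs missed its deadline, so Availability comes out at 50% even though both jobs eventually completed.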

6. Communicating SLOs to Product Managers

While engineers find VALET intuitive, translating its metrics into business terms for product managers remains a challenge. Bridging this gap is essential to align expectations and reduce reliability mismatches in large organizations.

7. Next Organizational Challenges

The article concludes by highlighting the need to further integrate VALET concepts across product and engineering teams, ensuring shared visibility of SLOs and fostering a culture of reliability.

Written by

Continuous Delivery 2.0
