What Starbucks Can Teach Us About Asynchronous Messaging and Two‑Phase Commit
The article uses Starbucks' coffee‑ordering process as a concrete analogy to explain asynchronous messaging, correlation, error‑handling strategies such as write‑off, retry and compensation, and why two‑phase commit is unsuitable for high‑throughput services.
Translator's Note
This article is a translation of the 2004 piece "Starbucks Does Not Use Two‑Phase Commit".
1. Give Me a Hot Chocolate
During a two‑week trip to Japan the author observed the massive number of Starbucks stores in Shinjuku and Roppongi and wondered how Starbucks handles orders.
Like most businesses, Starbucks wants to maximize order throughput, since more orders mean more revenue, so it processes orders asynchronously.
The cashier marks the cup with the order and places the cup in a queue (the row of cups on the coffee machine).
The queue decouples the cashier from the barista, allowing continuous order taking even when baristas are busy.
This is an example of the Competing Consumers pattern: when the shop gets busy, more baristas can be added to work off the same queue.
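The cashier, the row of cups and the baristas can be sketched as a producer, a queue and several competing consumers. This is a minimal illustration, not a description of any real system: the function name run_coffee_shop, the barista names and the None "closing time" sentinel are all assumptions made for the example.

```python
import queue
import threading


def run_coffee_shop(orders, num_baristas):
    """Cashier enqueues orders; baristas compete to take them from one queue."""
    cup_queue = queue.Queue()      # the row of cups on the coffee machine
    finished = []                  # (barista, drink) pairs in completion order
    finished_lock = threading.Lock()

    def barista(name):
        while True:
            order = cup_queue.get()
            if order is None:      # hypothetical sentinel: the shop is closing
                cup_queue.task_done()
                return
            with finished_lock:
                finished.append((name, order))
            cup_queue.task_done()

    workers = [
        threading.Thread(target=barista, args=(f"barista-{i}",))
        for i in range(num_baristas)
    ]
    for worker in workers:
        worker.start()

    # The cashier only enqueues cups and never waits for a drink to be made.
    for order in orders:
        cup_queue.put(order)
    for _ in workers:
        cup_queue.put(None)        # one sentinel per barista

    cup_queue.join()
    for worker in workers:
        worker.join()
    return finished
```

Each cup is taken by exactly one barista, and raising num_baristas adds capacity without changing the cashier's code. Which barista makes which drink is not deterministic.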
2. Correlation
Asynchronous processing introduces a correlation problem: the order in which coffees are completed may differ from the order they were placed.
Reasons include multiple baristas using different machines and varying preparation times, or batching similar drinks to improve efficiency.
Starbucks solves this by using a correlation ID, typically the customer's name written on the cup, or in some countries the drink type.
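A correlation ID can be sketched as a key that travels with the cup and is looked up again at hand-over. The class PickupCounter and its methods are hypothetical names for this example, and a running number stands in for the name written on the cup.

```python
import itertools


class PickupCounter:
    """Matches finished drinks to customers by a correlation ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._waiting = {}  # correlation ID -> customer

    def place_order(self, customer, drink):
        correlation_id = next(self._ids)
        self._waiting[correlation_id] = customer
        # The ID travels with the cup through the queue.
        return {"correlation_id": correlation_id, "drink": drink}

    def deliver(self, finished_cup):
        # Completion order does not matter; the ID identifies the customer.
        customer = self._waiting.pop(finished_cup["correlation_id"])
        return customer, finished_cup["drink"]


counter = PickupCounter()
cup_a = counter.place_order("Alice", "latte")
cup_b = counter.place_order("Bob", "espresso")

# Bob's espresso is finished first, yet each drink reaches the right person.
handed_out = [counter.deliver(cup_b), counter.deliver(cup_a)]
```

Correlating by drink type alone, as the text notes for some countries, works only as long as two waiting customers have not ordered the same drink.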
3. Exception Handling
Handling failures in an asynchronous system is difficult, and Starbucks provides real‑world examples.
If payment fails and the coffee is already made, it is discarded; if it is not yet made, the cup is removed from the queue (a write‑off).
If the coffee is wrong or unsatisfactory, it is remade (a retry).
If the coffee machine breaks and the drink cannot be made, the customer is refunded (a compensating action).
3.1 Write‑off
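The simplest strategy is to do nothing, or to discard the work already done. This is acceptable when the financial loss is negligible and a proper correction mechanism would cost more than it saves, as illustrated by ISP billing errors that are tolerated at first and reconciled later.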
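A write‑off does not have to mean the error is never noticed. As a minimal sketch of a later reconciliation, assuming two hypothetical lists of account IDs (accounts with active service and accounts that were billed), the check can be a simple set difference:

```python
def find_unbilled_accounts(active_accounts, billed_accounts):
    """Return accounts that receive service but were never billed."""
    return sorted(set(active_accounts) - set(billed_accounts))


# Hypothetical data for illustration only.
unbilled = find_unbilled_accounts(["acct-1", "acct-2", "acct-3"], ["acct-1", "acct-3"])
```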
3.2 Retry
When some operations in a transaction fail, you can either roll back completed work or retry the failed operations. Retry is viable when the failure is transient (e.g., an external system outage) but not when it violates business rules.
If the receiver is idempotent (the Idempotent Receiver pattern), every operation can be retried safely, because the receiver ignores duplicates of messages it has already processed successfully.
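The interplay of retry and an idempotent receiver can be sketched as follows. All names here (IdempotentReceiver, TransientError, make_flaky_channel, send_with_retry) are hypothetical, and the "flaky channel" simulates a message that arrives while its acknowledgement is lost, which is the case that makes duplicates unavoidable. Backoff between attempts is omitted for brevity.

```python
class TransientError(Exception):
    """A failure that may succeed if the operation is tried again."""


class IdempotentReceiver:
    def __init__(self):
        self._processed = set()  # IDs of messages already handled
        self.total = 0

    def receive(self, message_id, amount):
        if message_id in self._processed:
            return False         # duplicate: ignore it
        self.total += amount
        self._processed.add(message_id)
        return True


def make_flaky_channel(receiver, lost_acks):
    """Deliver every message, but lose the first `lost_acks` acknowledgements."""
    remaining = {"n": lost_acks}

    def send(message_id, amount):
        receiver.receive(message_id, amount)  # the message itself arrives
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientError("acknowledgement lost")

    return send


def send_with_retry(send, message_id, amount, max_attempts=3):
    """Retry on transient failures; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(message_id, amount)
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise


receiver = IdempotentReceiver()
channel = make_flaky_channel(receiver, lost_acks=2)
attempts = send_with_retry(channel, "order-1", 5)
```

The sender needs three attempts, but the receiver applies the message only once. Without the duplicate check, the amount would have been counted three times.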
3.3 Compensation
The last option is to undo the work that has already completed, bringing the system back to a consistent state. This is not a database rollback but a new, compensating action, such as issuing a refund in a financial transaction.
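One way to sketch compensation is to pair every step with an action that undoes it and, on failure, to run the undo actions of the completed steps in reverse order. The function run_with_compensation and the step names below are assumptions for illustration, not part of the original article.

```python
def run_with_compensation(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps.

    Returns True if every step succeeded, False if compensation ran.
    """
    completed = []  # undo actions for steps that finished successfully
    for do, undo in steps:
        try:
            do()
        except Exception:
            for compensate in reversed(completed):
                compensate()
            return False
        completed.append(undo)
    return True


log = []


def charge_customer():
    log.append("charge")


def refund_customer():
    log.append("refund")


def make_coffee():
    raise RuntimeError("coffee machine is broken")


def discard_coffee():
    log.append("discard")


succeeded = run_with_compensation([
    (charge_customer, refund_customer),
    (make_coffee, discard_coffee),
])
```

Here the payment went through before the machine failed, so the only compensation that runs is the refund. The failed step itself is not compensated because it never completed.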
4. Two‑Phase Commit
All of the strategies above differ from two‑phase commit, which splits a transaction into a prepare phase and a commit (execute) phase.
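If applied to Starbucks, the customer would have to wait at the register with cash and receipt on the counter until the coffee is ready, and only then would money, receipt and drink change hands together. Neither the cashier nor the customer could move on in the meantime, drastically reducing throughput.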
Thus, two‑phase commit can simplify error handling, but it hampers the free flow of messages, and therefore scalability, because transactional resources must be held in a stateful way across multiple asynchronous steps.
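The two phases can be sketched with a toy coordinator. This is a simplified illustration under stated assumptions: the Participant class and two_phase_commit function are hypothetical, everything runs in one process, and the durable logging, timeouts and coordinator-failure handling that a real protocol needs are left out.

```python
class Participant:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote. A 'yes' vote means resources stay held until phase 2."""
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise roll back."""
    prepared = []
    for participant in participants:          # phase 1: prepare
        if not participant.prepare():
            for earlier in prepared:
                earlier.rollback()
            return False
        prepared.append(participant)
    for participant in participants:          # phase 2: commit
        participant.commit()
    return True


customer = Participant("customer pays")
barista = Participant("barista makes drink", can_prepare=False)
outcome = two_phase_commit([customer, barista])
```

Between prepare and commit, every participant that voted yes sits in the "prepared" state and cannot do anything else. That held state is the customer standing at the register with money on the counter.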
5. Conversation Pattern
The coffee‑shop interaction exemplifies a simple Conversation pattern: a short synchronous exchange (order and payment) followed by a longer asynchronous exchange (preparation and delivery).
Similar patterns appear in e‑commerce, where an order number is assigned synchronously and subsequent steps (payment, packaging, shipping) happen asynchronously, with compensation (refund) or retry (re‑ship) as needed.
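The split between a short synchronous exchange and a longer asynchronous one can be sketched with asyncio. The names place_order and fulfil, the starting order number and the three step names are assumptions for this example, and asyncio.sleep(0) stands in for slow work.

```python
import asyncio
import itertools

_order_numbers = itertools.count(1001)  # hypothetical starting number


async def fulfil(order_number, events):
    """The long asynchronous part: runs after the customer has an order number."""
    for step in ("payment", "packaging", "shipping"):
        await asyncio.sleep(0)           # stand-in for slow work
        events.append((order_number, step))


def place_order(events, background):
    """The short synchronous part: assign an order number and return at once."""
    order_number = next(_order_numbers)
    background.append(asyncio.create_task(fulfil(order_number, events)))
    return order_number


async def demo():
    events, background = [], []
    first = place_order(events, background)
    second = place_order(events, background)
    # Both customers hold an order number before any fulfilment step has run.
    acknowledged_before_work = events == []
    await asyncio.gather(*background)
    return first, second, acknowledged_before_work, events
```

Calling asyncio.run(demo()) shows that both orders are acknowledged before any step completes. The order number then serves as the correlation ID for every later step, including a refund or a re-shipment.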
Observing everyday asynchronous processes helps design robust messaging systems.
Architecture Digest