How Uber Scaled Its Payment Platform with a Job‑Based Double‑Entry Ledger
Uber engineers rebuilt their payment system as Gulfstream, a SOX‑compliant, job‑based double‑entry ledger, and migrated billions of transactions across asynchronous services while maintaining zero downtime, high availability, idempotency, and strong data consistency.
Motivation and Introduction
Uber Money needed a robust, highly available payment engine that could handle over 18 million daily requests with zero tolerance for downtime, ensuring timely, accurate, and compliant fund transfers.
New System and Architectural Advantages
The fifth‑generation platform, Gulfstream, is a single, integrated, SOX‑compliant system built on double‑entry bookkeeping principles and self‑regulating mechanisms. It replaces legacy systems that lacked a holistic view of end‑to‑end cash flow and slowed feature development.
Job/Order‑Based System
Instead of a transaction‑centric model, Gulfstream uses a job/order approach: each job (e.g., a ride or a food delivery) can generate multiple orders (e.g., adjustments, tips). Each order contains entries representing money moving between user accounts, and the entries of every order sum to zero, mirroring real‑world double‑entry accounting.
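A minimal sketch of the job/order/entry model, assuming amounts in integer minor units (cents) and a sign convention where negative means money leaves an account. The names Entry, Order, Job, and netFor are hypothetical illustrations, not Gulfstream's actual schema. The order constructor rejects any order whose entries do not sum to zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry. Assumed sign convention: negative = money leaves the account, positive = money arrives.
record Entry(String accountId, long amountMinorUnits) {}

record Order(String orderId, String type, List<Entry> entries) {
    Order {
        entries = List.copyOf(entries); // defensive copy: the order cannot change after creation
        long sum = 0;
        for (Entry e : entries) {
            sum = Math.addExact(sum, e.amountMinorUnits()); // fail loudly on overflow
        }
        // Double-entry invariant: the entries of every order sum to zero.
        if (sum != 0) {
            throw new IllegalArgumentException(
                "entries of order " + orderId + " sum to " + sum + ", expected 0");
        }
    }
}

// A job (e.g. one ride) accumulates orders over time: fare, tip, adjustments.
final class Job {
    private final String jobId;
    private final List<Order> orders = new ArrayList<>();

    Job(String jobId) { this.jobId = jobId; }

    String jobId() { return jobId; }

    void addOrder(Order order) { orders.add(order); }

    List<Order> orders() { return List.copyOf(orders); }

    // Net effect of all of this job's orders on a single account.
    long netFor(String accountId) {
        long net = 0;
        for (Order o : orders) {
            for (Entry e : o.entries()) {
                if (e.accountId().equals(accountId)) {
                    net += e.amountMinorUnits();
                }
            }
        }
        return net;
    }
}
```

Because each order balances on its own, a tip or adjustment can be appended to a job later without touching earlier orders, and the job's totals stay balanced.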
Order creation and processing are decoupled via a message‑queue system.
Order insertion service creates a payment order, publishes it to a message topic, and persists it in OrderStore.
Order processing service consumes payment orders, may generate subsequent orders, routes requests to payment providers, and produces a result order indicating success or failure.
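The decoupling between the two services can be illustrated with a minimal in‑memory sketch. Here a ConcurrentHashMap stands in for OrderStore, LinkedBlockingQueue instances stand in for the message topics, and a Predicate stands in for the payment‑provider call; all class names are hypothetical. A real deployment would use a durable, multi‑region queue and would also emit follow‑up orders, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Hypothetical message types.
record PaymentOrder(String orderId, String payer, String payee, long amount) {}
record ResultOrder(String orderId, boolean success) {}

final class OrderInsertionService {
    private final Map<String, PaymentOrder> orderStore;   // stand-in for OrderStore
    private final BlockingQueue<PaymentOrder> orderTopic; // stand-in for the order topic

    OrderInsertionService(Map<String, PaymentOrder> orderStore, BlockingQueue<PaymentOrder> orderTopic) {
        this.orderStore = orderStore;
        this.orderTopic = orderTopic;
    }

    void create(PaymentOrder order) {
        // This sketch persists before publishing so a consumer never sees an order missing from the store.
        orderStore.put(order.orderId(), order);
        orderTopic.add(order);
    }
}

final class OrderProcessingService {
    private final BlockingQueue<PaymentOrder> orderTopic;
    private final BlockingQueue<ResultOrder> resultTopic;
    private final Predicate<PaymentOrder> paymentProvider; // stand-in for an external provider call

    OrderProcessingService(BlockingQueue<PaymentOrder> orderTopic,
                           BlockingQueue<ResultOrder> resultTopic,
                           Predicate<PaymentOrder> paymentProvider) {
        this.orderTopic = orderTopic;
        this.resultTopic = resultTopic;
        this.paymentProvider = paymentProvider;
    }

    // Drains whatever is currently queued and returns how many orders were processed;
    // a real consumer would keep polling indefinitely.
    int drain() {
        int processed = 0;
        PaymentOrder order;
        while ((order = orderTopic.poll()) != null) {
            boolean ok = paymentProvider.test(order);
            resultTopic.add(new ResultOrder(order.orderId(), ok)); // result order: success or failure
            processed++;
        }
        return processed;
    }
}
```

The insertion side never waits on a payment provider; it only needs the store and the topic to accept the order, which is what lets the two services scale and fail independently.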
Figure: Order processing pipeline.
High Availability Across Regions
Services exchange order messages via a lossless, multi‑region message‑queue cluster; standby instances in other regions take over if one region fails.
Payment accounts and balances are stored in a multi‑region arbitrated storage system.
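The standby takeover can be pictured as an active/standby selector. This is a hedged sketch, assuming each region's consumers report heartbeats and that regions are ranked by preference; RegionFailover and its methods are hypothetical, and the source does not describe Uber's actual failover mechanism. The lossless queue is what ensures no order is dropped when the active region changes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical active/standby selector: the first region in preference order
// with a fresh heartbeat is the one whose consumers process orders.
final class RegionFailover {
    private final List<String> preference; // primary first, then standbys
    private final Duration timeout;        // heartbeat age after which a region counts as down
    private final Map<String, Instant> lastHeartbeat = new LinkedHashMap<>();

    RegionFailover(List<String> preference, Duration timeout) {
        this.preference = List.copyOf(preference);
        this.timeout = timeout;
    }

    void heartbeat(String region, Instant at) {
        lastHeartbeat.put(region, at);
    }

    // Region that should be active at time `now`, or empty if no region is healthy.
    Optional<String> activeRegion(Instant now) {
        for (String region : preference) {
            Instant seen = lastHeartbeat.get(region);
            if (seen != null && Duration.between(seen, now).compareTo(timeout) <= 0) {
                return Optional.of(region);
            }
        }
        return Optional.empty();
    }
}
```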
Ensuring Idempotency
Unique identifiers are deterministically generated for users, jobs, and orders.
Tracking processed order IDs ensures each order is handled exactly once.
Funds move only through order processing, which updates payment accounts atomically.
Orders are immutable after persistence.
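A minimal sketch of how deterministic IDs and a processed‑ID set combine into idempotent processing. It assumes an order's identity can be derived from a hypothetical (jobId, orderKind, sequence) tuple and uses the JDK's name‑based UUID.nameUUIDFromBytes; the ID scheme and the IdempotentLedger class are illustrations, not Uber's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class IdempotentLedger {
    private final Set<UUID> processedOrderIds = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    // Deterministic: the same inputs always yield the same UUID, so a retried
    // request maps to the same order instead of creating a new one.
    static UUID orderId(String jobId, String orderKind, int sequence) {
        String name = jobId + "|" + orderKind + "|" + sequence;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    // Applies an order's entries at most once; returns false for a duplicate delivery.
    synchronized boolean apply(UUID orderId, Map<String, Long> entries) {
        if (processedOrderIds.contains(orderId)) {
            return false;
        }
        long sum = entries.values().stream().mapToLong(Long::longValue).sum();
        if (sum != 0) {
            throw new IllegalArgumentException("unbalanced order " + orderId);
        }
        // Balances and the processed-ID set change under the same lock, so they never diverge.
        entries.forEach((account, amount) -> balances.merge(account, amount, Long::sum));
        processedOrderIds.add(orderId);
        return true;
    }

    synchronized long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```

Redelivering the same order (for example after a consumer crash) is a harmless no‑op, which is what makes aggressive retries safe.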
Data Consistency Between Asynchronous Platforms
During migration, each transaction change is recorded in an EntityChangeLog with version numbers, enabling serialized write‑backs and preventing conflicts even with concurrent adjustments.
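A minimal sketch of a versioned change log, assuming optimistic concurrency: a writer states the version it last read, and a stale writer is rejected instead of silently overwriting a concurrent adjustment. ChangeLogEntry and the append signature are hypothetical; the source only states that entries carry version numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change-log row: one change to one entity (e.g. a job), stamped with a version.
record ChangeLogEntry(String entityId, long version, String change) {}

final class EntityChangeLog {
    private final Map<String, List<ChangeLogEntry>> byEntity = new HashMap<>();

    synchronized long latestVersion(String entityId) {
        List<ChangeLogEntry> log = byEntity.get(entityId);
        return (log == null || log.isEmpty()) ? 0 : log.get(log.size() - 1).version();
    }

    // Compare-and-append: succeeds only if no one else appended since `expectedVersion` was read.
    synchronized ChangeLogEntry append(String entityId, long expectedVersion, String change) {
        long current = latestVersion(entityId);
        if (current != expectedVersion) {
            throw new IllegalStateException("conflict on " + entityId
                + ": expected v" + expectedVersion + ", found v" + current);
        }
        ChangeLogEntry entry = new ChangeLogEntry(entityId, current + 1, change);
        byEntity.computeIfAbsent(entityId, k -> new ArrayList<>()).add(entry);
        return entry;
    }

    synchronized List<ChangeLogEntry> history(String entityId) {
        return List.copyOf(byEntity.getOrDefault(entityId, List.of()));
    }
}
```

A writer that loses the race re‑reads the latest version and retries, so every change for an entity ends up with a unique, gap‑free version that downstream write‑backs can follow.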
Figure: Entity change log in order processing.
Migration and Write‑Back Strategy
Build dashboards to monitor business metrics.
Adopt deployment strategies that allow rapid issue detection without widespread impact.
Monitor inter‑system traffic to verify expected behavior.
Serialize write‑backs from the new system to the legacy system using EntityChangeLog versions to resolve race conditions.
Dashboard and Metrics
Before production, the team added observability metrics (latency, outcomes, per‑workflow tracking) and alerts for both live and shadow traffic, enabling engineers to monitor success rates and anomalies across services.
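One way to turn shadow traffic into an alertable metric is to compare, per workflow, the outcome the live path produced with the outcome the shadow path produced. This sketch assumes outcomes can be compared as simple strings and that a fixed match‑rate threshold triggers alerts; ShadowComparator and its threshold are hypothetical, not the team's actual tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-workflow comparison of live vs. shadow outcomes.
final class ShadowComparator {
    private final Map<String, long[]> counts = new TreeMap<>(); // workflow -> {total, mismatches}
    private final double alertThreshold;                        // minimum acceptable match rate

    ShadowComparator(double alertThreshold) {
        this.alertThreshold = alertThreshold;
    }

    void record(String workflow, String liveOutcome, String shadowOutcome) {
        long[] c = counts.computeIfAbsent(workflow, k -> new long[2]);
        c[0]++;
        if (!liveOutcome.equals(shadowOutcome)) {
            c[1]++;
        }
    }

    double matchRate(String workflow) {
        long[] c = counts.get(workflow);
        if (c == null || c[0] == 0) {
            return 1.0; // no traffic yet: nothing to alert on
        }
        return 1.0 - (double) c[1] / c[0];
    }

    // Workflows whose match rate has dropped below the threshold, in sorted order.
    List<String> alerting() {
        List<String> out = new ArrayList<>();
        for (String workflow : counts.keySet()) {
            if (matchRate(workflow) < alertThreshold) {
                out.add(workflow);
            }
        }
        return out;
    }
}
```

Tracking per workflow rather than globally keeps a small but broken flow (for example refunds) from being hidden by a high overall success rate.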
Smart Deployment Strategy
Deployments are staged in multiple steps:
Internal service rollout to sync systems, scoped by a rollout configuration:
import java.util.List;
import java.util.UUID;

record RolloutData(
    List<UUID> primaryPayerUUIDs,
    List<UUID> primaryPayeeUUIDs
) {}
External rollout gradually migrates functionality to the new system using control and experiment groups, starting with a single country and incrementally increasing the traffic percentage.
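The two rollout stages can be combined into a single routing decision. This hedged sketch assumes an explicit allowlist of payer and payee UUIDs for the internal stage (mirroring the RolloutData fields) and deterministic hash bucketing for the control/experiment split; RolloutGate, bucket, and the country strings are hypothetical.

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical routing gate for a staged rollout.
final class RolloutGate {
    private final Set<UUID> internalPayers;     // stage 1 allowlist (cf. primaryPayerUUIDs)
    private final Set<UUID> internalPayees;     // stage 1 allowlist (cf. primaryPayeeUUIDs)
    private final Set<String> enabledCountries; // stage 2: countries in the external rollout
    private final int experimentPercent;        // stage 2: share of traffic, 0..100

    RolloutGate(Set<UUID> internalPayers, Set<UUID> internalPayees,
                Set<String> enabledCountries, int experimentPercent) {
        if (experimentPercent < 0 || experimentPercent > 100) {
            throw new IllegalArgumentException("experimentPercent must be in 0..100");
        }
        this.internalPayers = Set.copyOf(internalPayers);
        this.internalPayees = Set.copyOf(internalPayees);
        this.enabledCountries = Set.copyOf(enabledCountries);
        this.experimentPercent = experimentPercent;
    }

    // Stable bucket in [0, 100): a payer always lands in the same bucket, so raising
    // the percentage only moves control users into the experiment group, never back.
    static int bucket(UUID payer) {
        return Math.floorMod(payer.hashCode(), 100);
    }

    boolean useNewSystem(UUID payer, UUID payee, String country) {
        if (internalPayers.contains(payer) || internalPayees.contains(payee)) {
            return true; // internal rollout: always routed to the new system
        }
        return enabledCountries.contains(country) && bucket(payer) < experimentPercent;
    }
}
```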
Sequential Write‑Back
Each order update generates an EntityChangeLog entry with a version number; services enforce order by version, ensuring sequential write‑backs to the legacy system.
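Enforcing version order on the write‑back side can be sketched as a per‑entity reorder buffer: events that arrive early are held until every lower version has been written. This assumes versions start at 1 and have no gaps; SequentialWriteBack and VersionedChange are hypothetical, and a list stands in for calls to the legacy system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical change event produced from the EntityChangeLog.
record VersionedChange(String entityId, long version, String payload) {}

final class SequentialWriteBack {
    private final Map<String, Long> lastApplied = new HashMap<>();
    private final Map<String, TreeMap<Long, VersionedChange>> pending = new HashMap<>();
    private final List<VersionedChange> legacyWrites = new ArrayList<>(); // stand-in for legacy API calls

    synchronized void onEvent(VersionedChange change) {
        long applied = lastApplied.getOrDefault(change.entityId(), 0L);
        if (change.version() <= applied) {
            return; // duplicate or already written back: idempotent no-op
        }
        TreeMap<Long, VersionedChange> buffer =
            pending.computeIfAbsent(change.entityId(), k -> new TreeMap<>());
        buffer.put(change.version(), change);
        // Flush every contiguous version starting at applied + 1; later versions keep waiting.
        while (!buffer.isEmpty() && buffer.firstKey() == applied + 1) {
            VersionedChange next = buffer.pollFirstEntry().getValue();
            legacyWrites.add(next);
            applied = next.version();
        }
        lastApplied.put(change.entityId(), applied);
    }

    synchronized List<VersionedChange> legacyWrites() {
        return List.copyOf(legacyWrites);
    }
}
```

An adjustment that overtakes the original fare in the async pipeline is therefore held back until the fare has been written, so the legacy system never sees the two in the wrong order.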
Verification and Retry
Pre‑deployment validation includes:
24‑hour asynchronous jobs per region/city.
End‑to‑end debugging logs per order.
Order‑state verification to ensure full payment processing.
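Order‑state verification can be sketched as a per‑job check that every order reached a terminal state and that each order's entries still balance. The OrderState values and OrderSnapshot fields are assumptions, since the source does not describe the actual state machine; this sketch treats FAILED as terminal because a failure is still a final, recorded outcome.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical states; the real order state machine is not described in the source.
enum OrderState { CREATED, PROCESSING, SUCCEEDED, FAILED }

// entrySum = sum of the order's ledger entries, which double-entry bookkeeping requires to be zero.
record OrderSnapshot(String orderId, OrderState state, long entrySum) {}

final class OrderStateVerifier {
    // Returns human-readable problems for one job; an empty list means it is fully processed.
    static List<String> verifyJob(String jobId, List<OrderSnapshot> orders) {
        List<String> problems = new ArrayList<>();
        for (OrderSnapshot o : orders) {
            boolean terminal = o.state() == OrderState.SUCCEEDED || o.state() == OrderState.FAILED;
            if (!terminal) {
                problems.add(jobId + "/" + o.orderId() + " stuck in " + o.state());
            }
            if (o.entrySum() != 0) {
                problems.add(jobId + "/" + o.orderId() + " entries sum to " + o.entrySum());
            }
        }
        return problems;
    }
}
```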
Additional retry mechanisms and a re‑adjustment API handle out‑of‑order events, leveraging idempotency to recover quickly from service failures.
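Because order processing is idempotent, a failed call can simply be retried. A minimal exponential‑backoff sketch, assuming a doubling delay capped at a maximum; ExponentialRetry and its parameters are hypothetical, and production retry policies commonly also add random jitter so that many clients do not retry in lockstep.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper. Only safe for idempotent operations: a retry of an
// already-applied order must be a no-op, not a second payment.
final class ExponentialRetry {
    static <T> T call(Callable<T> op, int maxAttempts, long baseDelayMillis, long maxDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMillis); // 1x, 2x, 4x, ... up to the cap
            }
        }
    }
}
```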
Lessons Learned
Designing an order‑based double‑entry accounting system.
Seamless migration between two asynchronous systems with high availability.
Platform redesign without impacting internal or external customers.
Version control is critical for cross‑system consistency.
End‑to‑end integration testing and continuous validation are essential.
Comprehensive monitoring and alerting reduce MTTR.
Exponential retries for transient payment failures improve reliability.
Conclusion and Future Plans
The team successfully launched Gulfstream globally with near‑zero downtime, enabling rapid addition of new lines of business such as Uber Freight and NEMO. Future work aims to evolve the system into a true platform, abstracting away payer/payee and line‑of‑business specifics to further simplify Uber’s payment architecture.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
