How Uber Uses H3 Hexagonal Indexing to Power Real‑Time Driver Matching
This article explains how Uber solves the "nearby driver" problem by employing the open‑source H3 hexagonal spatial index, hierarchical grids, Cassandra for persistent storage, and Redis caching to deliver fast, accurate, and scalable real‑time location services.
Problem Statement
Finding drivers near a passenger requires a spatial index that can answer “which drivers are within a given radius” without scanning millions of latitude‑longitude points.
H3 Hexagonal Spatial Index
Uber uses the open‑source H3 library, which partitions the Earth's surface into a hierarchical hexagonal grid. Each hexagon (cell) is identified by a unique 64‑bit integer.
Hierarchical Index
The grid starts with 122 base cells (12 are pentagons to accommodate the sphere) and supports 16 resolution levels. The finest resolution covers roughly one square metre. Larger cells can be subdivided into seven smaller cells, allowing fine‑grained indexing where needed while keeping storage low in sparse regions.
Hexagonal Grid Advantages
All immediate neighbours of a hexagon are at the same centre‑to‑centre distance, simplifying distance calculations compared with triangular or square grids where neighbour distances vary.
Neighbour Search API
The library provides kRing(cell, k), which returns all cells within k rings around a target cell. This enables fast expansion of the search area without computing distances for every driver.
Location Storage in Cassandra
Uber stores only the H3 cell identifier and a timestamp in Apache Cassandra, a highly available distributed NoSQL database optimized for writes.
CREATE TABLE location_info (
user_id text,
user_type int,
h3_cell_id text,
timestamp bigint,
PRIMARY KEY (h3_cell_id, user_id)
);
-- Sample row
001 | 1 | 891fbf5b547f7e0 | 1723346519Query drivers that share the same cell as a passenger:
SELECT user_id FROM location_info
WHERE h3_cell_id = '891fbf5b547f7e0'
AND user_type = 1;To search a broader area, generate surrounding cells with kRing and use an IN clause:
# pseudo‑code
k = 2
cells = h3.kRing('891fbf5b547f7e0', k)
SELECT user_id FROM location_info
WHERE h3_cell_id IN (cells)
AND user_type = 1;Map‑Matching Pre‑processing
Raw GPS reports are noisy. Uber applies a three‑step pipeline before indexing:
Collect raw GPS points from the driver app.
Filter spikes and smooth the trajectory (e.g., moving‑average or Kalman filter).
Snap the cleaned points to the nearest road using a map‑matching algorithm such as linear interpolation or a Kalman‑filter‑based matcher.
Caching Layer with Redis
A Redis cache sits in front of Cassandra to reduce latency.
Latest user location : key = user:{user_id}, value = JSON with latitude, longitude, user_type, timestamp.
Driver IDs per H3 cell : key = h3:{cell_id}, value = list of driver IDs currently in that cell.
{
"latitude": 40.7128,
"longitude": -74.0060,
"user_type": 1,
"timestamp": "2024-08-11T14:32:00Z"
}Ride‑request workflow:
Convert passenger coordinates to an H3 cell (and optionally surrounding cells via kRing).
Lookup driver ID sets in Redis; on cache miss, fall back to the Cassandra query shown above.
Fetch each driver’s latest location from Redis, compute distance or ETA, sort, and return the top candidates.
Summary
By combining H3 for hierarchical hexagonal indexing, Cassandra for durable bulk storage, and Redis for low‑latency caching, Uber built a scalable real‑time location service capable of handling millions of queries per second while delivering accurate driver‑to‑passenger matching.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
