
How GeoHash Powers Billion‑Scale Point‑in‑Polygon Matching at Alibaba Xianyu

This article explains how Alibaba Xianyu uses GeoHash encoding and optimized spatial indexing to efficiently match billions of user‑posted GPS points with tens of thousands of market‑area polygons, reducing computation from quadrillions to billions of operations through precise point‑polygon algorithms and fast neighbor lookups.

Alibaba Cloud Developer

Summary

Alibaba Xianyu divides cities into market‑area polygons based on traffic, mall distribution, and residential patterns, then matches user‑posted GPS points (items) to these polygons. With over 1 billion items and roughly 10,000 polygons, a naive approach must test every item against every polygon: 10^9 × 10^4 = 10^13 point‑in‑polygon tests, each of which walks the polygon's edges. The total runs to quadrillions of basic operations and would not finish even on a large offline cluster.

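Xianyu instead adopts a hybrid GeoHash approach: an exact GeoHash equality match settles items whose cell lies entirely inside a polygon, GeoHash filtering narrows the remaining items to a few candidate polygons, and precise geometry checks run only on those candidates. This reduces the computation to an offline job that finishes within a day.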

Point Data GeoHash Principle and Algorithm

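GeoHash encodes a latitude‑longitude pair into a string by repeatedly bisecting the latitude and longitude intervals and recording one bit per bisection (1 for the upper half, 0 for the lower half). The bits of the two axes are interleaved: counting from zero, even bit positions carry longitude and odd positions carry latitude. Every five bits become one Base‑32 character, so the longer the string, the finer the spatial resolution.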

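Example: encoding the coordinate at latitude 30.280245, longitude 120.027162 yields the GeoHash string wtmk72. Fifteen bits each for latitude and longitude are interleaved into 30 bits, which Base‑32 conversion packs into six 5‑bit characters.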
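The bisect-and-interleave procedure can be sketched in a few lines of Python. This is a minimal illustration of the standard GeoHash encoding, not Xianyu's production code; the function name `geohash_encode` and its parameters are chosen here for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, length: int = 6) -> str:
    """Encode a latitude/longitude pair as a GeoHash of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0        # bits collected for the current Base-32 character
    use_lon = True   # bit 0 is a longitude bit, then the axes alternate
    for bit_index in range(length * 5):
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid   # keep the upper (eastern) half
            else:
                lon_hi = mid   # keep the lower (western) half
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid   # keep the upper (northern) half
            else:
                lat_hi = mid   # keep the lower (southern) half
        value = (value << 1) | bit
        use_lon = not use_lon
        if bit_index % 5 == 4:  # five bits make one character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


print(geohash_encode(30.280245, 120.027162, 6))  # wtmk72
```

Because each extra character only appends bits, a shorter GeoHash of the same point is always a prefix of a longer one, which is what makes prefix and equality matching useful as a spatial filter.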

Polygon Data GeoHash Encoding

To encode a polygon, first compute its minimum bounding rectangle (MBR) and GeoHash the southwest corner. Using the inverse GeoHash algorithm, retrieve the rectangle represented by that GeoHash. Then iteratively expand eastward and northward to adjacent GeoHash cells of the same size until the cells fully cover the MBR. Cells that do not intersect the polygon are discarded, reducing unnecessary calculations.
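The covering step can be sketched as follows. This is an illustrative sketch under several assumptions: polygons are simple rings of (lon, lat) vertices, cells are walked by integer column and row starting from the southwest corner of the MBR (equivalent to expanding eastward and northward), and the edge test uses Liang-Barsky clipping, which is a choice made for this example rather than something the article specifies. All function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle on every edge crossed by a ray going in +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def segment_hits_rect(p, q, rect):
    """Liang-Barsky test: does segment p-q touch the closed rectangle?"""
    (x1, y1), (x2, y2) = p, q
    xmin, ymin, xmax, ymax = rect
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for pk, qk in ((-dx, x1 - xmin), (dx, xmax - x1),
                   (-dy, y1 - ymin), (dy, ymax - y1)):
        if pk == 0:
            if qk < 0:
                return False        # parallel to and outside this side
        else:
            t = qk / pk
            if pk < 0:
                t0 = max(t0, t)     # entering
            else:
                t1 = min(t1, t)     # leaving
            if t0 > t1:
                return False
    return True


def cell_to_geohash(col, row, length):
    """Interleave column (longitude) and row (latitude) bits into a GeoHash."""
    lon_bits, lat_bits = (5 * length + 1) // 2, (5 * length) // 2
    out, value = [], 0
    for i in range(5 * length):
        if i % 2 == 0:
            lon_bits -= 1
            bit = (col >> lon_bits) & 1
        else:
            lat_bits -= 1
            bit = (row >> lat_bits) & 1
        value = (value << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[value])
            value = 0
    return "".join(out)


def cover_polygon(polygon, length=6):
    """Return (fully contained cells, partially intersecting cells)."""
    w = 360.0 / (1 << ((5 * length + 1) // 2))   # cell width in degrees
    h = 180.0 / (1 << ((5 * length) // 2))       # cell height in degrees
    xs, ys = [p[0] for p in polygon], [p[1] for p in polygon]
    col0, col1 = int((min(xs) + 180.0) // w), int((max(xs) + 180.0) // w)
    row0, row1 = int((min(ys) + 90.0) // h), int((max(ys) + 90.0) // h)
    n = len(polygon)
    full, partial = [], []
    for row in range(row0, row1 + 1):            # northward
        for col in range(col0, col1 + 1):        # eastward
            x, y = -180.0 + col * w, -90.0 + row * h
            rect = (x, y, x + w, y + h)
            if any(segment_hits_rect(polygon[i], polygon[(i + 1) % n], rect)
                   for i in range(n)):
                partial.append(cell_to_geohash(col, row, length))
            elif point_in_polygon(x + w / 2, y + h / 2, polygon):
                full.append(cell_to_geohash(col, row, length))
            # otherwise the cell lies outside the polygon and is discarded
    return full, partial
```

A cell touched by no polygon edge is either entirely inside or entirely outside, so testing its center is enough to decide which. Cells touched by an edge are kept as partial cells and need an exact geometry check later.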

Fast Neighbor GeoHash Algorithm

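Instead of decoding and re‑encoding to find adjacent cells, a per‑character lookup table based on the Z‑order curve is used. Each character holds five interleaved bits, and the layout alternates with position: characters in odd positions (e.g., w, m, 7 in wtmk72) start with a longitude bit, while characters in even positions (e.g., t, k, 2) start with a latitude bit, so two tables are needed. Looking up the last character gives its neighbor in a given direction; when the step crosses the edge of the parent cell, the character wraps around and the step cascades to the preceding, higher‑order character. Applying this in each direction derives all eight neighboring GeoHash strings directly.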
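A sketch of the table-driven neighbor lookup follows. To avoid hand-typing the tables, this example generates them once from the bit layout; the article does not say how Xianyu stores its tables, so treat the structure and all names here (`adjacent`, `neighbors`, `TABLES`) as assumptions made for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
STEPS = {"n": (0, 1), "s": (0, -1), "e": (1, 0), "w": (-1, 0)}


def _split(value, lon_first):
    """De-interleave a 5-bit value into (x, y) offsets inside the parent cell."""
    x = y = 0
    is_lon = lon_first
    for shift in range(4, -1, -1):
        bit = (value >> shift) & 1
        if is_lon:
            x = (x << 1) | bit
        else:
            y = (y << 1) | bit
        is_lon = not is_lon
    return x, y


def _merge(x, y, lon_first):
    """Inverse of _split: interleave (x, y) back into a 5-bit value."""
    x_bits, y_bits = (3, 2) if lon_first else (2, 3)
    value = 0
    is_lon = lon_first
    for _ in range(5):
        if is_lon:
            x_bits -= 1
            bit = (x >> x_bits) & 1
        else:
            y_bits -= 1
            bit = (y >> y_bits) & 1
        value = (value << 1) | bit
        is_lon = not is_lon
    return value


def _build_table(lon_first):
    """Map (char, direction) -> (neighbor char, crossed the parent edge?)."""
    width, height = (8, 4) if lon_first else (4, 8)
    table = {}
    for value, ch in enumerate(BASE32):
        x, y = _split(value, lon_first)
        for name, (dx, dy) in STEPS.items():
            nx, ny = x + dx, y + dy
            crossed = not (0 <= nx < width and 0 <= ny < height)
            wrapped = _merge(nx % width, ny % height, lon_first)
            table[ch, name] = (BASE32[wrapped], crossed)
    return table


# One table per character layout, built once; each step is then a dict lookup.
TABLES = {True: _build_table(True), False: _build_table(False)}


def adjacent(geohash, direction):
    """Neighbor in direction 'n', 's', 'e' or 'w'.

    Longitude wraps at the antimeridian; latitude also wraps at the poles
    in this sketch, which is not geographically meaningful.
    """
    if not geohash:
        return ""
    lon_first = len(geohash) % 2 == 1   # odd positions start with longitude
    ch, crossed = TABLES[lon_first][geohash[-1], direction]
    parent = geohash[:-1]
    if crossed:                         # cascade to the higher-order character
        parent = adjacent(parent, direction)
    return parent + ch


def neighbors(geohash):
    """All eight neighbors; diagonals are two single steps."""
    n, s = adjacent(geohash, "n"), adjacent(geohash, "s")
    return {
        "n": n, "s": s,
        "e": adjacent(geohash, "e"), "w": adjacent(geohash, "w"),
        "ne": adjacent(n, "e"), "nw": adjacent(n, "w"),
        "se": adjacent(s, "e"), "sw": adjacent(s, "w"),
    }


print(adjacent("wtmk72", "e"))  # wtmk78
print(adjacent("wtmk72", "s"))  # wtmk5r (the step cascades into the 5th character)
```

The southward step from wtmk72 shows the cascade: the last character 2 sits on the bottom row of its parent cell, so it wraps to r and the parent wtmk7 itself moves south to wtmk5.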

Efficient Large‑Scale Point‑Polygon Relationship

Both items and market‑area polygons are assigned GeoHash codes of equal length. Each point has a single GeoHash; each polygon has one or more GeoHashes that are either fully contained or partially intersecting.

Join item GeoHashes with polygon “fully‑contained” GeoHashes to obtain definitive matches.

Join the remaining items with polygon “partially intersecting” GeoHashes; for these candidates, perform precise point‑in‑polygon geometry checks (e.g., ray‑casting) to confirm membership.

This two‑stage process replaces the 1 billion × 10,000 Cartesian product with an equality join in which each item meets only the few polygons that share its GeoHash cell. The total drops to about 1.8 × 10^12 basic operations, and the job completes in under a day on Alibaba’s offline platform.
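The two joins can be sketched in memory with dictionaries standing in for the offline tables. This is a toy illustration, not the actual job: the function `match_items`, the input shapes, and the placeholder cell keys in the usage example are all hypothetical, and the sketch assumes market areas do not overlap, so an item settled in the first stage is not checked again.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle on every edge crossed by a ray going east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            if lon < x1 + (lat - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def match_items(items, polygons, full_cells, partial_cells):
    """Assign each item to at most one polygon.

    items:         {item_id: (geohash, lon, lat)}
    polygons:      {polygon_id: [(lon, lat), ...]}
    full_cells:    {geohash: polygon_id} for cells entirely inside a polygon
    partial_cells: {geohash: [polygon_id, ...]} for cells crossed by an edge
    """
    matches = {}
    for item_id, (cell, lon, lat) in items.items():
        # Stage 1: equality join on fully contained cells, no geometry needed.
        if cell in full_cells:
            matches[item_id] = full_cells[cell]
            continue
        # Stage 2: only polygons sharing the cell are candidates, and each
        # candidate is confirmed with an exact point-in-polygon test.
        for polygon_id in partial_cells.get(cell, ()):
            if point_in_polygon(lon, lat, polygons[polygon_id]):
                matches[item_id] = polygon_id
                break
    return matches


# Toy data: the cell keys are placeholders, not real GeoHash values.
polygons = {"area_1": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]}
full_cells = {"cellA": "area_1"}
partial_cells = {"cellB": ["area_1"]}
items = {
    "item_1": ("cellA", 1.0, 1.0),   # settled by stage 1
    "item_2": ("cellB", 1.9, 1.0),   # stage 2, inside
    "item_3": ("cellB", 2.5, 1.0),   # stage 2, outside
    "item_4": ("cellZ", 9.0, 9.0),   # cell not indexed, never tested
}
print(match_items(items, polygons, full_cells, partial_cells))
```

Only item_2 and item_3 reach the geometry check. In an offline SQL or MapReduce setting the same idea would be two equality joins on the GeoHash column, with the point-in-polygon test applied as a filter on the second join.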

Extension

The GeoHash‑based spatial index exemplifies broader indexing techniques such as R‑trees, quad‑trees, K‑D trees, and grid indexes, which can also accelerate point‑point, polygon‑polygon, and line‑based spatial queries.
