Fundamentals 13 min read

How Xianyu Scales Billion‑Item Geo Matching with Fast GeoHash Algorithms

This article explains how Xianyu uses GeoHash‑based spatial indexing, precise and approximate matching, and a rapid neighbor‑search algorithm to efficiently associate billions of GPS‑tagged items with tens of thousands of city‑level business districts, reducing computation from quadrillions to billions of operations.

Alibaba Cloud Developer

Jul 12, 2018

How Xianyu Scales Billion‑Item Geo Matching with Fast GeoHash Algorithms

Abstract

Xianyu divides cities into business districts (AOIs) based on traffic, mall, and residential distributions, then matches user‑posted GPS items to the district they fall into. With over 1 billion items and ~10 000 districts, a naïve point‑in‑polygon calculation would require astronomically many operations.

GeoHash Principle for Point Data

GeoHash encodes latitude and longitude into a string by repeatedly halving the latitude and longitude intervals and recording 0 for left/bottom and 1 for right/top. The longer the string, the finer the grid; each GeoHash cell represents a rectangular area covering all points sharing that prefix.

Polygon GeoHash Encoding

To encode a polygon, compute its minimum bounding rectangle (MBR), encode the southwest corner, then decode that cell to obtain the base GeoHash block. By iteratively moving east and north, adjacent blocks of the same size are generated until they completely cover the MBR. Blocks that do not intersect the polygon are discarded, reducing subsequent calculations.

Fast Neighbor GeoHash Algorithm

Instead of decoding then re‑encoding a neighbor cell, the algorithm uses the regularity of GeoHash characters and the Z‑order curve. Odd‑position characters map to "lon‑lat‑lon‑…" bits, even‑position characters to "lat‑lon‑…" bits. Pre‑computed tables give the eight neighboring characters for each GeoHash symbol; when a neighbor crosses a table boundary, the algorithm propagates the carry to higher‑order characters, effectively performing a fast lookup without full decode‑encode cycles.

Efficient Large‑Scale Point‑Polygon Relationship Building

Each item receives a unique GeoHash; each district receives one or more GeoHash strings (fully or partially contained). A two‑step join is performed: first, items are joined with districts on fully‑contained GeoHashes to establish definite matches; second, items are joined on partially‑contained GeoHashes, followed by precise geometric checks only for these candidates, dramatically reducing the Cartesian product size.

In practice, Xianyu processed 1 billion items and 10 000 districts with roughly 1.8 × 10¹² basic operations—far less than the 2 × 10²⁰ operations required by a naïve approach—completing the computation in under a day on Alibaba’s offline cluster.

Conclusion

The GeoHash‑based spatial index, combined with the fast neighbor lookup, enables scalable point‑in‑polygon matching for massive datasets and can be extended to other spatial relationships such as point‑point, line‑line, and polygon‑polygon queries.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm optimization GeoHash point-in-polygon Spatial Indexing GIS

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.