Evolution and Optimization of Numerical Indexing in Elasticsearch for Geo‑Location Queries
The article traces Elasticsearch’s geo‑indexing evolution from early string‑based simulations through Quadtree filters to the modern BKD‑tree implementation, showing how each optimization dramatically improves memory usage, query speed, and accuracy for large‑scale point‑of‑interest searches in location‑based services.
This article reviews the upgrades and optimization ideas behind Elasticsearch's numeric indexing implementations from 2015 to the present. It traces the evolution from early string‑based simulations to KD‑Tree based approaches, highlighting how the capabilities have become increasingly sophisticated and applicable to a wide range of scenarios such as geo‑location modeling, multi‑dimensional coordinates, data retrieval, and analytical insights.
Business background
Location‑based services (LBS) are a core component of many internet applications (food delivery, ride‑hailing, retail, etc.). A fundamental capability is the ability to search for points of interest (POI) near a given coordinate, e.g., finding all POIs within a 1 km radius of a center point. Elasticsearch provides millisecond‑level response times for such geo queries, which is critical for user experience. The basic query uses the geo_distance filter:
GET /my_locations/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"geo_distance": {
"distance": "1km",
"pin.location": { "lat": 40, "lon": 116 }
}
}
}
}
}The article then explores the underlying ideas beyond the tool itself, focusing on how Elasticsearch solves the problem of fast, accurate POI search at massive scale.
Background knowledge
1. Precise positioning uses latitude, longitude, and optional altitude. Latitude ranges [-90, 90] and longitude [-180, 180]. For most business cases, a 2‑D coordinate is sufficient.
2. Distance calculation between two points on a sphere is commonly performed with the Haversine formula. The following Java code from Lucene’s SloppyMath utility implements the formula:
// 代码摘自 lucene‑core‑8.2.0, SloppyMath 工具类
/** Returns the Haversine distance in meters between two points */
public static double haversinMeters(double sortKey) {
return TO_METERS * 2 * asin(Math.min(1, Math.sqrt(sortKey * 0.5)));
}
/** Returns a sort key for distance (less expensive to compute) */
public static double haversinSortKey(double lat1, double lon1, double lat2, double lon2) {
double x1 = lat1 * TO_RADIANS;
double x2 = lat2 * TO_RADIANS;
double h1 = 1 - cos(x1 - x2);
double h2 = 1 - cos((lon1 - lon2) * TO_RADIANS);
double h = h1 + cos(x1) * cos(x2) * h2;
return Double.longBitsToDouble(Double.doubleToRawLongBits(h) & 0xFFFFFFFFFFFFFFF8L);
}
public static final double TO_RADIANS = Math.PI / 180D;
public static final double TO_DEGREES = 180D / Math.PI;
private static final double TO_METERS = 6_371_008.7714D; // Earth's mean radius in meters3. Geohash, introduced in 2008, encodes latitude/longitude into a short string that can be used as a URL‑friendly identifier. Its prefix property enables fast proximity queries because nearby points share a common prefix.
Solution evolution
Elasticsearch’s geo indexing has progressed through three major stages:
Pre‑historic era – Early Lucene versions only supported full‑text search; numeric queries were simulated with string terms.
Elasticsearch 2.0 – Introduced geo_distance based on NumericRangeQuery . The geo_point field stored latitude, longitude, and a geohash, and the query performed a rectangular bounding‑box filter followed by a precise Haversine check.
Elasticsearch 2.2 – Replaced the naive rectangle filter with a Quadtree‑based approach. The world is recursively subdivided into four quadrants; the query first selects cells that intersect the search rectangle, then applies the Haversine filter only to the candidate points.
Elasticsearch 5.0+ – Adopted the BKD‑tree (a multidimensional variant of B‑tree) for numeric and geo indexing, dramatically improving memory usage and query speed. The query now builds a rectangular shape around the center point, intersects it with BKD‑tree cells, and classifies cells as CELL_INSIDE_QUERY , CELL_CROSSES_QUERY , or CELL_OUTSIDE_QUERY to decide whether to accept all points, examine individual points, or skip the cell.
Key code snippets illustrating the evolution:
public static final class GeoPointFieldType extends MappedFieldType {
private MappedFieldType geohashFieldType;
private int geohashPrecision;
private boolean geohashPrefixEnabled;
private MappedFieldType latFieldType;
private MappedFieldType lonFieldType;
public GeoPointFieldType() {}
} public static DistanceBoundingCheck distanceBoundingCheck(double sourceLatitude, double sourceLongitude, double distance, DistanceUnit unit) {
double radDist = unit.toMeters(distance) / GeoUtils.EARTH_SEMI_MINOR_AXIS;
double radLat = Math.toRadians(sourceLatitude);
double radLon = Math.toRadians(sourceLongitude);
double minLat = radLat - radDist;
double maxLat = radLat + radDist;
double minLon, maxLon;
if (minLat > MIN_LAT && maxLat < MAX_LAT) {
double deltaLon = Math.asin(Math.sin(radDist) / Math.cos(radLat));
minLon = radLon - deltaLon;
if (minLon < MIN_LON) minLon += 2d * Math.PI;
maxLon = radLon + deltaLon;
if (maxLon > MAX_LON) maxLon -= 2d * Math.PI;
} else {
minLat = Math.max(minLat, MIN_LAT);
maxLat = Math.min(maxLat, MAX_LAT);
minLon = MIN_LON;
maxLon = MAX_LON;
}
GeoPoint topLeft = new GeoPoint(Math.toDegrees(maxLat), Math.toDegrees(minLon));
GeoPoint bottomRight = new GeoPoint(Math.toDegrees(minLat), Math.toDegrees(maxLon));
return (minLon > maxLon) ? new Meridian180DistanceBoundingCheck(topLeft, bottomRight)
: new SimpleDistanceBoundingCheck(topLeft, bottomRight);
} public class IndexedGeoBoundingBoxQuery {
public static Query create(GeoPoint topLeft, GeoPoint bottomRight, GeoPointFieldMapper.GeoPointFieldType fieldType) {
if (!fieldType.isLatLonEnabled()) {
throw new IllegalArgumentException("lat/lon is not enabled (indexed) for field [" + fieldType.names().fullName() + "], can't use indexed filter on it");
}
if (topLeft.lon() > bottomRight.lon()) {
return westGeoBoundingBoxFilter(topLeft, bottomRight, fieldType);
} else {
return eastGeoBoundingBoxFilter(topLeft, bottomRight, fieldType);
}
}
// ... implementations of westGeoBoundingBoxFilter and eastGeoBoundingBoxFilter ...
} if (parseContext.indexVersionCreated().before(Version.V_2_2_0)) {
query = new GeoDistanceRangeQuery(point, null, distance, true, false, geoDistance, geoFieldType, indexFieldData, optimizeBbox);
} else {
distance = GeoUtils.maxRadialDistance(point, distance);
query = new GeoPointDistanceQuery(indexFieldData.getFieldNames().indexName(), point.lon(), point.lat(), distance);
}Overall, the article demonstrates how Elasticsearch’s geo‑spatial capabilities have matured from simple string tricks to sophisticated multidimensional indexing structures, enabling millisecond‑level POI searches at massive scale. The evolution mirrors broader trends in big‑data search engines: moving from brute‑force filters to hierarchical spatial indexes (Quadtree, BKD‑tree) that dramatically reduce I/O and CPU costs.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.