ByteHouse GIS: High‑Performance Geospatial Analytics and Benchmark Comparison with ClickHouse, StarRocks, PostGIS, and DuckDB
The article explains ByteHouse's GIS capabilities, describing its R‑Tree and Google S2 spatial index implementation, OGC‑compatible data types and functions, and presents benchmark results that show ByteHouse outperforming ClickHouse, StarRocks, PostGIS, and DuckDB on key geospatial queries.
In the digital era, geospatial analytics has become essential for businesses to gain market insights, optimize advertising, and improve logistics through querying, analyzing, and visualizing spatial data.
For example, a coffee chain that wants to open new stores in a city can combine population density, traffic flow, and competitor location data to select profitable sites.
Traditional GIS databases provide rich spatial object structures and indexes, but the rise of OLAP‑driven real‑time analytics has pushed GIS features into major OLAP products.
ByteHouse, an OLAP engine from Volcano Engine, recently added high‑performance GIS capabilities for location insight and crowd selection scenarios.
Application Scenarios and Value
Location insight: Show competitor traffic and performance within a radius around a point to support pricing and market positioning.
Operational map: Visualize supply and foot traffic inside a polygon to aid instant‑retail delivery optimization.
These use cases require GIS filtering combined with classic OLAP aggregation, suggesting a GIS+OLAP processing chain that benefits from optimizer adaptations.
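The GIS+OLAP chain can be pictured as a spatial predicate feeding a group-by. The sketch below is only an illustration in plain Python, not ByteHouse code: the ray-casting test stands in for a polygon filter such as ST_Within, and the function names, the zone, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring.

    `polygon` is a list of (lon, lat) vertices; the ring is closed implicitly.
    Points lying exactly on an edge are not treated specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def orders_by_category(rows, polygon):
    """Spatial filter first, then a classic OLAP-style aggregation."""
    totals = defaultdict(int)
    for lon, lat, category, orders in rows:
        if point_in_polygon(lon, lat, polygon):  # GIS filter
            totals[category] += orders           # group-by + sum
    return dict(totals)


# Hypothetical delivery zone and store rows: (lon, lat, category, orders)
zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
rows = [
    (1.0, 1.0, "coffee", 10),
    (2.0, 3.0, "coffee", 5),
    (3.0, 2.0, "grocery", 7),
    (5.0, 1.0, "grocery", 100),  # lies outside the zone
]
result = orders_by_category(rows, zone)
print(result)
```

In a real engine the filter and the aggregation would run vectorized over column blocks rather than row by row; the point here is only the order of operations.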
Detailed Implementation
ByteHouse GIS builds a two‑dimensional R‑Tree index over columnar data and optimizes spatial functions at the CPU level. It supports more than 50 OGC‑standard functions, and GPU acceleration is being explored.
Two‑Dimensional Spatial Index
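Sorting rows by latitude and then longitude gives poor data locality for polygon or circle queries: points that are close on the map can end up far apart on disk, so a query reads many blocks it does not need (read amplification).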
ByteHouse converts lat‑lon points to a one‑dimensional order using Google S2, then partitions the sorted data into small blocks and builds an R‑Tree over these blocks, achieving lower storage overhead and faster queries.
ByteHouse’s index (Google S2 + R‑Tree) clusters nearby points in the same block, improving cache locality.
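To make the idea concrete, here is a minimal sketch of "sort by a space-filling curve, cut into blocks, prune by bounding box". It rests on two loud assumptions: a Z-order (Morton) key is swapped in for Google S2 cell IDs, and a flat list of block bounding boxes is swapped in for a real R‑Tree. All function names and the synthetic point grid are hypothetical and are not ByteHouse internals.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) code: interleave the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(lon, lat, bits=16):
    """Quantize lon/lat onto a grid and return the cell's Z-order key."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return interleave_bits(x, y, bits)


def build_blocks(points, block_size):
    """Sort points by curve key, cut into blocks, record each block's bbox."""
    ordered = sorted(points, key=lambda p: morton_key(p[0], p[1]))
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        lons = [p[0] for p in chunk]
        lats = [p[1] for p in chunk]
        blocks.append(((min(lons), min(lats), max(lons), max(lats)), chunk))
    return blocks


def query_rect(blocks, rect):
    """Return the points inside rect and the number of blocks actually read."""
    qx1, qy1, qx2, qy2 = rect
    hits, blocks_read = [], 0
    for (bx1, by1, bx2, by2), chunk in blocks:
        # Skip blocks whose bounding box cannot intersect the query window.
        if bx2 < qx1 or bx1 > qx2 or by2 < qy1 or by1 > qy2:
            continue
        blocks_read += 1
        hits.extend(p for p in chunk
                    if qx1 <= p[0] <= qx2 and qy1 <= p[1] <= qy2)
    return hits, blocks_read


# Synthetic 32 x 32 grid of points (hypothetical data)
points = [(-74.0 + 0.01 * i, 40.5 + 0.01 * j)
          for i in range(32) for j in range(32)]
blocks = build_blocks(points, block_size=64)
rect = (-73.95, 40.55, -73.90, 40.60)
hits, blocks_read = query_rect(blocks, rect)
print(f"read {blocks_read} of {len(blocks)} blocks, {len(hits)} matching points")
```

Because neighbouring points share a block, most block bounding boxes fall entirely outside a small query window and are skipped without being read. A real R‑Tree would organize those bounding boxes hierarchically so that pruning does not need a linear scan.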
OGC Standard Compatibility
Data Types
ByteHouse supports the seven OGC geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection.
Unlike traditional GIS databases that store geometries as BLOBs, ByteHouse stores them as numeric arrays in columnar format, reducing serialization overhead.
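One common way to lay geometries out as numeric arrays is a pair of coordinate arrays plus an offsets array that marks where each row's vertices begin and end. The sketch below assumes that layout for single-ring polygons; it illustrates the general technique and is not a description of ByteHouse's actual on-disk format. The class and method names are hypothetical.

```python
from array import array


class PolygonColumn:
    """A column of single-ring polygons stored as flat numeric arrays."""

    def __init__(self):
        self.xs = array("d")            # all x (longitude) values, back to back
        self.ys = array("d")            # all y (latitude) values, back to back
        self.offsets = array("q", [0])  # row i owns vertices offsets[i]:offsets[i+1]

    def append(self, ring):
        """Append one polygon given as a list of (x, y) vertices."""
        for x, y in ring:
            self.xs.append(x)
            self.ys.append(y)
        self.offsets.append(len(self.xs))

    def __len__(self):
        return len(self.offsets) - 1

    def ring(self, row):
        """Rebuild the vertex list of one row from the flat arrays."""
        start, end = self.offsets[row], self.offsets[row + 1]
        return list(zip(self.xs[start:end], self.ys[start:end]))

    def bbox(self, row):
        """Bounding box of one row, computed directly on array slices."""
        start, end = self.offsets[row], self.offsets[row + 1]
        xs, ys = self.xs[start:end], self.ys[start:end]
        return (min(xs), min(ys), max(xs), max(ys))


col = PolygonColumn()
col.append([(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)])
col.append([(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)])
print(len(col), list(col.offsets), col.bbox(0))
```

With this layout a function such as a bounding-box computation can scan contiguous numeric memory without first deserializing a BLOB per row.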
Spatial Functions
More than 50 spatial functions are available; high‑frequency functions have been optimized for columnar storage.
Data Migration
Supported import/export formats include WKT, WKB, GeoJSON, Shapefile, Parquet, CSV, and Arrow.
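As a small illustration of what moving between two of these formats involves, the sketch below converts a WKT point to a GeoJSON geometry object and back using only the Python standard library. It handles the 2‑D POINT case only, the helper names are hypothetical, and it says nothing about how ByteHouse's own importers work.

```python
import json
import re

# Matches e.g. "POINT (-73.98 40.75)" or "point(1 2)"; 2-D points only.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def wkt_point_to_geojson(wkt):
    """Parse a 2-D WKT POINT into a GeoJSON geometry dict."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2-D WKT POINT: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    return {"type": "Point", "coordinates": [x, y]}


def geojson_point_to_wkt(geom):
    """Serialize a GeoJSON Point geometry dict back to WKT."""
    if geom.get("type") != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    x, y = geom["coordinates"][:2]
    return f"POINT ({x!r} {y!r})"


geojson = wkt_point_to_geojson("POINT (-73.98 40.75)")
print(json.dumps(geojson))
print(geojson_point_to_wkt(geojson))
```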
Benchmark Tests
NYC Taxi Dataset
Using the NYC Taxi dataset (21 GB, 169 M rows) and two key functions, ST_DistanceSphere and ST_Within, ByteHouse was compared against ClickHouse, StarRocks, PostGIS, and DuckDB across three zones of varying size.
ST_Within performance: ByteHouse achieved the lowest latency (<1 s) thanks to its 2‑D index and vectorized processing. DuckDB was slowed by its lack of indexes and its BLOB storage, and PostGIS exceeded 6 s on large zones.
ST_DistanceSphere performance: ByteHouse kept query time under 0.1 s; ClickHouse and StarRocks also performed well (0.1‑1 s).
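For readers unfamiliar with what a spherical distance function computes, the sketch below uses the haversine formula, a standard way to get great-circle distance on a sphere. It is an assumption that this matches the benchmarked implementations in detail; the Earth radius constant, the function names, and the sample coordinates are illustrative choices, not values taken from the benchmark.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in metres


def distance_sphere(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def within_radius(points, center, radius_m):
    """Keep the (lon, lat) points that lie within radius_m of center."""
    clon, clat = center
    return [p for p in points
            if distance_sphere(p[0], p[1], clon, clat) <= radius_m]


# Hypothetical pickup points around a hypothetical center
center = (-73.985, 40.758)
pickups = [(-73.985, 40.758), (-73.980, 40.760), (-73.780, 40.640)]
nearby = within_radius(pickups, center, radius_m=1000.0)
print(nearby)
```

A radius query of this kind evaluates the formula once per candidate row, which is why pruning blocks with a spatial index and vectorizing the arithmetic both matter for latency.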
Key observations:
ByteHouse combines OLAP and GIS, delivering superior compute performance.
Columnar storage of geometries reduces space and enables vectorized execution.
Hardware parallelism further accelerates spatial functions.
Compared with community ClickHouse, ByteHouse adds OGC compatibility and spatial indexes that mitigate read amplification.
Business Dataset
In e‑commerce scenarios, ByteHouse GIS reduces data reads by over 50 %, lowering disk I/O and CPU usage while supporting fast operational analytics.
Conclusion
The article dissects ByteHouse GIS’s technical design and benchmarks it against four competing products.
ByteHouse shows lower query latency and higher throughput for ST_DistanceSphere and ST_Within functions.
When choosing a solution, consider data scale, scalability, usability, stability, security, and integration requirements. For pure geospatial workloads PostGIS is strong. For mixed OLAP‑GIS workloads ByteHouse, ClickHouse, StarRocks, and DuckDB are all viable, with ByteHouse offering the best performance in these tests along with cloud‑native features.
References
PostGIS: https://postgis.net/
OGC SFS: https://www.ogc.org/standard/sfs/
Google S2: https://s2geometry.io/
GEOS: https://libgeos.org/
ClickHouse Geo Functions: https://clickhouse.com/docs/en/sql-reference/functions/geo/coordinates
CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit
cuSpatial: https://github.com/rapidsai/cuspatial
Arctern: https://github.com/arctern-io/arctern
Go Spatial Search: https://halfrost.com/go_spatial_search/
Rare Earth Juejin Tech Community