How to Optimize Real‑Time Vector Tile Services for Millions of Features with PostgreSQL & PostGIS
This article explains how to efficiently browse and render millions of GIS features in real‑time vector tiles using PostgreSQL and PostGIS, covering background challenges, several thinning algorithms, their implementation steps, limitations, advantages, and a practical example with a 3‑million‑point dataset.
Background
When GIS applications need to display vector tiles at low zoom levels (1‑6) with millions of features that change daily, a plain PostgreSQL/PostGIS tile service becomes too slow. Thinning (point dilution) is required to keep query response time low while preserving visual quality.
Thinning Strategies
Different business requirements lead to different thinning approaches. The following methods can be applied directly in SQL without pre‑processing.
1. Business‑driven filtering
Apply domain‑specific WHERE clauses to exclude irrelevant points before generating tiles.
2. Algorithmic thinning
2.1 Grid‑based thinning
Define a grid covering the data extent (cell size chosen according to desired density).
For each cell, use a GiST spatial index to find the points inside the cell.
Select one point (or a random subset) from the cell and discard the rest.
Repeat for all cells.
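The grid steps above can be sketched outside the database as follows. This is a minimal illustration, not the repository's code: the function name grid_thin, the tuple representation of points, and the rule "keep the first point seen in each cell" are assumptions made for the example.

```python
import math

def grid_thin(points, cell_size):
    """Keep one point per grid cell (the first one encountered)."""
    kept = {}
    for x, y in points:
        # Integer cell coordinates identify the grid cell containing the point.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # setdefault stores the point only if the cell is still empty.
        kept.setdefault(cell, (x, y))
    return list(kept.values())

# Hypothetical sample data: two points share cell (0, 0), two share cell (1, 0).
points = [(0.1, 0.1), (0.4, 0.3), (1.2, 0.2), (1.9, 0.8), (0.5, 1.5)]
thinned = grid_thin(points, cell_size=1.0)
print(thinned)
```

In PostGIS a comparable grouping is often written with ST_SnapToGrid combined with DISTINCT ON, although whether the reference implementation does so is not stated here.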
2.2 Distance‑based thinning
Set a distance threshold d.
Pick an initial point p0 and remove all points whose distance to p0 is ≤ d.
Choose the next remaining point as the reference and repeat until no points are left.
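A minimal in-memory sketch of this greedy procedure is shown below. The function name distance_thin and the choice of "first remaining point" as the next reference are assumptions; the text does not say how the reference point is picked.

```python
import math

def distance_thin(points, d):
    """Greedy thinning: keep a reference point, drop every point within d of it."""
    remaining = list(points)
    kept = []
    while remaining:
        ref = remaining.pop(0)          # next reference point
        kept.append(ref)
        # Discard all points whose distance to the reference is <= d.
        remaining = [p for p in remaining if math.dist(p, ref) > d]
    return kept

# Hypothetical sample data: two tight groups and one isolated point.
sample = [(0, 0), (0.5, 0), (3, 0), (3.2, 0.1), (10, 10)]
thinned_by_distance = distance_thin(sample, d=1.0)
print(thinned_by_distance)
```

This version compares every reference against all remaining points, so it is quadratic in the worst case. Inside PostgreSQL the neighbour search would normally be delegated to a GiST index, for example through ST_DWithin.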
2.3 Random‑value thinning
Store a pseudo‑random value per row, e.g. random_val double precision DEFAULT random(), or compute it on the fly.
Define a retention ratio r (e.g., 0.5 for 50 %).
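Filter with WHERE random_val > (1 - r) in the tile query, or with WHERE random() > (1 - r) when the value is computed on the fly. A stored value keeps the same points across requests, whereas random() is re-evaluated on every query, so the retained set changes from one request to the next. The ratio can be varied per zoom level.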
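The effect of a stored random value can be illustrated with a small simulation. The helper names assign_random_values and retained_ids are hypothetical, and the dictionary stands in for a column such as random_val.

```python
import random

def assign_random_values(ids, seed=42):
    """Simulate a stored per-row random value in [0, 1)."""
    rng = random.Random(seed)
    return {i: rng.random() for i in ids}

def retained_ids(random_vals, r):
    """Keep rows whose stored value exceeds 1 - r, i.e. roughly a fraction r."""
    threshold = 1.0 - r
    return {i for i, v in random_vals.items() if v > threshold}

random_vals = assign_random_values(range(10_000))
keep_20 = retained_ids(random_vals, 0.2)   # low zoom: about 20 %
keep_50 = retained_ids(random_vals, 0.5)   # higher zoom: about 50 %

# Because the values are stored, the sparser set is contained in the denser one.
print(len(keep_20), len(keep_50), keep_20 <= keep_50)
```

Because every row keeps its value, the subset shown at a low ratio is always contained in the subset shown at a higher ratio, so points do not disappear when the user zooms in.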
2.4 Combined grid + random thinning
Create the same grid as in 2.1.
Within each cell, apply the random filter from 2.3 to keep only a fraction of points.
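One possible reading of the combined method is sketched below. Instead of a per-point threshold, it samples a fixed fraction inside each cell and rounds up, so that every occupied cell keeps at least one point. That rounding rule, like the name grid_random_thin, is an assumption of this example rather than something the text specifies.

```python
import math
import random
from collections import defaultdict

def grid_random_thin(points, cell_size, r, seed=0):
    """Group points by grid cell, then keep a random fraction r of each cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell].append((x, y))
    kept = []
    for members in cells.values():
        # Assumption: round up, so a non-empty cell never loses all its points
        # when r > 0.
        k = math.ceil(r * len(members))
        kept.extend(rng.sample(members, k))
    return kept

# Hypothetical data: four points in one cell, one isolated point in another.
demo_points = [(0.1, 0.1), (0.2, 0.8), (0.7, 0.3), (0.9, 0.9), (5.5, 5.5)]
kept_points = grid_random_thin(demo_points, cell_size=1.0, r=0.5)
print(len(kept_points))
```

Sampling per cell rather than over the whole table means sparse areas are not emptied by chance while dense areas are still reduced proportionally.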
2.5 Distance with density preservation
Define a distance threshold d and a discard proportion p (0 < p < 1).
Cluster points that lie within d of a reference point.
From each cluster, randomly discard p × cluster_size points.
Proceed to the next unprocessed point and repeat.
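The four steps above might look like this in memory. The function name density_preserving_thin is hypothetical, and rounding the discard count down (so that a single-point cluster always survives) is an assumption, since the text does not say how p × cluster_size is rounded.

```python
import math
import random

def density_preserving_thin(points, d, p, seed=0):
    """Cluster points within d of a reference, then drop a fraction p of each cluster."""
    rng = random.Random(seed)
    remaining = list(points)
    kept = []
    while remaining:
        ref = remaining[0]
        # The cluster contains the reference itself (distance 0).
        cluster = [q for q in remaining if math.dist(q, ref) <= d]
        remaining = [q for q in remaining if math.dist(q, ref) > d]
        # Assumption: round the discard count down.
        n_discard = math.floor(p * len(cluster))
        kept.extend(rng.sample(cluster, len(cluster) - n_discard))
    return kept

# Hypothetical data: a dense cluster of 10, a pair, and one isolated point.
dense = [(i * 0.01, 0.0) for i in range(10)]
pair = [(10.0, 10.0), (10.1, 10.0)]
isolated = [(50.0, 50.0)]
result = density_preserving_thin(dense + pair + isolated, d=0.5, p=0.5)
print(len(result))
```

Unlike plain distance-based thinning, which leaves one point per neighbourhood, this keeps more points where the data is denser, so relative density remains visible on the map.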
Advantages
All methods can be executed in real time; no offline preprocessing is required.
When the random value is stored in a column, that column can be indexed with a B‑tree, which speeds up the threshold filter.
By storing the random value, different zoom levels can reuse the same base table while retaining appropriate subsets.
Limitations
For very small datasets the visual impact may be negligible.
Implementation Example
A reference implementation that demonstrates the above techniques is available at the following repository:
https://github.com/MrSmallLiu/point_dilution
Demo Configuration
Dataset: 3 million points.
Thinning rule: Random‑value thinning with zoom‑dependent thresholds (e.g., zoom < 4 → retain 20 %, zoom 4‑6 → 40 %, zoom 7‑9 → 70 %, zoom ≥ 10 → no thinning).
Tile service: Real‑time vector tiles that apply the appropriate WHERE clause per zoom level.
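The zoom rule in this configuration could be wired into a tile query roughly as follows. This is a sketch under several assumptions: a table named points with a geom column in EPSG:3857 and a stored random_val column, PostGIS 3.0 or later for ST_TileEnvelope, and psycopg-style named placeholders. None of these names are confirmed by the repository.

```python
def retention_ratio(zoom):
    """Map a zoom level to the fraction of points to keep (demo thresholds)."""
    if zoom < 4:
        return 0.2
    if zoom <= 6:
        return 0.4
    if zoom <= 9:
        return 0.7
    return 1.0  # zoom >= 10: no thinning

def tile_query(z, x, y):
    """Build a parameterised MVT query and its parameters for one tile."""
    r = retention_ratio(z)
    params = {"z": z, "x": x, "y": y}
    thinning = ""
    if r < 1.0:
        # Hypothetical column random_val holds the stored per-row random value.
        thinning = " AND p.random_val > %(threshold)s"
        params["threshold"] = 1.0 - r
    sql = (
        "WITH bounds AS ("
        "SELECT ST_TileEnvelope(%(z)s, %(x)s, %(y)s) AS env) "
        "SELECT ST_AsMVT(t, 'points') FROM ("
        "SELECT ST_AsMVTGeom(p.geom, b.env) AS geom "
        "FROM points AS p CROSS JOIN bounds AS b "
        "WHERE p.geom && b.env" + thinning +
        ") AS t"
    )
    return sql, params

sql_low, params_low = tile_query(2, 1, 1)
sql_high, params_high = tile_query(12, 2048, 1360)
print(sql_low)
print(params_low)
```

Omitting the predicate entirely at zoom ≥ 10, rather than comparing against a threshold of 0, avoids an unnecessary filter on the unthinned levels.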
Sample screenshots illustrating before/after thinning are included in the repository.
dbaplus Community
