Big Data 23 min read

Point Aggregation Algorithms for Map-based POI Data: Comparison, Implementation, and Evaluation

The article surveys and compares several point‑aggregation techniques for map‑based POI visualisation—including k‑means, grid‑based, grid‑centroid‑merge, grid‑distance, quad‑tree and KD‑tree methods—detailing their implementations, performance and clustering quality, evaluating them on a 175‑point dataset, and recommending the most suitable algorithm according to data size and required accuracy.

Amap Tech
Amap Tech
Amap Tech
Point Aggregation Algorithms for Map-based POI Data: Comparison, Implementation, and Evaluation

1. Introduction

When visualizing map‑based services, displaying a large number of Points of Interest (POI) at small scales can overwhelm rendering performance and visual clarity. A common solution is to show detailed POI data at large scales and aggregated data at small scales. This article discusses how to achieve reasonable aggregation results for POI datasets of varying size and distribution, supporting both offline pre‑processing and real‑time aggregation.

2. Comparison of Point‑Aggregation Algorithms

The following algorithms are compared in terms of performance and aggregation quality. Users can select the most suitable algorithm based on their scenario.

2.1 k‑means Clustering

Performance: computationally intensive due to many distance calculations.

Effect: cannot resolve overlapping markers because the algorithm does not consider visual overlap.

2.2 Grid‑Based Clustering

The map is divided into a grid; points falling into the same cell are merged into a single centroid representing the cell. The centroid value is the number of points in the cell.

Advantages: fast computation, no pairwise distance calculations.

Disadvantages: adjacent points may fall into different cells, causing artificial separation; cell centroids may not reflect true cluster centers.

2.3 Grid‑Centroid Merge Algorithm

After computing cell centroids, nearby centroids are merged within a configurable radius (circle or square). This reduces over‑segmentation caused by strict grid boundaries.

Advantages: relatively fast; improves visual continuity.

Disadvantages: adds computation and still suffers from grid‑induced separation.

2.4 Grid‑Distance‑Based Clustering

Each point is surrounded by a square of user‑defined size. If the square does not intersect any existing aggregation square, a new aggregation point is created; otherwise the point is merged into the nearest existing aggregation point.

Advantages: faster than pure distance‑based methods; each point is processed once.

Disadvantages: results depend on iteration order; aggregation centers are the first points added, which may not be optimal.

2.5 Quad‑Tree + Grid Aggregation

The quad‑tree provides fast spatial queries for points within a grid cell. The tree recursively subdivides space into four quadrants, storing points in leaf nodes. Insertion, splitting, and search operations are described in detail.

typedef struct TBQuadTreeNodeData { // POI data stored in a quad‑tree node
   double x; // longitude
   double y; // latitude
   void* data; // additional info
} TBQuadTreeNodeData;

TBQuadTreeNodeData TBQuadTreeNodeDataMake(double x, double y, void* data);

typedef struct TBBoundingBox { // bounding box of a node
   double x0, y0; // min coordinates
   double xf, yf; // max coordinates
} TBBoundingBox;

TBBoundingBox TBBoundingBoxMake(double x0, double y0, double xf, double yf);

typedef struct quadTreeNode {
   struct quadTreeNode* northWest;
   struct quadTreeNode* northEast;
   struct quadTreeNode* southWest;
   struct quadTreeNode* southEast;
   TBBoundingBox boundingBox;
   int bucketCapacity;
   TBQuadTreeNodeData *points;
   int count;
} TBQuadTreeNode;

TBQuadTreeNode* TBQuadTreeNodeMake(TBBoundingBox boundary, int bucketCapacity);

After building the quad‑tree, the aggregation process divides the screen into grids, queries POI within each grid via the quad‑tree, computes the average centroid, and records the point count.

2.6 K‑D Tree

A K‑D tree extends the concept of segment trees to multi‑dimensional data, enabling efficient range and nearest‑neighbor searches. The article provides a Python‑style pseudo‑code for node definition and tree construction.

class Node:
   def __init__(self, value, lchild, rchild):
      self.value = value
      self.lchild = lchild
      self.rchild = rchild

def build(arr):
   n = len(arr)
   left, right = arr[: n//2], arr[n//2:]
   lchild, rchild = self.build(left), self.build(right)
   return Node(max(lchild.value, rchild.value), lchild, rchild)

3. Aggregation Algorithm Workflow

Input: data (POI list with coordinates) and config (aggregation parameters such as number of levels, algorithm‑specific thresholds).

Output: a map Map<String, List<AggregationDTO>> where each key represents a level identifier and each AggregationDTO contains the aggregated point’s coordinates, original POI count, and auxiliary information.

Processing steps: data preprocessing → grid division → spatial query (quad‑tree/K‑D tree) → centroid/aggregation calculation → optional centroid merging → result assembly.

4. Results & Evaluation

The test dataset consists of 175 GPS points. Visualizations show raw point distribution and aggregated results for each algorithm.

Evaluation metrics focus on cluster compactness (intra‑cluster distance) and separation (inter‑cluster distance). The article references the S_Dbw index as a robust internal validation measure.

Key observations:

Grid‑centroid algorithm can split naturally clustered points due to fixed cell boundaries.

Grid‑centroid‑merge mitigates this by merging nearby centroids.

Grid‑distance algorithm avoids “grid‑cut” issues by assigning each raw point to the nearest existing aggregation point, leading to more accurate and stable clusters.

Performance comparison suggests that for POI counts above 10 K, algorithm choice significantly impacts speed, while for aggregated point counts above 1 K, memory consumption becomes a concern.

5. Recommendations

Choose the algorithm based on data volume and required accuracy:

Small datasets (<10 K POI): any algorithm works; prefer simpler grid‑centroid for speed.

Medium to large datasets (≥10 K POI): grid‑distance or quad‑tree‑based methods provide better balance of performance and visual quality.

When high precision is essential, consider KD‑tree or quad‑tree with centroid merging.

6. References

[1] 丁立国, 熊伟, 周斌. 专题图空间点聚合可视化算法研究. 地理空间信息, 2017, 15(5): 6‑9.

[2] Theodore Calmes. How To Efficiently Display Large Amounts of Data on iOS Maps. 2015.

[3] Liu Y, Li Z, Xiong H, et al. Understanding of internal clustering validation measures. IEEE Int. Conf. on Data Mining, 2010: 911‑916.

[4] 承志. 机器学习--详解KD-Tree原理. 2020.

[5] Viky. 在线地图的点聚合算法及现状. 2014.

Big Dataperformance evaluationMap visualizationPOI clusteringpoint aggregationspatial algorithms
Amap Tech
Written by

Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.