How JD’s JUST Engine Tackles Massive Spatio‑Temporal Data at Scale

The presentation details the design and implementation of JD’s Urban Spatio‑Temporal Data Engine (JUST), explaining its architecture, novel storage and indexing techniques, performance optimizations, experimental results, and real‑world applications such as pandemic contact tracing and hazardous‑material monitoring, while highlighting its academic impact.

ITPUB

Background and Motivation

The rapid growth of 5G and IoT generates massive volumes of spatio‑temporal data: map vectors, satellite imagery, and urban sensing data such as vehicle GPS, mobile‑base‑station signaling, and social‑media check‑ins. Four challenges arise:

Very large data volume (hundreds of GB to TB).

Complex, heterogeneous data structures (point, line, polygon, time‑series, network).

Special query patterns: spatial range, nearest‑neighbor (KNN), and time‑bounded queries.

High update frequency (e.g., GPS every 2 seconds).

System Overview (JUST)

JUST (JD Urban Spatio‑Temporal Data Engine) integrates a distributed key‑value store (HBase), an in‑memory processing engine (Spark), and a spatio‑temporal index component (GeoMesa) to provide a scalable, high‑performance platform for storage, indexing, analytics, visualization, and service exposure.

JUST architecture diagram

Core Modules

JUST‑DB: storage, spatio‑temporal modeling, indexing.

JUST‑DM: built‑in mining algorithms for trajectories, road networks, and generic data.

JUST‑TS: time‑series analysis and visualization.

JUST‑GIS: GIS‑oriented visualization with real‑time rendering.

Task‑Management: real‑time and scheduled job orchestration.

JUST‑Service: configurable API layer (REST/JDBC) for external access.

Monitoring: health checks and stability components.

Data Modeling and Storage Optimizations

Traditional vertical storage keeps each GPS point of a trajectory as a separate record, which inflates the row count and compresses poorly. JUST adopts a horizontal scheme:

Trajectories are segmented into sub‑trajectories.

All points of a sub‑trajectory are stored together in a single HBase row.

This reduces the number of rows, enables column‑family compression, and preserves complete trajectory context.
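The horizontal layout can be sketched in a few lines of Python. This is a minimal illustration, not JUST's implementation: the segmentation rule (split on a time gap or a point cap), the `object_id#start_ts` row-key format, and the names `segment`, `to_rows`, and `SubTrajectoryRow` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, int]  # (lon, lat, unix_seconds)


@dataclass
class SubTrajectoryRow:
    row_key: str          # hypothetical key format: "<object_id>#<start_ts>"
    points: List[Point]   # every point of the sub-trajectory lives in this one row


def segment(points: List[Point], max_gap_s: int = 300, max_points: int = 1000) -> List[List[Point]]:
    """Split a time-ordered trajectory when the gap between consecutive
    points exceeds max_gap_s or the current segment reaches max_points.
    Both thresholds are assumptions chosen for illustration."""
    segments: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        if current and (p[2] - current[-1][2] > max_gap_s or len(current) >= max_points):
            segments.append(current)
            current = []
        current.append(p)
    if current:
        segments.append(current)
    return segments


def to_rows(object_id: str, points: List[Point]) -> List[SubTrajectoryRow]:
    """Turn one object's GPS points into one row per sub-trajectory."""
    ordered = sorted(points, key=lambda p: p[2])
    return [SubTrajectoryRow(f"{object_id}#{seg[0][2]}", seg) for seg in segment(ordered)]
```

With a vertical layout, five GPS points produce five rows; here they collapse into as many rows as there are sub-trajectories, and each row still carries its full sequence of points.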

A “trajectory signature” further refines location representation: the minimum bounding rectangle (MBR) is divided into an n×n grid, and occupied cells are encoded as a binary vector, providing a finer‑grained spatial filter.
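As a rough sketch of the signature idea, the following Python computes the MBR of a sub-trajectory, marks which cells of an n×n grid contain at least one point, and uses those bits to reject a query rectangle that overlaps the MBR but touches no occupied cell. The function names, the row-major bit order, and the list-of-ints encoding are assumptions for illustration; the source does not specify how JUST lays out the bits.

```python
def trajectory_signature(points, n=4):
    """Return (mbr, bits): mbr is (min_lon, min_lat, max_lon, max_lat) and
    bits is a list of n*n flags, row-major starting at the lower-left cell."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    width = (max_lon - min_lon) or 1e-9   # avoid division by zero for degenerate MBRs
    height = (max_lat - min_lat) or 1e-9
    bits = [0] * (n * n)
    for p in points:
        col = min(int((p[0] - min_lon) / width * n), n - 1)
        row = min(int((p[1] - min_lat) / height * n), n - 1)
        bits[row * n + col] = 1
    return (min_lon, min_lat, max_lon, max_lat), bits


def may_intersect(mbr, bits, n, query):
    """True if any occupied cell overlaps the query rectangle
    (q_min_lon, q_min_lat, q_max_lon, q_max_lat)."""
    min_lon, min_lat, max_lon, max_lat = mbr
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = query
    cell_w = (max_lon - min_lon) / n
    cell_h = (max_lat - min_lat) / n
    for idx, bit in enumerate(bits):
        if not bit:
            continue
        row, col = divmod(idx, n)
        c_lon = min_lon + col * cell_w
        c_lat = min_lat + row * cell_h
        if (c_lon <= q_max_lon and c_lon + cell_w >= q_min_lon
                and c_lat <= q_max_lat and c_lat + cell_h >= q_min_lat):
            return True
    return False
```

For a diagonal trajectory, the MBR covers the whole square while the signature marks only the cells along the diagonal, so a query aimed at an empty corner can be discarded without reading the points.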

Horizontal storage layout

Spatio‑Temporal Index Design

GeoMesa’s default indexing uses a single GeoHash (space‑filling curve) that interleaves latitude and longitude bits, then appends a time component. This creates a large key range when time and space scales differ, degrading filter selectivity.

JUST improves the index by:

Partitioning the time dimension into fixed‑size buckets (e.g., 1 day).

Within each time bucket, encoding latitude and longitude independently using GeoHash.

Generating separate spatial range keys per bucket, dramatically shrinking the key space.

Result: a range query that previously took 30 seconds completes in under 5 seconds.
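The bucketing steps above can be sketched as follows. This is an illustration under stated assumptions, not the JUST key format: it uses the standard GeoHash encoding, a one-day bucket, and a hypothetical `"<bucket>|<geohash>"` key layout; `index_key` and `scan_prefixes` are names invented for the example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
SECONDS_PER_DAY = 86400  # bucket size; the talk gives one day as an example


def geohash(lon: float, lat: float, precision: int = 6) -> str:
    """Standard GeoHash: alternate longitude and latitude bits, 5 bits per character."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, value, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = (value << 1) | 1, mid
            else:
                value, lon_hi = value << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def index_key(lon: float, lat: float, ts: int, precision: int = 6) -> str:
    """Key = zero-padded day bucket, then the spatial code (hypothetical layout)."""
    return f"{ts // SECONDS_PER_DAY:06d}|{geohash(lon, lat, precision)}"


def scan_prefixes(t_start: int, t_end: int, cell_prefixes):
    """One key prefix per (time bucket, spatial cell) pair covered by the query."""
    first, last = t_start // SECONDS_PER_DAY, t_end // SECONDS_PER_DAY
    return [f"{b:06d}|{c}" for b in range(first, last + 1) for c in cell_prefixes]
```

Because the time bucket leads the key, a query over two days and two spatial cells turns into four short prefix scans rather than one wide range that spans unrelated times.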

Time‑bucketed index

Spark Context Pre‑Allocation

Typical Spark jobs request resources from YARN for each query, incurring latency. JUST creates two SparkContexts in advance, managed by SparkJobServer. When a query arrives, an existing context is selected, eliminating the YARN allocation step. The dual‑context design also provides high availability: if one context fails, the other continues serving requests.
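The selection and failover behavior can be illustrated with a small pure-Python model. `WarmContext` is a hypothetical stand-in for a long-lived SparkContext, and the round-robin choice is an assumption; the source only says that an existing context is selected and that the other keeps serving if one fails.

```python
class WarmContext:
    """Hypothetical stand-in for a SparkContext created ahead of time."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def run(self, query: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} ran: {query}"


class ContextPool:
    """Hands each query to the next healthy pre-created context."""

    def __init__(self, names):
        self.contexts = [WarmContext(n) for n in names]
        self._next = 0  # index where the next round-robin search starts

    def submit(self, query: str) -> str:
        size = len(self.contexts)
        for offset in range(size):
            idx = (self._next + offset) % size
            ctx = self.contexts[idx]
            if ctx.healthy:
                self._next = (idx + 1) % size
                return ctx.run(query)  # no per-query resource negotiation
        raise RuntimeError("no healthy context available")
```

The point of the sketch is that the cost of creating a context is paid once at startup, so `submit` only has to pick one, and a failed context is skipped rather than blocking the query.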

Dual SparkContext architecture

Performance Evaluation

Experiments on a real trajectory dataset (original size 136 GB) show:

Storage reduced to 30 GB, roughly 78 % smaller than the 136 GB original (the presentation states ≈85 % space saving).

Index construction speed improved >7×.

Spatial‑range and KNN queries outperform two leading Spark‑based trajectory systems; JUST remains stable beyond 100 GB, whereas competitors crash.

Storage and index performance

Unified SQL Interface and API

All data operations (definition, ingestion, query, and analysis) are expressed via standard SQL. JUST implements a custom optimizer that pushes predicates down to the storage scan, prunes unused columns, and rewrites queries for efficient execution. A JDBC driver conforms to the standard, allowing existing tools (e.g., MySQL clients) to connect without code changes.
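To make predicate pushdown and projection pruning concrete, here is a toy rewrite over a plan expressed as nested tuples. The plan shape, `optimize`, and `execute_scan` are invented for this example and say nothing about how JUST's optimizer is actually built.

```python
def optimize(plan):
    """Fold ("project", cols, ("filter", pred, ("scan", table))) into a single
    scan node that carries both the predicate and the column list."""
    if plan[0] == "project" and plan[2][0] == "filter" and plan[2][2][0] == "scan":
        _, columns, (_, predicate, (_, table)) = plan
        return ("scan", table, predicate, columns)
    return plan  # leave any other plan shape untouched


def execute_scan(tables, plan):
    """Run an optimized scan: rows are filtered and trimmed while being read,
    so no full intermediate result is materialized."""
    _, table, predicate, columns = plan
    return [{c: row[c] for c in columns} for row in tables[table] if predicate(row)]
```

A usage example: with `tables = {"traj": [{"id": 1, "city": "Beijing"}, {"id": 2, "city": "Nanjing"}]}`, optimizing a plan that projects `id` over a filter on `city == "Beijing"` yields one scan node, and `execute_scan` returns only the matching row with only the requested column.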

Service Layer (JUST‑Service)

JUST‑Service provides a configuration‑driven API gateway. Developers specify routing, authentication, rate‑limiting, caching, and logging parameters in a declarative file; the system automatically exposes RESTful endpoints without writing additional code.
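A configuration-driven endpoint might look like the sketch below. Everything in it is hypothetical: the route path, the config keys, the SQL text, the table name, and the `handle` function are placeholders showing how a declarative entry can drive authentication, rate limiting, and query execution without per-endpoint code. It does not reproduce JUST‑Service's real configuration format.

```python
# Hypothetical declarative route table; keys and SQL are illustrative only.
API_CONFIG = {
    "/trajectories/count": {
        "sql": "SELECT COUNT(*) FROM trajectories WHERE city = :city",
        "auth_required": True,
        "rate_limit_per_minute": 60,
    },
}


def handle(path, params, token, run_sql, call_counts):
    """Serve one request using only what the config declares.

    run_sql is a caller-supplied function (sql, params) -> result.
    call_counts maps path -> calls in the current minute; the caller is
    expected to reset it at the start of each minute.
    """
    route = API_CONFIG.get(path)
    if route is None:
        return 404, "unknown route"
    if route["auth_required"] and not token:
        return 401, "missing token"
    used = call_counts.get(path, 0)
    if used >= route["rate_limit_per_minute"]:
        return 429, "rate limit exceeded"
    call_counts[path] = used + 1
    return 200, run_sql(route["sql"], params)
```

Adding an endpoint in this model means adding one more entry to the config; the handler itself does not change.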

API configuration UI

GIS Visualization (JUST‑GIS)

JUST‑GIS couples the high‑throughput JUST‑DB with distributed rendering techniques. It supports:

Real‑time map‑matching of vehicle trajectories.

Dynamic aggregation and clustering to keep client rendering smooth (e.g., handling >5000 points).

Custom styling via JUST‑Studio, a visual editor for map layers.
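One common way to keep client rendering smooth is to aggregate points into grid cells and draw one marker per cell. The sketch below shows that technique in Python; it is a generic grid aggregation chosen for illustration, and the source does not state which aggregation or clustering method JUST‑GIS uses. The `aggregate` name and the `cell_deg` parameter are assumptions.

```python
import math
from collections import defaultdict


def aggregate(points, cell_deg):
    """Group (lon, lat) points into square cells of cell_deg degrees and
    return one (centroid_lon, centroid_lat, count) tuple per non-empty cell."""
    cells = defaultdict(list)
    for lon, lat in points:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells[key].append((lon, lat))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append((sum(p[0] for p in members) / n,
                         sum(p[1] for p in members) / n,
                         n))
    return clusters
```

Choosing a larger `cell_deg` at low zoom levels and a smaller one when the user zooms in keeps the number of drawn markers bounded regardless of how many raw points fall in view.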

JUST‑GIS rendering example

Real‑World Deployments

Key applications demonstrate JUST’s impact:

Epidemic Contact Tracing: During COVID‑19, the system identified >500 high‑risk contacts in Beijing and supported rapid tracing across multiple provinces.

Hazardous‑Material Monitoring: Detects route deviations of dangerous‑goods vehicles, flags illegal storage sites, and reduces manual inspection workload.

Urban Logistics: Reconstructs fine‑grained residential road networks from courier GPS traces, enabling optimized routing for delivery services.
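As an illustration of what a route-deviation check involves, the sketch below measures how far a GPS fix lies from an approved route, given as a polyline, and flags it when the distance exceeds a threshold. The equirectangular approximation, the 200 m threshold, and the function names are assumptions for the example; the source does not describe the detection method used in the deployment.

```python
import math

EARTH_RADIUS_M = 6371000.0


def _to_xy(lon, lat, ref_lat):
    """Equirectangular projection to meters, adequate over short distances."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def distance_to_route_m(point, route):
    """Shortest distance in meters from a (lon, lat) point to a polyline.
    The route needs at least two vertices; otherwise the result is infinity."""
    ref_lat = point[1]
    px, py = _to_xy(point[0], point[1], ref_lat)
    xy = [_to_xy(lon, lat, ref_lat) for lon, lat in route]
    best = float("inf")
    for (ax, ay), (bx, by) in zip(xy, xy[1:]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Clamp the projection parameter so the nearest point stays on the segment.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def deviates(point, route, threshold_m=200.0):
    """True when the point is farther from the route than threshold_m (assumed value)."""
    return distance_to_route_m(point, route) > threshold_m
```

A production check would typically also require the deviation to persist over several consecutive fixes, so that a single noisy GPS reading does not raise an alert.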

Academic Contributions

JUST’s design and evaluation have been accepted at top venues (ICDE 2020, AAAI, KDD, etc.) and earned two consecutive ACM SIGSPATIAL decade‑impact awards. The platform underpins projects such as Xiong’an New Area data platform, Jiangsu Expo smart park, Nanjing Snow‑Bright, and national agricultural parks.
