
How JD Built a Millisecond‑Scale Real‑Time Browsing Record System for 500M Users

This article details JD's end‑to‑end design of a real‑time browsing record platform that captures, stores, and queries up to 200 recent items per user with millisecond latency, covering architecture, hot‑cold data separation, microservice APIs, and streaming pipelines using Kafka, Flink, Jimdb, and HBase.

dbaplus Community

System Overview

The browsing record system records each JD user’s visits to product‑detail pages in real time, de‑duplicates them per product, and keeps up to the 200 most recent entries per user, with millisecond‑level query latency. The architecture consists of four modules: data storage, data query, real‑time reporting, and offline reporting.

Data Storage Module Design and Implementation

To handle an estimated trillion‑scale records, JD separates hot and cold data. Hot data, covering the most recent four days (T‑4), is kept in Jimdb’s in‑memory ordered sets: the key is the username, each element is a SKU, and the score is the visit timestamp. These keys expire after four days. Older records, beyond the four‑day window, are flushed to HBase as K‑V JSON strings. The usernames are MD5‑hashed and the hash is used as a key prefix to avoid hotspotting; these entries expire after 62 days.
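The cold‑storage key scheme can be illustrated with a minimal Python sketch. The article does not give the exact key layout, so the prefix length, the `_` separator, and the JSON field names below are assumptions made for illustration, as are the function names `hbase_row_key` and `cold_value`.

```python
import hashlib
import json

HOT_TTL_SECONDS = 4 * 24 * 3600    # hot data expires after four days (from the article)
COLD_TTL_SECONDS = 62 * 24 * 3600  # cold data expires after 62 days (from the article)
PREFIX_LEN = 4  # assumption: number of MD5 hex characters used as the key prefix


def hbase_row_key(username):
    """Build a salted row key: an MD5-derived prefix followed by the username.

    Hashing spreads lexicographically adjacent usernames across the key
    space, which is the usual way to avoid region hotspotting.
    """
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return f"{digest[:PREFIX_LEN]}_{username}"


def cold_value(records):
    """Serialize (sku, timestamp_ms) pairs as the JSON string stored in the cell."""
    return json.dumps([{"sku": sku, "ts": ts} for sku, ts in records])
```

Because the prefix is derived from the username itself, a reader can recompute the full row key from the username alone and fetch the record with a single point lookup.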

Query Service Module Design and Implementation

The query service exposes three micro‑service APIs: total count, record list, and delete. Rate limiting is implemented with Guava’s RateLimiter and a Caffeine local cache, and is applied per caller, per user, and globally. The total‑count flow first checks a cache. On a miss it reads real‑time data from Jimdb, enriches it with product info, and deduplicates by SPU. Offline data from HBase is merged in only when the hot data alone holds fewer records than the maximum record limit. The list API follows the same steps and returns the merged, deduplicated list.
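The hot‑then‑cold merge described above might look roughly like the following Python sketch. It is not JD's implementation: the function names, the `spu_of` dictionary standing in for the product‑info lookup, and the choice to drop SKUs with no product info are all assumptions. Cold data is passed as a callable so that HBase is only consulted when the hot data falls short of the limit.

```python
MAX_RECORDS = 200  # per-user cap from the article


def _dedupe_by_spu(records, spu_of, limit):
    """Keep the newest record per SPU, newest first, up to `limit` entries."""
    seen = set()
    out = []
    for sku, ts in sorted(records, key=lambda r: r[1], reverse=True):
        spu = spu_of.get(sku)
        if spu is None or spu in seen:
            continue  # assumption: SKUs without product info are skipped
        seen.add(spu)
        out.append((sku, ts))
        if len(out) == limit:
            break
    return out


def browse_list(hot, load_cold, spu_of, limit=MAX_RECORDS):
    """hot: list of (sku, ts) from the hot store; load_cold: callable returning
    the same shape from the cold store; spu_of: hypothetical SKU -> SPU mapping."""
    merged = _dedupe_by_spu(hot, spu_of, limit)
    if len(merged) < limit:
        # Hot data alone does not fill the list, so merge in offline records.
        merged = _dedupe_by_spu(merged + load_cold(), spu_of, limit)
    return merged


def browse_count(hot, load_cold, spu_of, limit=MAX_RECORDS):
    """Total count follows the same steps and reports the merged size."""
    return len(browse_list(hot, load_cold, spu_of, limit))
```

For example, with `hot = [("a1", 30), ("a2", 20)]` where both SKUs share one SPU, only `("a1", 30)` survives deduplication, and the cold loader is then called to fill the remaining slots.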

Real‑Time Reporting Module Design and Implementation

Front‑end services push user PV events to a Kafka topic with 50 partitions, which provides load‑balancing and peak‑shaving. A Flink cluster consumes the topic and writes events to Jimdb via a Lua script that batches multiple commands (insert, count, delete, expire) into a single network round‑trip. Flink was chosen for its low latency, high throughput, fault‑tolerant distributed snapshots, and decoupling from the front‑end, which is essential during massive traffic spikes such as flash‑sale events.
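The article does not show the Lua script itself. As a rough illustration of what the four batched steps accomplish, the Python sketch below emulates them against an in‑memory dictionary. The class name `HotStore` and its methods are hypothetical, and the comments map each step to the Redis‑style sorted‑set command it resembles, on the assumption that Jimdb's ordered sets behave like Redis sorted sets.

```python
import time

MAX_RECORDS = 200                # per-user cap from the article
HOT_TTL_SECONDS = 4 * 24 * 3600  # four-day expiry from the article


class HotStore:
    """In-memory stand-in for the hot store, for illustration only."""

    def __init__(self):
        self._sets = {}    # username -> {sku: timestamp score}
        self._expiry = {}  # username -> absolute expiry time in seconds

    def _drop_if_expired(self, username, now):
        if self._expiry.get(username, float("inf")) <= now:
            self._sets.pop(username, None)
            self._expiry.pop(username, None)

    def record_view(self, username, sku, ts, now=None):
        """Apply insert, count, delete, and expire as one logical step."""
        now = time.time() if now is None else now
        self._drop_if_expired(username, now)
        zset = self._sets.setdefault(username, {})
        zset[sku] = ts                      # insert (like ZADD): a repeat view only updates the score
        overflow = len(zset) - MAX_RECORDS  # count (like ZCARD)
        if overflow > 0:                    # delete (like ZREMRANGEBYRANK): evict the oldest entries
            for old in sorted(zset, key=zset.get)[:overflow]:
                del zset[old]
        self._expiry[username] = now + HOT_TTL_SECONDS  # expire (like EXPIRE): refresh the TTL
        return len(zset)

    def recent(self, username, now=None):
        """Return (sku, ts) pairs, newest first."""
        now = time.time() if now is None else now
        self._drop_if_expired(username, now)
        zset = self._sets.get(username, {})
        return sorted(zset.items(), key=lambda kv: kv[1], reverse=True)
```

Running the four steps inside one server‑side script keeps them atomic and costs one round‑trip per event instead of four, which matters at the event rates a flash sale produces.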

Offline Reporting Module Design and Implementation

The offline pipeline runs daily: (1) the product‑detail front‑end reports PVs to a data‑mart table; (2) a nightly job extracts deleted‑record IDs from JD’s MySQL store; (3) a morning job deduplicates the last 60 days of PVs (capped at 200 per user), filters out deleted items, and writes the result to an offline partition table; (4) a late‑night job converts the partitioned data to K‑V JSON and loads it into HBase with a 62‑day TTL.
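Step (3) of the pipeline can be sketched in a few lines of Python. This is an illustrative in‑memory version, not the actual batch job: the function name `build_offline_partition`, the event tuple shape, and the representation of deleted records as `(username, sku)` pairs are assumptions, and deduplication here is by SKU since the article does not say which key the offline job uses.

```python
from collections import defaultdict

WINDOW_DAYS = 60   # look-back window from the article
MAX_RECORDS = 200  # per-user cap from the article
DAY_MS = 24 * 3600 * 1000


def build_offline_partition(pv_events, deleted, now_ms):
    """pv_events: iterable of (username, sku, ts_ms).
    deleted: set of (username, sku) pairs the user has removed (assumed shape).
    Returns {username: [(sku, ts_ms), ...]} with the newest entries first."""
    cutoff = now_ms - WINDOW_DAYS * DAY_MS
    latest = defaultdict(dict)
    for user, sku, ts in pv_events:
        if ts < cutoff or (user, sku) in deleted:
            continue  # outside the 60-day window, or deleted by the user
        if ts > latest[user].get(sku, -1):
            latest[user][sku] = ts  # keep only the most recent view per SKU
    return {
        user: sorted(views.items(), key=lambda kv: kv[1], reverse=True)[:MAX_RECORDS]
        for user, views in latest.items()
    }
```

The resulting per‑user lists correspond to what step (4) would serialize to K‑V JSON before loading into HBase.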

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
