How JD Built a Millisecond‑Scale Real‑Time Browsing Record System for 500M Users
This article details JD's end‑to‑end design of a real‑time browsing record platform that captures, stores, and queries up to 200 recent items per user with millisecond latency, covering architecture, hot‑cold data separation, microservice APIs, and streaming pipelines using Kafka, Flink, Jimdb, and HBase.
System Overview
The browsing record system records each JD user’s real‑time product‑detail page visits, de‑duplicates them by product dimension, and stores up to the latest 200 entries per user, delivering millisecond‑level query latency. The architecture consists of four modules: data storage, data query, real‑time reporting, and offline reporting.
Data Storage Module Design and Implementation
To handle an estimated trillion‑scale record volume, JD separates hot and cold data. Hot data, meaning records from the most recent four days, is kept in Jimdb’s in‑memory ordered sets: each set is keyed by username, with SKUs as members and visit timestamps as scores, and the key expires after four days. Cold data, meaning records older than four days, is flushed to HBase as K‑V JSON strings, with an MD5 hash of the username used as a row‑key prefix to avoid hotspotting; these entries expire after 62 days.
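The key layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `browse:` key prefix, the four-character hash prefix, the `_` separator, and the JSON field names `sku` and `ts` are all hypothetical, since the article does not give the exact formats.

```python
import hashlib
import json

HOT_TTL_SECONDS = 4 * 24 * 3600    # hot data expires after four days
COLD_TTL_SECONDS = 62 * 24 * 3600  # cold data expires after 62 days


def hot_entry(username, sku, ts_ms):
    """Return (key, member, score) for the in-memory ordered set.

    The "browse:" key prefix is a hypothetical naming choice.
    """
    return f"browse:{username}", sku, ts_ms


def cold_row_key(username, prefix_len=4):
    """Prefix the row key with part of the username's MD5 digest.

    Hashing spreads lexicographically adjacent usernames across key
    ranges; prefix_len and the "_" separator are assumptions.
    """
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{username}"


def cold_value(records):
    """Serialize (sku, ts_ms) pairs into one K-V JSON string."""
    return json.dumps([{"sku": sku, "ts": ts} for sku, ts in records])
```

Because the prefix is derived from the username itself, a reader can recompute the row key at query time without a lookup table.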
Query Service Module Design and Implementation
The query service exposes three micro‑service APIs: total count, record list, and delete. Rate limiting uses Guava’s RateLimiter together with a Caffeine local cache, and is applied per caller, per user, and globally. The total‑count flow first checks a cache; on a miss it reads real‑time data from Jimdb, enriches it with product info, deduplicates by SPU, and merges offline data from HBase only when the hot data alone yields fewer records than the maximum record limit. The list API follows the same steps and returns the merged, deduplicated list.
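The conditional merge in this flow might look like the following sketch. It is not JD's implementation: the function names, the `(sku, ts)` tuple shape, the `sku_to_spu` mapping standing in for the product-info enrichment step, and the `load_cold` callback standing in for the HBase read are all assumptions made for illustration.

```python
def dedupe_by_spu(records, sku_to_spu):
    """Keep only the newest record per SPU, newest first.

    records: iterable of (sku, ts) pairs.
    sku_to_spu: hypothetical lookup standing in for product-info
    enrichment; unknown SKUs are treated as their own SPU.
    """
    seen = set()
    result = []
    for sku, ts in sorted(records, key=lambda r: r[1], reverse=True):
        spu = sku_to_spu.get(sku, sku)
        if spu not in seen:
            seen.add(spu)
            result.append((sku, ts))
    return result


def query_records(hot, load_cold, sku_to_spu, limit=200):
    """Return up to `limit` deduplicated records.

    load_cold is a callable so the cold store is read only when the
    hot data cannot fill the limit on its own.
    """
    result = dedupe_by_spu(hot, sku_to_spu)
    if len(result) < limit:
        result = dedupe_by_spu(list(hot) + list(load_cold()), sku_to_spu)
    return result[:limit]


def total_count(hot, load_cold, sku_to_spu, limit=200):
    """The count API is the length of the same merged list."""
    return len(query_records(hot, load_cold, sku_to_spu, limit))
```

Passing the cold read as a callable keeps the common case, where recent activity already fills the list, free of any HBase round trip.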
Real‑Time Reporting Module Design and Implementation
Front‑end services push user PV events to a Kafka topic with 50 partitions, which provides load balancing and peak shaving. A Flink cluster consumes the topic and writes each event to Jimdb through a Lua script that batches multiple commands (insert, count, delete, expire) into a single network round‑trip. Flink was chosen for its low latency, high throughput, and fault‑tolerant distributed snapshots, and because the pipeline decouples writes from the front‑end, which is essential during massive traffic spikes such as flash‑sale events.
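To make the four batched steps concrete, here is a plain Python model of what such a script could do on one user's ordered set. It runs against in-memory dictionaries rather than Jimdb, and it assumes the steps map onto Redis-style `ZADD`, `ZCARD`, `ZREMRANGEBYRANK`, and `EXPIRE` semantics; the article does not list the actual commands, and the function and parameter names are hypothetical.

```python
def record_view(store, expiries, username, sku, ts_ms,
                max_items=200, ttl_seconds=4 * 24 * 3600):
    """Apply insert, count, delete, and expire as one logical step.

    store: dict mapping username -> {sku: score}, modelling ordered sets.
    expiries: dict mapping username -> expiry time in epoch seconds.
    Returns the set size after trimming.
    """
    zset = store.setdefault(username, {})

    # Insert (ZADD-like): a repeat view overwrites the score, so each
    # SKU appears at most once per user.
    zset[sku] = ts_ms

    # Count (ZCARD-like): how far over the cap are we?
    overflow = len(zset) - max_items

    # Delete (ZREMRANGEBYRANK-like): drop the lowest-scored members.
    if overflow > 0:
        for oldest in sorted(zset, key=zset.get)[:overflow]:
            del zset[oldest]

    # Expire (EXPIRE-like): every write pushes the TTL forward.
    expiries[username] = ts_ms // 1000 + ttl_seconds
    return len(zset)
```

In a real deployment the point of putting these steps in one server-side script is that they execute together in one round trip, so a concurrent reader never sees a set above the cap.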
Offline Reporting Module Design and Implementation
The offline pipeline runs daily in four steps:
(1) The product‑detail front‑end reports PVs to a data‑mart table.
(2) A nightly job extracts deleted‑record IDs from JD’s MySQL store.
(3) A morning job deduplicates the last 60 days of PVs (capped at 200 per user), filters out deleted items, and writes the result to an offline partition table.
(4) A late‑night job converts the partitioned data to K‑V JSON and loads it into HBase with a 62‑day TTL.
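The deduplicate, cap, filter, and serialize steps of the daily job can be sketched as a single in-memory function. This is a simplified illustration, not the batch job itself: the row shape `(user, sku, ts)`, the representation of deleted records as `(user, sku)` pairs, and the JSON field names are assumptions.

```python
import json
from collections import defaultdict

SECONDS_PER_DAY = 24 * 3600


def build_offline_snapshot(pv_rows, deleted, now_ts,
                           window_days=60, cap=200):
    """Build one JSON string per user from raw PV rows.

    pv_rows: iterable of (user, sku, ts) with ts in epoch seconds.
    deleted: set of (user, sku) pairs the user has removed
             (hypothetical shape for the deleted-record IDs).
    """
    cutoff = now_ts - window_days * SECONDS_PER_DAY
    latest = defaultdict(dict)

    for user, sku, ts in pv_rows:
        # Drop rows outside the window and rows the user deleted.
        if ts < cutoff or (user, sku) in deleted:
            continue
        # Deduplicate: keep only the newest visit per (user, sku).
        if ts > latest[user].get(sku, -1):
            latest[user][sku] = ts

    snapshot = {}
    for user, skus in latest.items():
        # Cap at the newest `cap` records, newest first.
        newest = sorted(skus.items(), key=lambda kv: kv[1], reverse=True)
        snapshot[user] = json.dumps(
            [{"sku": sku, "ts": ts} for sku, ts in newest[:cap]]
        )
    return snapshot
```

Each value in the returned dictionary corresponds to one K‑V JSON string that the final step would load into HBase.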
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
