Data Serviceization at JD: From Zero to One and Beyond

This article presents JD's data service platform, describing its origin, performance optimizations, flexible API generation, caching strategies, service orchestration, and governance, and includes a Q&A that addresses security, performance, and multi‑source data handling challenges.

DataFunSummit

Jan 29, 2023

Introduction – JD's Data Intelligence Department built a data service platform (EZD framework) to let engineers generate open data APIs by simply providing SQL, dramatically reducing development cycles for high‑frequency business demands such as the 618 promotion.

1. Origin: Data Serviceization from 0 to 1 – Business units needed open data APIs quickly, and the traditional two‑week development cycle was too slow. The EZD framework enables one‑click API generation from SQL while preserving performance and supporting dynamic parameters.

The solution replaces the traditional fixed API development model: data sources are read via JDBC, then exposed via HTTP or RPC. Engineers fill in SQL, click publish, and the system hot‑deploys the API.
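The publish flow above can be illustrated with a minimal sketch. It is not the EZD implementation: the registry, the `publish` and `call_api` helpers, the `/orders/by_city` path, and the sample table are all hypothetical, and an in-memory SQLite database stands in for a JDBC data source.

```python
import sqlite3

# Hypothetical in-memory registry: API path -> SQL text.
API_REGISTRY = {}

def publish(path, sql):
    """Register (or replace) an API at runtime; no restart is needed."""
    API_REGISTRY[path] = sql

def call_api(conn, path, params=None):
    """Look up the SQL for a path, run it, and return rows as dicts."""
    sql = API_REGISTRY[path]
    cur = conn.execute(sql, params or {})
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

# Assumption: SQLite stands in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Beijing", 10.0), (2, "Shanghai", 25.5), (3, "Beijing", 4.5)],
)

# "Fill in SQL, click publish": the API is callable immediately.
publish("/orders/by_city",
        "SELECT id, amount FROM orders WHERE city = :city ORDER BY id")
rows = call_api(conn, "/orders/by_city", {"city": "Beijing"})
```

In a real deployment the registry would sit behind an HTTP or RPC endpoint; here `call_api` plays that role so the sketch stays self-contained.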

2. Interface Performance – Early versions suffered from high latency because SQL definitions were looked up repeatedly. Caching the SQL definitions in an in‑memory routing table and switching to the Hikari connection pool cut the platform's own share of request time from 97% to 1%.
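The routing-table idea can be sketched as follows, assuming a slow metadata store that holds the SQL definitions. `MetadataStore`, `RoutingTable`, and the `/sales/total` path are hypothetical names for illustration; connection pooling is a separate concern and is not shown.

```python
class MetadataStore:
    """Stand-in for the database that holds API definitions (hypothetical)."""

    def __init__(self, definitions):
        self._definitions = definitions
        self.lookups = 0  # counts how often the slow store is hit

    def load_sql(self, path):
        self.lookups += 1
        return self._definitions[path]


class RoutingTable:
    """Caches path -> SQL in process memory so requests skip the store."""

    def __init__(self, store):
        self._store = store
        self._routes = {}

    def resolve(self, path):
        sql = self._routes.get(path)
        if sql is None:
            # First request for this path: load once, then keep it in memory.
            sql = self._store.load_sql(path)
            self._routes[path] = sql
        return sql

    def refresh(self, path):
        """Drop a cached route so a republished definition takes effect."""
        self._routes.pop(path, None)


store = MetadataStore({"/sales/total": "SELECT SUM(amount) FROM sales"})
routes = RoutingTable(store)
for _ in range(1000):
    routes.resolve("/sales/total")
```

After the loop, the store has been consulted once for 1,000 requests; calling `refresh` on publish forces the next request to reload the definition.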

3. Interface Flexibility – Parameters are injected into SQL with colon syntax (a named placeholder such as :param), and dynamic query conditions are expressed with FreeMarker template directives (if, switch, and list loops), which consolidated 80 APIs into 5.
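To show why optional conditions let one API replace many, here is a minimal sketch. It uses plain Python string building in place of FreeMarker templates, and SQLite's named placeholders for the colon syntax; the `build_query` and `query` helpers, the table, and the parameter names are assumptions for illustration.

```python
import sqlite3

def build_query(params):
    """Append a condition only for parameters the caller supplied."""
    sql = "SELECT id FROM orders"
    conditions = []
    if params.get("city") is not None:
        conditions.append("city = :city")
    if params.get("min_amount") is not None:
        conditions.append("amount >= :min_amount")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY id"

def query(conn, params):
    """Bind values by name; user input never enters the SQL text itself."""
    bound = {k: v for k, v in params.items() if v is not None}
    return [row[0] for row in conn.execute(build_query(bound), bound)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Beijing", 10.0), (2, "Shanghai", 25.5), (3, "Beijing", 4.5)],
)

all_ids = query(conn, {})
beijing_ids = query(conn, {"city": "Beijing"})
big_beijing_ids = query(conn, {"city": "Beijing", "min_amount": 5})
```

One definition serves three different filters here; without conditional clauses each combination would need its own fixed SQL and its own API.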

4. From 1 to 10: Scaling the Platform – During JD’s 618 promotion, dozens of metrics were served via the data APIs. The platform supports rapid metric development, one‑click cache addition, and integration with various storage systems (Elasticsearch‑SQL, Redis, HBase).

4.1 NoSQL Storage API Generation – Elasticsearch‑SQL executes SQL on ES; Redis supports KV reads/writes; HBase supports get/scan operations.

4.2 One‑Click Caching – Two cache modes: passive (created on first request) and active (periodic refresh). Passive caching handles dynamic parameters but may cause QPS spikes; active caching eliminates spikes by pre‑loading data.
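The difference between the two cache modes can be sketched as below. This is a single-process illustration, not the platform's cache: `PassiveCache`, `ActiveCache`, `load_metric`, and the metric keys are hypothetical, and a fake clock replaces real time so the behaviour is easy to follow.

```python
import time

class PassiveCache:
    """Fills an entry on the first request for a key, then serves it until the TTL expires."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        # Miss or expired: this request goes through to the data source.
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value


class ActiveCache:
    """Holds a fixed set of keys that are refreshed ahead of requests."""

    def __init__(self, loader, keys):
        self._loader = loader
        self._keys = list(keys)
        self._entries = {}
        self.refresh()

    def refresh(self):
        """Reload every key; a scheduler would call this periodically."""
        self._entries = {key: self._loader(key) for key in self._keys}

    def get(self, key):
        return self._entries[key]


calls = []  # records every trip to the (simulated) data source

def load_metric(key):
    calls.append(key)
    return f"value-for-{key}"

now = [0.0]  # fake clock, advanced by hand
passive = PassiveCache(load_metric, ttl_seconds=60, clock=lambda: now[0])
passive.get("gmv")  # miss: loads from the source
passive.get("gmv")  # hit: served from memory
now[0] = 61.0
passive.get("gmv")  # expired: loads from the source again
passive_loads = len(calls)

active = ActiveCache(load_metric, ["gmv", "uv"])
active.get("gmv")   # served from the pre-loaded snapshot
```

The passive cache accepts any key but sends a request to the source whenever an entry is missing or expired, which is where spikes come from. The active cache never loads on the request path, at the cost of needing its key set in advance.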

5. Service Orchestration – Complex business logic is encapsulated in workflow‑driven APIs, allowing conditional branching, debugging, and automatic execution without manual intervention.
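A workflow with a conditional branch can be sketched as a table of named steps, where each step returns the name of the next one. The step names, the `run_workflow` helper, and the user data are hypothetical; the returned trace stands in for the debugging support mentioned above.

```python
def run_workflow(steps, start, context):
    """Run steps until one returns None; each step names its successor."""
    trace = []
    current = start
    while current is not None:
        trace.append(current)
        current = steps[current](context)
    return trace

def fetch_user(ctx):
    # Assumption: a lookup table stands in for a call to a user-profile API.
    ctx["level"] = {"u1": "vip", "u2": "regular"}.get(ctx["user_id"], "regular")
    # Conditional branch: the next step depends on data fetched in this one.
    return "vip_offers" if ctx["level"] == "vip" else "default_offers"

def vip_offers(ctx):
    ctx["offers"] = ["free_shipping", "early_access"]
    return None

def default_offers(ctx):
    ctx["offers"] = ["free_shipping"]
    return None

STEPS = {
    "fetch_user": fetch_user,
    "vip_offers": vip_offers,
    "default_offers": default_offers,
}

vip_ctx = {"user_id": "u1"}
vip_trace = run_workflow(STEPS, "fetch_user", vip_ctx)

regular_ctx = {"user_id": "u2"}
regular_trace = run_workflow(STEPS, "fetch_user", regular_ctx)
```

The caller sees one API; which downstream steps run is decided inside the workflow, and the trace shows the path taken for a given request.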

6. Governance of Data Services – Data services involve three concerns: producers, consumers, and governance. A service market organizes services into layers (entity, interaction, application). Governance enforces policies, service classification, quality control, and release checkpoints.

Q&A

Q1: Does high flexibility affect security or performance? A1: Data sources have owners who authorize API groups; performance bottlenecks lie in the underlying databases. The platform provides distributed rate‑limiting and circuit‑breaking to protect stability.
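For readers unfamiliar with the two protections named in A1, the sketch below shows a token-bucket rate limiter and a simple circuit breaker. It is single-process, whereas the platform's versions are described as distributed; the class names, thresholds, and fake clock are assumptions for illustration.

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._tokens = float(capacity)
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Rejects calls for `reset_seconds` after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold, reset_seconds, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset:
                raise RuntimeError("circuit open")
            self._opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


now = [0.0]  # fake clock, advanced by hand
bucket = TokenBucket(rate_per_sec=1, capacity=2, clock=lambda: now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
now[0] = 1.0
after_refill = bucket.allow()

breaker = CircuitBreaker(failure_threshold=2, reset_seconds=30, clock=lambda: now[0])
```

The limiter caps how fast callers can hit an API, and the breaker stops sending queries to a database that is already failing, which matches the point that the bottleneck sits in the underlying storage.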

Q2: Can a MySQL‑based API be switched to ClickHouse or another SQL source? A2: Yes, if the SQL is compatible with the target engine; otherwise, conversion is needed.

Thank you for attending; follow the DataFunTalk channel for more technical content.
