Optimizing Query Performance and Data Architecture for JD BIP Procurement System
This article describes how JD's BIP procurement system tackled massive data volumes and slow, complex queries: reducing slow SQL, segregating "in-stock" orders, migrating bulk data to Elasticsearch, routing queries dynamically, and adding robust monitoring. Together these changes cut the hot data set by 92% and made operations noticeably smoother.
1. Background
The BIP procurement system, used by JD's purchasing department to acquire goods from suppliers, faces severe performance challenges due to complex queries, massive data volume (hundreds of millions of rows across dozens of tables), and a complicated data model.
2. Goals
Business: improve core query and export experience, optimize unreasonable query conditions. Technical: reduce slow SQL, lower database load, shrink table sizes, simplify the data model.
3. Challenges
Achieve high‑performance queries on massive, complex data while maintaining data integrity and supporting both real‑time and archived orders.
4. Solution
4.1 Reduce Query Data Volume
4.1.1 Investigation
Only about 8% of purchase orders require actual delivery to JD warehouses; the rest are merely records generated from customer orders.
4.1.2 “In‑stock” Order Segregation
Introduce a new business concept “in‑stock order” and isolate these orders for separate processing. Mark new orders at creation time, back‑fill historical orders via offline tables, and use binlog listeners with data subscription tasks to keep the marker consistent.
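The marker-consistency step above can be sketched as a small replay loop: back-fill markers once from an offline dump, then keep them current by applying parsed binlog change events. This is a minimal illustration, not JD's actual implementation; the event shape and field names (`order_id`, `requires_delivery`) are assumptions.

```python
# Hypothetical sketch of keeping the "in-stock" marker consistent.
# Field names and event shapes are illustrative assumptions.

def backfill_historical(marker_store: dict, offline_rows) -> None:
    """One-shot back-fill from an offline table dump. setdefault ensures
    that markers already written by live binlog events are not clobbered."""
    for row in offline_rows:
        marker_store.setdefault(row["order_id"], bool(row["requires_delivery"]))

def apply_binlog_event(marker_store: dict, event: dict) -> None:
    """Apply one parsed binlog row event to the marker store."""
    order_id = event["order_id"]
    if event["type"] == "DELETE":
        marker_store.pop(order_id, None)
    else:  # INSERT / UPDATE: re-derive the marker from the latest row image
        marker_store[order_id] = bool(event["row"]["requires_delivery"])
```

The key design point is idempotence: replaying the same binlog event twice leaves the store unchanged, which makes point-in-time rollback and re-ingestion safe.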
Data Synchronization
Incremental data is parsed from the source database's binlog; full data is migrated with DTS using combined full and incremental synchronization. After go-live, the pipeline switches to native binlog subscription, which supports point-in-time rollback and re-ingestion.
Data Validation
Validate by comparing table-level total counts and randomly sampled rows; the "in-stock" order volume is about 50 million rows versus 650 million total, a roughly 92% reduction in the hot data set.
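The two-step check above can be expressed compactly: first compare total counts, then compare a seeded random sample of rows between source and destination. This is a generic sketch, not JD's tooling; the lookup callables stand in for whatever row-fetch mechanism each store provides.

```python
import random

def validate_migration(src_count: int, dst_count: int,
                       src_lookup, dst_lookup, ids,
                       sample_size: int = 100, seed: int = 42) -> bool:
    """Table-level count check plus random-sample row comparison.
    A fixed seed makes a failed validation run reproducible."""
    if src_count != dst_count:
        return False
    rng = random.Random(seed)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return all(src_lookup(i) == dst_lookup(i) for i in sample)
```

Sampling trades completeness for speed: on hundreds of millions of rows a full diff is impractical, while counts plus a sample catch both wholesale loss and systematic corruption.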
4.2 Enhance Complex Query Capability
4.2.1 Data Preparation
Move non‑critical “non‑in‑stock” orders to Elasticsearch (ES) to offload complex queries. Store 2 billion orders in ES, partitioned by month to support typical time‑range queries and fast order‑ID lookups.
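Month-based partitioning pays off because a time-range query only needs to touch the indices its range overlaps. A minimal sketch of that expansion, assuming an `orders-YYYY.MM` index-naming convention (the prefix and format are assumptions, not the article's stated scheme):

```python
from datetime import date

def monthly_indices(prefix: str, start: date, end: date) -> list[str]:
    """Expand a time range into the monthly ES indices it touches, so a
    typical range query scans only the relevant partitions."""
    indices, y, m = [], start.year, start.month
    while (y, m) <= (end.year, end.month):
        indices.append(f"{prefix}-{y:04d}.{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return indices
```

Fast order-ID lookups work the same way in reverse: if the order ID encodes its creation date, a point lookup can be routed to exactly one monthly index.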
4.2.2 Query Scheduling Strategy
Introduce a dynamic routing layer that automatically directs each query to the appropriate cluster (MySQL or ES) based on the user, latency requirements, and data range. Offline analysis of order-volume skew, computed as of T-1, is cached in a JimDB cluster to guide the routing decisions.
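The routing decision itself reduces to a small rule function. The sketch below is a plausible shape under stated assumptions (the exact inputs, thresholds, and the `skew_hot` signal fed from the T-1 offline analysis are hypothetical, not the article's actual rules):

```python
def route_query(in_stock_order: bool, needs_realtime: bool,
                range_days: int, skew_hot: bool) -> str:
    """Hypothetical routing rule: narrow, fresh, in-stock queries stay on
    MySQL; wide historical scans and skew-heavy traffic go to ES.
    skew_hot would come from the T-1 skew analysis cached in JimDB."""
    if needs_realtime and in_stock_order and range_days <= 31 and not skew_hot:
        return "mysql"
    return "es"
```

Centralizing the rule in one function also makes a fallback trivial: flipping a single flag can force all traffic to one backend during an incident.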
5. ES Cluster High‑Availability & Monitoring
Implement a 1:1 ES master‑slave pair with DUCC switch for seamless failover, and set up monitoring for order‑data latency, message‑queue backlog, and overall system health.
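Switch-driven failover between the 1:1 ES pair can be modeled as a selector that consults a config value on every call, so flipping the switch redirects traffic without restarting clients. A minimal sketch, where `get_switch` stands in for a DUCC configuration read (an assumption about the integration, not a DUCC API):

```python
class EsClusterSelector:
    """Choose between a primary ES cluster and its 1:1 standby based on a
    dynamically read switch value ("primary" or "standby")."""

    def __init__(self, primary: str, standby: str, get_switch):
        self.primary = primary
        self.standby = standby
        self.get_switch = get_switch  # callable returning the live switch value

    def endpoint(self) -> str:
        # Re-read the switch on every call so a flip takes effect immediately.
        return self.standby if self.get_switch() == "standby" else self.primary
```

Because the switch is evaluated per request rather than cached at startup, failover latency is bounded by the config system's propagation delay rather than by a deploy.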
6. Gradual Rollout
Phase‑wise deployment: first data modules, then pre‑release validation with traffic replay, followed by user‑level gray release using JDOS load‑balancing, and finally external API gray release by controlling container counts.
7. Outcomes
Stable migration with no online issues, significant reduction in data volume, improved query performance, and a more maintainable architecture.
8. Future Outlook
Plans include finer‑grained monitoring, data‑model simplification, storage‑cluster consolidation, further MySQL load reduction, and a fallback mechanism for instant, seamless switching during anomalies.
9. Final Thoughts
ES was chosen over Doris for its maturity and lower learning cost; the design still keeps a JED backup for critical queries. Combined, these optimizations give the system the required performance and reliability.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.