Optimizing Query Performance and Data Architecture for JD BIP Procurement System
This article describes how JD's BIP procurement system tackled massive data volumes and slow, complex queries: reducing slow SQL, segregating "in-stock" orders, migrating bulk data to Elasticsearch, routing queries dynamically, and adding robust monitoring. Together these changes cut the hot data set by 92% and made operations noticeably smoother.
1. Background
The BIP procurement system, used by JD's purchasing department to acquire goods from suppliers, faces severe performance challenges due to complex queries, massive data volume (hundreds of millions of rows across dozens of tables), and a complicated data model.
2. Goals
Business: improve core query and export experience, optimize unreasonable query conditions. Technical: reduce slow SQL, lower database load, shrink table sizes, simplify the data model.
3. Challenges
Achieve high‑performance queries on massive, complex data while maintaining data integrity and supporting both real‑time and archived orders.
4. Solution
4.1 Reduce Query Data Volume
4.1.1 Investigation
Only about 8% of purchase orders require actual delivery to JD warehouses; the rest are merely records generated from customer orders.
4.1.2 “In‑stock” Order Segregation
Introduce a new business concept “in‑stock order” and isolate these orders for separate processing. Mark new orders at creation time, back‑fill historical orders via offline tables, and use binlog listeners with data subscription tasks to keep the marker consistent.
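The marker-consistency step above can be sketched as a small replay loop: back-fill markers once from an offline dump, then keep them current by applying parsed binlog change events. This is a minimal illustration, not JD's actual implementation; the event shape and field names (`order_id`, `requires_delivery`) are assumptions.

```python
# Hypothetical sketch of keeping the "in-stock" marker consistent.
# Field names and event shapes are illustrative assumptions.

def backfill_historical(marker_store: dict, offline_rows) -> None:
    """One-shot back-fill from an offline table dump. setdefault ensures
    that markers already written by live binlog events are not clobbered."""
    for row in offline_rows:
        marker_store.setdefault(row["order_id"], bool(row["requires_delivery"]))

def apply_binlog_event(marker_store: dict, event: dict) -> None:
    """Apply one parsed binlog row event to the marker store."""
    order_id = event["order_id"]
    if event["type"] == "DELETE":
        marker_store.pop(order_id, None)
    else:  # INSERT / UPDATE: re-derive the marker from the latest row image
        marker_store[order_id] = bool(event["row"]["requires_delivery"])
```

The key design point is idempotence: replaying the same binlog event twice leaves the store unchanged, which makes point-in-time rollback and re-ingestion safe.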
Data Synchronization
Incremental data is parsed from the source database's binlog; full data is migrated with DTS using combined full and incremental synchronization. After go-live, the pipeline switches to native binlog subscription, which supports point-in-time rollback and re-ingestion.
Data Validation
Validate by comparing table-level total counts and randomly sampled rows; the "in-stock" order volume is about 50 million rows versus 650 million total, a roughly 92% reduction in the hot data set.
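The two-step check above can be expressed compactly: first compare total counts, then compare a seeded random sample of rows between source and destination. This is a generic sketch, not JD's tooling; the lookup callables stand in for whatever row-fetch mechanism each store provides.

```python
import random

def validate_migration(src_count: int, dst_count: int,
                       src_lookup, dst_lookup, ids,
                       sample_size: int = 100, seed: int = 42) -> bool:
    """Table-level count check plus random-sample row comparison.
    A fixed seed makes a failed validation run reproducible."""
    if src_count != dst_count:
        return False
    rng = random.Random(seed)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return all(src_lookup(i) == dst_lookup(i) for i in sample)
```

Sampling trades completeness for speed: on hundreds of millions of rows a full diff is impractical, while counts plus a sample catch both wholesale loss and systematic corruption.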
4.2 Enhance Complex Query Capability
4.2.1 Data Preparation
Move non‑critical “non‑in‑stock” orders to Elasticsearch (ES) to offload complex queries. Store 2 billion orders in ES, partitioned by month to support typical time‑range queries and fast order‑ID lookups.
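Month-based partitioning pays off because a time-range query only needs to touch the indices its range overlaps. A minimal sketch of that expansion, assuming an `orders-YYYY.MM` index-naming convention (the prefix and format are assumptions, not the article's stated scheme):

```python
from datetime import date

def monthly_indices(prefix: str, start: date, end: date) -> list[str]:
    """Expand a time range into the monthly ES indices it touches, so a
    typical range query scans only the relevant partitions."""
    indices, y, m = [], start.year, start.month
    while (y, m) <= (end.year, end.month):
        indices.append(f"{prefix}-{y:04d}.{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return indices
```

Fast order-ID lookups work the same way in reverse: if the order ID encodes its creation date, a point lookup can be routed to exactly one monthly index.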
4.2.2 Query Scheduling Strategy
Introduce a dynamic routing layer that automatically directs each query to the appropriate cluster (MySQL or ES) based on the user, latency requirements, and data range. Offline analysis of order-volume skew, computed as of T-1, is cached in a JimDB cluster to guide the routing decisions.
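The routing decision itself reduces to a small rule function. The sketch below is a plausible shape under stated assumptions (the exact inputs, thresholds, and the `skew_hot` signal fed from the T-1 offline analysis are hypothetical, not the article's actual rules):

```python
def route_query(in_stock_order: bool, needs_realtime: bool,
                range_days: int, skew_hot: bool) -> str:
    """Hypothetical routing rule: narrow, fresh, in-stock queries stay on
    MySQL; wide historical scans and skew-heavy traffic go to ES.
    skew_hot would come from the T-1 skew analysis cached in JimDB."""
    if needs_realtime and in_stock_order and range_days <= 31 and not skew_hot:
        return "mysql"
    return "es"
```

Centralizing the rule in one function also makes a fallback trivial: flipping a single flag can force all traffic to one backend during an incident.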
5. ES Cluster High‑Availability & Monitoring
Implement a 1:1 ES master‑slave pair with DUCC switch for seamless failover, and set up monitoring for order‑data latency, message‑queue backlog, and overall system health.
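Switch-driven failover between the 1:1 ES pair can be modeled as a selector that consults a config value on every call, so flipping the switch redirects traffic without restarting clients. A minimal sketch, where `get_switch` stands in for a DUCC configuration read (an assumption about the integration, not a DUCC API):

```python
class EsClusterSelector:
    """Choose between a primary ES cluster and its 1:1 standby based on a
    dynamically read switch value ("primary" or "standby")."""

    def __init__(self, primary: str, standby: str, get_switch):
        self.primary = primary
        self.standby = standby
        self.get_switch = get_switch  # callable returning the live switch value

    def endpoint(self) -> str:
        # Re-read the switch on every call so a flip takes effect immediately.
        return self.standby if self.get_switch() == "standby" else self.primary
```

Because the switch is evaluated per request rather than cached at startup, failover latency is bounded by the config system's propagation delay rather than by a deploy.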
6. Gradual Rollout
Phase‑wise deployment: first data modules, then pre‑release validation with traffic replay, followed by user‑level gray release using JDOS load‑balancing, and finally external API gray release by controlling container counts.
7. Outcomes
Stable migration with no online issues, significant reduction in data volume, improved query performance, and a more maintainable architecture.
8. Future Outlook
Plans include finer‑grained monitoring, data‑model simplification, storage‑cluster consolidation, further MySQL load reduction, and a fallback mechanism for instant, seamless switching during anomalies.
9. Final Thoughts
ES was chosen over Doris for its maturity and lower learning cost; the design still keeps a JED backup for critical queries. Combined, these optimizations give the system the required performance and reliability.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.