Databases 14 min read

Cutting Procurement Query Times by 92%: Data Heterogeneity & ES Strategies

This case study details how the BIP procurement system tackled massive data volume, complex queries, and slow SQL by segmenting inbound orders, leveraging Elasticsearch, introducing a dynamic routing layer, and implementing robust ES high‑availability and monitoring, ultimately reducing query load by over 90%.

JD Cloud Developers

Feb 5, 2025

Cutting Procurement Query Times by 92%: Data Heterogeneity & ES Strategies

0. Introduction

The BIP procurement system enables JD.com’s purchasing department to buy goods from suppliers, offering multiple ways to create purchase orders and supporting approval, feedback, and transmission functions.

System value: increasing inventory by purchasing goods from suppliers, fulfilling inventory turnover and customer orders—the most critical first link in the supply chain.

1. Background

After years of iteration, the procurement system faces severe performance challenges at the database query layer due to:

Complex queries : MySQL and JED have accumulated many filter conditions, making it hard to change calling patterns.

Large data volume : Daily average of 1.5 million orders, with main tables and auxiliary tables reaching billions of rows.

Complex data model : Over 20 tables are involved in a single order, requiring many joins.

Key problems:

Business side: poor list page query/export experience, especially with data skew; unreasonable query conditions.

Technical side: numerous slow SQLs from multi‑table joins, large tables causing DDL difficulties, and a fragmented data model increasing development cost.

2. Objectives

Business: improve core query/export experience and optimize query conditions.

Technical: reduce slow SQLs, lower database load, shrink table sizes, and simplify the data model.

3. Challenges

Boost query performance under massive data and complex scenarios.

Comparison of procurement order system vs. consumer sales order system complexity:

4. Solution

4.1 Reduce Query Data Volume

4.1.1 Preliminary Research

Analysis shows only 8% of purchase orders require actual delivery to JD warehouses (full fulfillment). The rest are merely record‑creation orders.

Based on this, a new business concept “Inbound Order” was introduced to isolate these orders during queries, similar to caching hot data in OS/middleware.

4.1.2 Inbound Order Heterogeneity

Execution Flow

Tag inbound orders: incremental orders are written at creation; historical orders are pushed via offline tables.

Order creation module must be upgraded first to ensure data consistency.

If the data parsing module cannot fetch the order tag from JimKV during binlog processing, a compensation query rewrites the binlog.

Binlog listening uses the company’s data subscription task via JMQ, partitioned by order number, with no retry to preserve order.

Data sync: incremental data uses source binlog; historical data uses DTS for full + incremental sync.

Data reconciliation: count tables first, then sample details; inbound orders now 50 million vs. total 6.5 billion (92% reduction).

4.2 Enhance Complex Query Capability

4.2.1 Data Preparation

Although inbound orders are isolated, non‑inbound orders still need query support; their timeliness requirements are low, so they are migrated to Elasticsearch.

ES Data Heterogeneity Process

Sync real‑time + archived orders (≈2 billion) to ES, partitioned by month to improve index performance.

Simplify archiving: write only to JED; ES data is populated by the data parsing module, improving timeliness.

Filter delete statements in the parsing module to keep full‑order data intact.

Optimize MQ consumption: merge binlog requests per table and cache ES updates in JimKV for two minutes before flushing.

Route all table data by order number to the same MQ partition to avoid hot partitions.

4.2.2 Query Scheduling Strategy Design

Before optimization, all queries hit the database directly, performance depended on precise filter conditions.

After optimization, a dynamic routing layer was added:

Offline compute of T‑1 order data skew, push results to JimDB cluster.

Based on user, latency requirement, and query range, automatically schedule queries to the appropriate data cluster.

5. ES Primary/Backup Mechanism & Data Monitoring

5.1 ES Cluster Primary/Backup

1:1 ES clusters provide mutual backup with rapid failover for high availability.

5.2 Data Monitoring

6. Gray‑Release Deployment

Step 1: Deploy data modules first (most time‑consuming due to data volume and complexity).

Step 2: Validate in pre‑release environment with traffic replay (limited automation).

Step 3: User‑level gray release using JDOS load‑balancing combined with ERP.

Step 4: External interface gray release by gradually increasing container instances.

7. Results

Smooth switch with no online issues.

8. Future Outlook

Add daily‑level data comparison and anomaly alerts for proactive monitoring.

Refine data model by pruning obsolete order tables to lower development and operation costs.

Gradually decommission non‑core storage clusters to improve fault tolerance.

Enable ES to fully replace inbound order JED once reliability is proven.

Identify database risks via slow‑log monitoring and continuously reduce load.

Explore MySQL load‑reduction alternatives and eliminate zombie orders.

Implement fallback plans for instant, seamless switching during sync or DB anomalies.

9. Closing Remarks

ES was chosen over Doris due to team familiarity and lower learning cost.

Future possibility of dropping JED tables for inbound orders as ES proves reliable.

Current optimizations address core pain points; further improvements remain.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization Big Data Query Optimization data modeling Databases

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.