
Optimizing Large‑Scale Batch Processing for an Advertising Platform: From Query Tuning to Load‑Balanced Execution

This article presents a real‑world case study of optimizing massive batch‑processing tasks in an ad‑platform by applying query‑level improvements, cursor‑based pagination, shard‑aware batch updates, JVM‑tuned garbage collection, and distributed load‑balancing, ultimately reducing CPU usage from 80% to under 2% and cutting query‑per‑minute volume from millions to a few thousand.

JD Retail Technology

The article describes a series of optimizations applied to a high‑traffic advertising system that runs four periodic batch jobs (unit‑time‑slot update, plan‑time‑slot update, unit‑budget recovery, plan‑budget recovery) on a table dsp_show_status containing over 85 million rows distributed across 64 shards.

Overall optimization idea: improve performance and reduce resource consumption by (1) eliminating unnecessary data scans, (2) increasing the batch step size, (3) removing redundant logs, (4) shrinking temporary objects, (5) using cursor‑based queries, (6) performing shard‑aware batch updates, and (7) distributing work via a retry‑capable event bus.

Business background: only the external‑ad rows are relevant, but the table mixes in internal data, leaving sparse primary‑key ranges and inflating query counts (≈2 million queries per minute). Unit‑level tasks process ~1.3 million rows per run, while plan‑level tasks handle ~180 k rows.

Machine configuration & GC: each JVM runs with 8 CPUs and 16 GB RAM under G1GC (-Xms8192m -Xmx8192m -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=8). High GC frequency was identified as a major CPU drain.

Task processing logic (pseudo‑code):

long startAutoPk = minAutoPk;
while (startAutoPk <= maxAutoPk) {
    long startAutoPkFinal = startAutoPk;
    long endAutoPkFinal = Math.min(startAutoPk + step, maxAutoPk);
    List<?> list = showStatusConsumer.betweenListByParam(
            startAutoPkFinal, endAutoPkFinal, conditions);
    // ... process the fetched batch ...
    startAutoPk = endAutoPkFinal + 1;
}

After fetching the IDs, the system retrieves unit extension data, computes sub‑status based on current time vs. start/end times, groups records by the four possible sub‑states, and performs four batch updates.
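The grouping step above can be sketched as follows. This is a minimal illustration, not the original code: the sub‑state names, the Unit record, and the inActiveSlot flag are assumptions standing in for the real unit extension data.

```java
import java.time.LocalDateTime;
import java.util.*;

public class SubStatusGrouper {

    // The four possible sub-states (names are illustrative assumptions).
    enum SubStatus { NOT_STARTED, IN_DELIVERY, PAUSED_BY_SLOT, ENDED }

    // Minimal stand-in for a unit row with its delivery window.
    record Unit(long id, LocalDateTime startTime, LocalDateTime endTime, boolean inActiveSlot) {}

    // Compute the sub-status from the current time vs. start/end times,
    // mirroring the logic described in the article.
    static SubStatus computeSubStatus(Unit u, LocalDateTime now) {
        if (now.isBefore(u.startTime())) return SubStatus.NOT_STARTED;
        if (now.isAfter(u.endTime()))    return SubStatus.ENDED;
        return u.inActiveSlot() ? SubStatus.IN_DELIVERY : SubStatus.PAUSED_BY_SLOT;
    }

    // Group record IDs by sub-status so each non-empty group becomes
    // one batch UPDATE instead of one UPDATE per row.
    static Map<SubStatus, List<Long>> groupForBatchUpdate(List<Unit> units, LocalDateTime now) {
        Map<SubStatus, List<Long>> groups = new EnumMap<>(SubStatus.class);
        for (Unit u : units) {
            groups.computeIfAbsent(computeSubStatus(u, now), k -> new ArrayList<>()).add(u.id());
        }
        return groups;
    }
}
```

Grouping first and updating per group is what turns up to 1.3 million single-row writes into at most four batch statements per fetched page.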

Cursor‑based pagination replaces deep offset pagination to avoid scanning millions of rows. Example SQL:

SELECT id, cga_id, status_bitmap1, user_id
FROM dsp_show_status
WHERE id BETWEEN #{startAutoPk} AND #{endAutoPk}
  AND id > #{lastMaxId}
ORDER BY id
LIMIT #{batchSize}

This approach reduces query count dramatically (from ~45 k calls per run to ~2 k) and ensures each batch returns a predictable number of rows.
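The driver loop around that SQL can be sketched as follows. This is a simplified model, assuming an in-memory sorted id set stands in for the table and fetchPage stands in for the cursor query shown above; method names are illustrative.

```java
import java.util.*;

public class CursorPager {

    // Simulates the cursor query: up to batchSize ids greater than lastMaxId
    // within [startPk, endPk], ascending. In the real job this is the SQL
    // above; here a sorted id set stands in for dsp_show_status.
    static List<Long> fetchPage(NavigableSet<Long> table, long startPk, long endPk,
                                long lastMaxId, int batchSize) {
        List<Long> page = new ArrayList<>(batchSize);
        long from = Math.max(startPk, lastMaxId + 1);
        if (from > endPk) return page;
        for (long id : table.subSet(from, true, endPk, true)) {
            page.add(id);
            if (page.size() == batchSize) break;
        }
        return page;
    }

    // Walks the whole range with a moving cursor; each query starts where the
    // previous batch ended, so sparse id ranges never trigger deep offset scans.
    static List<List<Long>> scanAll(NavigableSet<Long> table, long startPk, long endPk, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        long lastMaxId = startPk - 1;
        while (true) {
            List<Long> page = fetchPage(table, startPk, endPk, lastMaxId, batchSize);
            if (page.isEmpty()) break;
            batches.add(page);
            lastMaxId = page.get(page.size() - 1); // advance the cursor
        }
        return batches;
    }
}
```

The key property is that every iteration is an index range seek from the cursor position, so cost stays constant per batch regardless of how deep into the table the scan has progressed.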

Shard‑aware batch updates group IDs by their shard key (using BinaryHashUtil.getShardByVindex) and issue each update with the corresponding user_id values set, so every statement hits a single shard, preventing gap locks and long‑running transactions.
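The grouping can be sketched like this. Since BinaryHashUtil.getShardByVindex is the platform's proprietary routing function, a simple hash-mod stands in for it here purely for illustration.

```java
import java.util.*;

public class ShardBatcher {

    static final int SHARD_COUNT = 64; // the table is split across 64 shards

    // Stand-in for BinaryHashUtil.getShardByVindex from the article; the real
    // routing function is proprietary, so a hash-mod is used for illustration.
    static int shardOf(long userId) {
        return Math.floorMod(Long.hashCode(userId), SHARD_COUNT);
    }

    // Group row ids by the shard of their user_id so each batch UPDATE
    // targets one shard: no cross-shard fan-out, no long-running transaction
    // holding gap locks across shards.
    static Map<Integer, List<Long>> groupByShard(Map<Long, Long> idToUserId) {
        Map<Integer, List<Long>> byShard = new TreeMap<>();
        idToUserId.forEach((id, userId) ->
                byShard.computeIfAbsent(shardOf(userId), k -> new ArrayList<>()).add(id));
        return byShard;
    }
}
```

Each entry of the resulting map then becomes one `UPDATE ... WHERE user_id = ? AND id IN (...)` statement routed to a single shard.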

Introducing a next_time field: a timestamp marking the next required status change lets the job filter to only the rows that actually need processing, cutting scanned rows from ~1.3 M to ~10 k per execution.
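One way to derive and use such a field is sketched below. This is an assumption-laden illustration (the Row record and method names are invented): next_time is taken to be the next moment the row's status can flip, and the batch job scans only rows whose next_time has arrived.

```java
import java.time.LocalDateTime;
import java.util.Optional;

public class NextTimeFilter {

    // Minimal stand-in for a dsp_show_status row with its delivery window.
    record Row(long id, LocalDateTime startTime, LocalDateTime endTime) {}

    // next_time = the next moment this row's status can change: the start time
    // if delivery has not begun, otherwise the end time. Rows whose window is
    // already over get no next_time and drop out of the scan entirely.
    static Optional<LocalDateTime> nextTime(Row r, LocalDateTime now) {
        if (now.isBefore(r.startTime())) return Optional.of(r.startTime());
        if (now.isBefore(r.endTime()))   return Optional.of(r.endTime());
        return Optional.empty();
    }

    // The job's WHERE clause reduces to "next_time <= now": only rows that
    // actually need a status flip are ever fetched.
    static boolean due(LocalDateTime nextTime, LocalDateTime now) {
        return !nextTime.isAfter(now);
    }
}
```

With an index on next_time, the periodic scan becomes a narrow range read over due rows instead of a full sweep of ~1.3 M candidates.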

Load balancing: small tasks are dispatched through a retry‑capable event bus rather than raw MQ, ensuring timely retries and stable execution across multiple machines.
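A retry-capable dispatcher of this kind might look as follows. The article does not show the bus implementation, so everything here (class name, round-robin policy, retry budget) is an assumed sketch of the described behavior.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class RetryingEventBus {

    private final List<Consumer<String>> workers; // one consumer per machine
    private final AtomicInteger next = new AtomicInteger();
    private final int maxRetries;

    RetryingEventBus(List<Consumer<String>> workers, int maxRetries) {
        this.workers = workers;
        this.maxRetries = maxRetries;
    }

    // Round-robin each small task across workers; on failure, re-dispatch to
    // the next worker until the retry budget is exhausted.
    boolean publish(String task) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Consumer<String> worker =
                    workers.get(Math.floorMod(next.getAndIncrement(), workers.size()));
            try {
                worker.accept(task);
                return true;
            } catch (RuntimeException e) {
                // fall through and retry on another worker
            }
        }
        return false;
    }
}
```

The point of the design is that a failed or slow machine does not strand its tasks: the bus re-routes them, so total job time stays bounded by the healthy workers.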

Results: CPU usage dropped from 80% to under 2% (≈1% after the final stage), task duration fell from 30 minutes to ≤30 seconds, database load fell from ~2 million to ~2 k queries per minute, and overall query volume decreased by two orders of magnitude while sub‑status updates remained correct.

Tags: Java, Sharding, Batch Processing, Performance Tuning, Database Optimization, Cursor Pagination
Written by JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
