Big Data 6 min read

Automating Real‑Time and Offline Data Verification for Ranking Lists during Large‑Scale Promotions

The article describes the evolution from manual to semi‑automatic and finally fully automatic solutions for verifying real‑time and offline ranking data during major sales events, detailing rule extraction, Hive‑based SQL generation, execution, and the resulting reduction in human effort.

JD Retail Technology
JD Retail Technology
JD Retail Technology
Automating Real‑Time and Offline Data Verification for Ranking Lists during Large‑Scale Promotions

The ranking list (竞速榜) is a real‑time sales leaderboard used during major promotional periods, requiring accurate data calculation and verification to ensure brand resource allocation on JD.com.

Initially, verification was performed entirely by hand: daily snapshot data were collected at 23:59, and offline SQL scripts were manually written to reproduce the rankings, a process that could take over 20 person‑days for 100+ rules.

To reduce labor, a semi‑automatic approach was introduced, automating snapshot collection, exporting rule configurations to Hive, generating SQL fragments via CASE‑WHEN logic, merging them into executable tasks, and pushing both real‑time and offline results to MySQL for comparison. This cut the verification time dramatically but still required manual SQL adjustments whenever rules changed.

The fully automatic solution further eliminates human intervention by automatically extracting rule data daily, updating target SQL in Hive, retrieving and executing the SQL without manual steps, and comparing results directly in the database, highlighting any discrepancies.

Key automation steps include:

Daily extraction of rule configurations and storage in Hive, with automatic SQL regeneration and persistence.

Automatic retrieval and execution of the generated SQL using Hive command‑line utilities.

The following Python snippet shows how the Hive task is invoked to run a SQL command and capture its standard output:

#HiveTask增加run_shell_cmd_out函数只返回标准流的内容在标准客户端执行如下python脚本
from HiveTask import HiveTask
ht = HiveTask()
ht.run_shell_cmd_out(shellcmd='hive -e "select * from table;"')

During the 618 promotion in 2023, this fully automated pipeline allowed a new development team with no prior verification experience to handle over 5,000 ranking rules without the previously required 900+ person‑days, freeing up development resources and improving promotion readiness.

SQLAutomationreal-time analyticsHivedata verification
JD Retail Technology
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.