How Real-Time Big Data Pipelines Detect E‑Commerce Ad Misplacements
This article explains how a large‑scale e‑commerce search advertising system uses real‑time big‑data pipelines, log synchronization, NoSQL storage, and proactive verification to automatically discover and correct ad placement errors across the entire data processing chain, protecting both advertisers and the platform.
Background
The search advertising data processing chain on an e‑commerce platform is long and involves multiple stages: advertisers submit campaigns, data is written to a database, content files are built for online indexes, and the search service uses these indexes for retrieval. Any delay or anomaly in this chain can cause financial loss for advertisers and the platform.
The chain includes:
Advertisers create ad campaigns in the backend.
Campaign and keyword data are written to the database.
Data is built into content files via full or incremental construction (data warehouse import or streaming).
BuildService creates the search index from these content files.
Results So Far
By leveraging TTLog (log stream sync), Lindorm (massive NoSQL storage), BCP (real‑time business verification platform), MetaQ (message queue), Jingwei (online data sync), and Xflush (real‑time log analysis), the team built a system that discovers search ad misplacements in real time and covers all user exposure traffic.
The system also adds proactive checks at data‑change nodes, allowing some issues to be detected before users encounter them.
Additionally, using TTLog + Blink (real‑time compute) + Alibaba Cloud Log Service (SLS) + Xflush, the team achieved real‑time visibility of engine and algorithm performance.
Technical Implementation
1. Engine Exposure Log Processing
User requests generate real‑time exposure logs, which are collected by TTLog from all search engine nodes. BCP cleans, filters, and samples the stream, then pushes data to MetaQ.
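The clean, filter, and sample step can be sketched roughly as follows. This is a minimal illustration, not BCP's actual interface: it assumes each exposure log line is a JSON object, and the field names `ad_id` and `exposure_ts` are hypothetical.

```python
import json
import zlib
from typing import Iterable, Iterator, Optional

# Hypothetical field names; the real exposure log schema is not given in the article.
REQUIRED_FIELDS = ("ad_id", "exposure_ts")


def clean(raw_line: str) -> Optional[dict]:
    """Parse one raw log line; return None if it is malformed or incomplete."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in REQUIRED_FIELDS):
        return None
    return record


def sampled(record: dict, rate_percent: int) -> bool:
    """Deterministic sampling: hash the exposure into one of 100 buckets."""
    key = f"{record['ad_id']}:{record['exposure_ts']}".encode("utf-8")
    return zlib.crc32(key) % 100 < rate_percent


def process(lines: Iterable[str], rate_percent: int = 100) -> Iterator[dict]:
    """Yield cleaned records that fall inside the sampling rate."""
    for line in lines:
        record = clean(line)
        if record is not None and sampled(record, rate_percent):
            yield record
```

Hashing instead of calling a random number generator keeps the sample reproducible when the same log stream is replayed. In the real pipeline the surviving records would be published to MetaQ rather than yielded.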
2. DB Data Processing
MySQL stores only the latest state of each business object. To obtain the last state before an exposure, Jingwei captures every DB change and writes snapshots to Lindorm (HBase‑based).
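The "last state before an exposure" lookup is essentially a versioned read. The sketch below uses an in-memory stand-in rather than Lindorm's real client API, which the article does not show; the class name, method names, and the choice to treat a change at exactly the exposure time as visible are all assumptions.

```python
import bisect
from collections import defaultdict
from typing import Optional


class SnapshotStore:
    """In-memory stand-in for a versioned snapshot table (hypothetical)."""

    def __init__(self) -> None:
        self._timestamps = defaultdict(list)  # key -> sorted change timestamps
        self._states = defaultdict(list)      # key -> states in the same order

    def record_change(self, key: str, ts: int, state: dict) -> None:
        """Store a snapshot of the row as it looked after a change at time ts."""
        idx = bisect.bisect_right(self._timestamps[key], ts)
        self._timestamps[key].insert(idx, ts)
        self._states[key].insert(idx, dict(state))

    def state_before(self, key: str, ts: int) -> Optional[dict]:
        """Return the newest snapshot whose change time is <= ts, or None."""
        timestamps = self._timestamps.get(key)
        if not timestamps:
            return None
        idx = bisect.bisect_right(timestamps, ts)
        if idx == 0:
            return None  # every recorded change happened after ts
        return self._states[key][idx - 1]
```

For example, after recording an "online" snapshot at time 100 and a "paused" snapshot at time 200, `state_before(key, 150)` returns the "online" snapshot, which is what an exposure at time 150 should be checked against.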
3. Data Consistency Verification
The ad testing service (igps) consumes messages from MetaQ, extracts the ad state that the search engine reported at exposure, and queries Lindorm for the corresponding MySQL state as of the exposure time. It then compares the two states and logs any inconsistencies.
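The comparison step might look like the sketch below. It is not the igps implementation; the function name and the compared fields (`status`, `bid`) are hypothetical and stand in for whatever business fields the real check covers.

```python
import logging
from typing import Dict, Iterable, Tuple

logger = logging.getLogger("consistency_check")


def diff_states(engine_state: dict, db_state: dict,
                fields: Iterable[str]) -> Dict[str, Tuple[object, object]]:
    """Return {field: (engine_value, db_value)} for every field that differs."""
    mismatches = {}
    for field in fields:
        engine_value = engine_state.get(field)
        db_value = db_state.get(field)
        if engine_value != db_value:
            mismatches[field] = (engine_value, db_value)
    return mismatches


def verify(ad_id: str, engine_state: dict, db_state: dict,
           fields: Iterable[str]) -> bool:
    """Log inconsistencies for one exposure; return True when states agree."""
    mismatches = diff_states(engine_state, db_state, fields)
    for field, (engine_value, db_value) in mismatches.items():
        logger.warning("ad %s: field %s engine=%r db=%r",
                       ad_id, field, engine_value, db_value)
    return not mismatches
```

Restricting the comparison to an explicit field list avoids false alarms from fields that legitimately exist on only one side, such as index-only metadata.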
4. Proactive Verification at Data‑Change Nodes
Because online traffic is random, a misplacement is caught only once a user request happens to hit the affected ad. Proactive checks are therefore added at two key nodes: MySQL data changes and engine index switches. For MySQL changes, Jingwei creates a query request to the engine. For index switches, historical traffic is aggregated to generate test cases, and batch requests are sent to the engine during full‑index switches. Both use the same consistency verification pipeline.
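One plausible way to turn historical traffic into test cases for an index switch is to replay the most frequent queries. The article does not say how the aggregation is done, so the normalization and the top-N selection below are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def build_test_cases(historical_queries: Iterable[str], top_n: int) -> List[str]:
    """Pick the top_n most frequent queries as replay cases for an index switch."""
    # Normalize case and whitespace so trivially different queries count together.
    normalized = (q.strip().lower() for q in historical_queries if q.strip())
    counts = Counter(normalized)
    return [query for query, _ in counts.most_common(top_n)]
```

Each selected query would then be sent to the engine as a batch request, and the returned ad states would go through the same comparison used for live exposures.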
5. Real‑Time Engine and Algorithm Quality
Search engine PV logs contain valuable signals. Using Blink, the team parses PV information from TTLog, outputs to SLS, and visualizes it with Xflush. This enables detection of issues such as missing parameters in SP request strings that caused regional ad delivery failures, and provides detailed metrics for algorithm tuning.
Core Questions
Why Lindorm? – Writing change snapshots to another MySQL instance ran into performance bottlenecks; storing them in Lindorm cut query latency from ~1 s to ~70 ms.
Why BCP + MetaQ + igps? – To decouple message production and consumption, reduce network overhead, and maintain low CPU usage even at 100 % sampling.
Why not use Blink for everything? – Click verification workloads are small and benefit from BCP’s flexible features; Blink still has stability concerns.
How to split SP request keys? – Use Blink’s UDTF to parse request strings into key‑value pairs, outputting rows in “validKey=…,validValue=…” format for Xflush grouping.
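The key-splitting logic can be illustrated in Python. A real Blink UDTF would be written against Blink's own table function API, which is not shown here; this sketch only mirrors the row-per-key behavior and assumes the SP request string is URL query-string style (`k1=v1&k2=v2`), which the article does not confirm.

```python
from typing import Iterator
from urllib.parse import parse_qsl


def split_request(request: str) -> Iterator[str]:
    """Emit one 'validKey=...,validValue=...' row per key in the request string."""
    # keep_blank_values=True so keys with empty values still produce a row,
    # which makes empty parameters visible when grouping downstream.
    for key, value in parse_qsl(request, keep_blank_values=True):
        yield f"validKey={key},validValue={value}"
```

Emitting one row per key lets the downstream grouping count how often each key appears, so a parameter that suddenly disappears from requests shows up as a drop in that key's count.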
Summary and Future Plans
The article demonstrates an end‑to‑end real‑time detection solution for data consistency issues in e‑commerce search advertising, combining real‑time discovery with proactive verification at data‑change nodes.
Future work includes exposing richer real‑time dimensions for business scenarios and moving the technology stack to pre‑release testing to achieve one‑click automated functional, performance, and effect testing across the full search‑engine chain.
Alibaba Cloud Developer
