Scalable User Behavior Data Collection and Auto-Generated Datasets for Xianyu
Xianyu built a highly extensible user-behavior collection framework. It standardizes data into a common ODPS schema, uses JavaScript Proxy to intercept navigation and API calls, maps business metrics through a JSON configuration, and aggregates reports before upload. Together these cut dataset-creation effort from days to minutes while avoiding the heavy overhead of full tracking.
The article describes how Xianyu builds a highly extensible data‑analysis system that can serve many business lines. The core challenge is to adapt the fixed input/output format of the Nano‑Mirror analysis algorithm to diverse business data schemas.
To solve this, a standard ODPS dataset is defined (fields such as userid, bucketid, indexes, tags). Business teams insert their data into an intermediate table that follows this schema, enabling the Nano‑Mirror algorithms to run without modification.
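As a rough illustration of the standard schema, the sketch below shapes one business record into a row with the four fields named above. Several things here are assumptions for illustration, not the actual ODPS table definition: the input record's shape, the hypothetical `toStandardRow` and `serializeIndexes` helpers, encoding `indexes` as comma-separated `name=value` pairs (borrowed from the aggregated-report format shown later), and encoding `tags` as a comma-joined string.

```javascript
// Hypothetical sketch: map a business-specific record onto the standard
// row (userid, bucketid, indexes, tags). Column encodings are assumed.

// Serialize metric values as "name=value" pairs joined by commas,
// e.g. { index__visit: 1 } -> "index__visit=1".
function serializeIndexes(indexes) {
  return Object.entries(indexes)
    .map(([name, value]) => `${name}=${value}`)
    .join(',');
}

// `metricMap` says which business field feeds which standard index name
// (a hypothetical mapping, one entry per metric).
function toStandardRow(record, metricMap) {
  const indexes = {};
  for (const [businessField, indexName] of Object.entries(metricMap)) {
    // Missing business fields default to 0 so every row carries the same indexes.
    indexes[indexName] = Number(record[businessField] ?? 0);
  }
  return {
    userid: String(record.userId),
    bucketid: String(record.bucket),
    indexes: serializeIndexes(indexes),
    tags: (record.tags || []).join(','),
  };
}

// Usage: one record from a hypothetical campaign table.
const row = toStandardRow(
  { userId: 42, bucket: 'B', itemViews: 10, draws: 1, tags: ['new_user'] },
  { itemViews: 'index__ipv', draws: 'index__gold_copper' }
);
console.log(row.indexes);
// → index__ipv=10,index__gold_copper=1
```

Keeping the per-business mapping in data (`metricMap`) rather than code mirrors the article's point: the analysis side only ever sees the fixed schema.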
Creating a dataset manually usually takes at least two days (adding tracking points, development, and writing the SQL). The proposed solution automates this process, reducing effort and avoiding the data pollution caused by incorrect inserts.
Existing “full-tracking” solutions are rejected because they (1) upload raw DOM positions that require heavy cleaning, (2) increase bandwidth and server load, (3) are highly intrusive to business code, and (4) cannot carry custom tracking parameters.
Instead, the article proposes a lightweight user‑behavior collection framework that hooks the front‑end navigation and HTTP APIs using JavaScript Proxy. The following snippet shows how navigator.push is wrapped:
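```javascript
// `navigator` is the app container's navigation module, not the browser's window.navigator.
navigator['push'] = new Proxy(navigator['push'], {
  // Pass property reads and writes through to the wrapped function.
  get(target, propKey) { return target[propKey]; },
  // A set trap must return a boolean to report success.
  set(target, propKey, value) { target[propKey] = value; return true; },
  apply(target, thisArg, args) {
    // Call the original push with the caller's `this` (inside a trap, `this` is the handler).
    const result = Reflect.apply(target, thisArg, args);
    // Skip collection when the caller opts out.
    const ignoreNanoAnaly = args && args[0] && args[0].api && args[0].ignoreNanoAnaly;
    if (ignoreNanoAnaly) { return result; }
    if (result instanceof Promise) {
      // Observe the navigation result on a side chain; the caller still gets `result` unchanged.
      result.then(d => {
        // insert custom logic here
      }).catch(e => {
        // error handling; the caller still sees the rejection through `result`
      });
    }
    return result;
  }
});
```

Behavior mapping is expressed in a JSON configuration that links business-specific metrics to the standard dataset fields. An example configuration is shown below: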
```json
{
  "spms": [{
    "spm": "common",
    "tasks": [{
      "indexType": 0,
      "index": "index__ipv",
      "behavior": [{
        "type": 0,
        "condition": "fleamarket://item",
        "valueType": "1"
      }]
    }]
  }, {
    "spm": "spma.spmb",
    "match_uv": true,
    "tasks": [{
      "indexType": 0,
      "index": "index__gold_copper",
      "behavior": [{
        "type": 1,
        "api": "mtop.api.lottery.draw",
        "condition": "d.data.status===5",
        "valueType": "0"
      }]
    }]
  }]
}
```

Collected actions are aggregated during the page lifecycle and reported only when the page is destroyed or after the app has stayed in the background for 15 seconds, which greatly reduces bandwidth use and server load. An aggregated report looks like this:
```json
{
  "page": "spma.spmb",
  "indexes": "index__visit=1,index__ipv=10,index__gold_copper=1"
}
```

Since adopting this approach, several Xianyu services (e.g., “Shop While Earning”, “Kankan Dao”, “Quotation Sheet”, “322 Promotion”) have integrated the system at near-zero development cost, cutting the typical two-person-day effort to a few minutes of configuration. The article concludes with future plans to deepen data-science research, build knowledge bases from historical activities, and model user-item preferences.
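The page-level flow described earlier (match behaviors from the JSON configuration, accumulate metrics while the page is alive, then upload one report when the page is destroyed or after 15 seconds in the background) can be sketched as a small collector class. The article does not spell out the configuration semantics, so this sketch assumes: `type` 0 is a navigation whose target URL starts with `condition`; `type` 1 is an API response for which the JavaScript expression in `condition` (over the response `d`) is true; `valueType` "1" counts every match while "0" records a 0/1 flag; `spm: "common"` applies on every page; and `index__visit` is a built-in visit flag. `BehaviorCollector` and its methods are hypothetical names, and `indexType` and `match_uv` are ignored.

```javascript
// Hypothetical collector; the config semantics used here are assumptions (see above).
class BehaviorCollector {
  constructor(config, send, { backgroundDelayMs = 15000, setTimer = setTimeout, clearTimer = clearTimeout } = {}) {
    this.config = config;
    this.send = send; // uploads one aggregated report
    this.delay = backgroundDelayMs;
    // Wrap the timer functions so they are never invoked with a foreign `this`.
    this.setTimer = (fn, ms) => setTimer(fn, ms);
    this.clearTimer = (id) => clearTimer(id);
    this.page = null;
    this.counts = {};
    this.timer = null;
  }

  // Tasks active on a page: the "common" entry plus the page's own spm entry.
  tasksFor(page) {
    return this.config.spms
      .filter((s) => s.spm === 'common' || s.spm === page)
      .flatMap((s) => s.tasks);
  }

  enterPage(page) {
    this.page = page;
    this.counts = { index__visit: 1 }; // assumed built-in visit flag
  }

  // valueType "1": count every match; "0": record a 0/1 flag.
  record(index, valueType) {
    this.counts[index] = valueType === '1' ? (this.counts[index] || 0) + 1 : 1;
  }

  // Meant to be called from the navigator.push proxy with the target URL.
  onNavigate(url) {
    for (const task of this.tasksFor(this.page)) {
      for (const b of task.behavior) {
        if (b.type === 0 && url.startsWith(b.condition)) this.record(task.index, b.valueType);
      }
    }
  }

  // Meant to be called from an API-request proxy with the API name and response `d`.
  onApiResponse(api, d) {
    for (const task of this.tasksFor(this.page)) {
      for (const b of task.behavior) {
        if (b.type !== 1 || b.api !== api) continue;
        // new Function runs the condition string as code: only load trusted configs.
        const matches = new Function('d', `return (${b.condition});`);
        if (matches(d)) this.record(task.index, b.valueType);
      }
    }
  }

  // Upload everything gathered so far as one "name=value,..." report.
  flush() {
    if (!this.page || Object.keys(this.counts).length === 0) return;
    const indexes = Object.entries(this.counts).map(([k, v]) => `${k}=${v}`).join(',');
    this.send({ page: this.page, indexes });
    this.counts = {};
  }

  onPageDestroy() { this.cancelTimer(); this.flush(); this.page = null; }
  // Report only if the app stays in the background for the full delay.
  onBackground() {
    this.cancelTimer();
    this.timer = this.setTimer(() => { this.timer = null; this.flush(); }, this.delay);
  }
  onForeground() { this.cancelTimer(); }
  cancelTimer() {
    if (this.timer !== null) { this.clearTimer(this.timer); this.timer = null; }
  }
}

// Usage with the example configuration from the article.
const config = { spms: [
  { spm: 'common', tasks: [{ indexType: 0, index: 'index__ipv',
    behavior: [{ type: 0, condition: 'fleamarket://item', valueType: '1' }] }] },
  { spm: 'spma.spmb', match_uv: true, tasks: [{ indexType: 0, index: 'index__gold_copper',
    behavior: [{ type: 1, api: 'mtop.api.lottery.draw', condition: 'd.data.status===5', valueType: '0' }] }] },
] };

const reports = [];
const collector = new BehaviorCollector(config, (r) => reports.push(r));
collector.enterPage('spma.spmb');
for (let i = 0; i < 10; i++) collector.onNavigate(`fleamarket://item?id=${i}`);
collector.onApiResponse('mtop.api.lottery.draw', { data: { status: 5 } });
collector.onApiResponse('mtop.api.lottery.draw', { data: { status: 5 } }); // flag stays at 1
collector.onPageDestroy();
console.log(reports[0].indexes);
// → index__visit=1,index__ipv=10,index__gold_copper=1
```

Buffering counts in memory and sending a single string per page is what keeps the upload small; the injectable timer functions exist only so the 15-second background rule can be exercised without real waiting.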
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.