How to Eliminate Extreme Values in Freight AB Tests Using Chi‑Square Calibration
This article explains a chi‑square‑based method for detecting and removing extreme‑value orders in freight AB experiments to preserve group homogeneity, improve metric reliability, and enhance the scientific validity of experimental conclusions.
Introduction
In freight AB experiments, orders are divided into an experimental group (policy applied) and a control group (no policy). Data scientists compare metrics such as Gross Transaction Volume (GTV) pairing rates between the two groups to evaluate strategy effectiveness.
Homogeneity Issue Description
Ensuring that the two groups are homogeneous—i.e., showing no significant differences in key metrics when no policy is applied—is essential. Extreme‑value orders (very high transaction amounts) can break this homogeneity and bias experimental conclusions.
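To see why a single order matters, here is a small Python illustration. It assumes, for illustration only, that the GTV pairing rate is matched GTV divided by total GTV; the source does not define the metric, and the function name and data are hypothetical. Two otherwise identical groups diverge sharply once one very large unmatched order lands in the experimental group.

```python
from typing import List, Tuple

Order = Tuple[float, bool]  # (order amount, whether the order was matched)


def gtv_pairing_rate(orders: List[Order]) -> float:
    """Assumed definition: matched GTV divided by total GTV."""
    total = sum(amount for amount, _ in orders)
    matched = sum(amount for amount, is_matched in orders if is_matched)
    return matched / total


# Two identical groups: 100 orders of 500 each, 60 of them matched.
control = [(500.0, i < 60) for i in range(100)]
experimental = list(control)
print(gtv_pairing_rate(control), gtv_pairing_rate(experimental))  # 0.6 0.6

# One unmatched 50,000 order lands in the experimental group by chance.
experimental.append((50_000.0, False))
print(gtv_pairing_rate(experimental))  # 0.3
```

With no policy applied at all, the experimental group's rate halves, which is exactly the kind of spurious difference the method aims to remove.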
Chi‑Square Based Outlier Handling
This article proposes a method based on the chi‑square statistic to identify and filter extreme‑value orders, keeping GTV pairing rates homogeneous across the two groups.
Process Overview
Data preprocessing: prepare experiment data, define dimensions for traffic segmentation (e.g., city, vehicle type), and list candidate threshold values.
Computation: for each dimension, calculate the optimal threshold using the chi‑square procedure described under Detailed Steps.
Validation: ensure the filtered traffic does not exceed 0.01% of total traffic; if it does, return to the previous step and select a new threshold.
Output: the best threshold for each dimension.
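The overview amounts to a search loop for each segment of a dimension. The Python sketch below is one possible reading, not the author's implementation: it assumes the dimension is already split into segments (such as city grades), that a hypothetical `score(segment, threshold)` returns the chi‑square deviation (lower is better), and that a hypothetical `filtered_share(segment, threshold)` returns the fraction of traffic the threshold removes. It also interprets "select a new threshold" as falling back to the next‑best candidate.

```python
from typing import Callable, Dict, Iterable, List, Optional

MAX_FILTERED_SHARE = 0.0001  # the 0.01% traffic rule, expressed as a fraction


def select_thresholds(
    segments: Iterable[str],
    candidates: List[float],
    score: Callable[[str, float], float],
    filtered_share: Callable[[str, float], float],
    max_share: float = MAX_FILTERED_SHARE,
) -> Dict[str, Optional[float]]:
    """Pick one threshold per segment; None means no candidate passed validation."""
    best: Dict[str, Optional[float]] = {}
    for segment in segments:
        best[segment] = None
        # Computation: rank candidates from lowest to highest deviation.
        ranked = sorted(candidates, key=lambda c: score(segment, c))
        for threshold in ranked:
            # Validation: reject thresholds that filter too much traffic
            # and fall back to the next-best candidate.
            if filtered_share(segment, threshold) <= max_share:
                best[segment] = threshold
                break
    return best
```

Passing `score` and `filtered_share` as callables keeps the loop independent of how the chi‑square deviation and the traffic share are actually computed.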
Detailed Steps
1. Separate the data into experimental and control groups and identify the metrics that must stay homogeneous (e.g., order amount).
2. Define a list of candidate thresholds (approximately ten values).
3. Segment traffic into a k × n table based on the chosen dimension and count the metric values in each cell.
4. After applying a candidate threshold, compute the chi‑square statistic for each segment, then compute its deviation from the theoretical value: $\delta_{i,j} = \left|\chi^2_{i,j} - \chi^2_{\text{theory}}\right|$.
5. For each segment, select the threshold that minimizes $\delta_{i,j}$.
6. Check that the filtered traffic proportion is ≤ 0.01%; if it is not, discard the threshold and return to step 5 to select the next candidate.
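The chi‑square computation in these steps can be sketched in plain Python. Several details are assumptions rather than the source's specification: each table holds counts with rows for the experimental and control groups and columns for the metric categories of one segment, each table is built after removing orders above that candidate threshold, and when no theoretical χ² is supplied the degrees of freedom (r − 1)(c − 1) are used, since that is the mean of the chi‑square distribution under homogeneity. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence

Table = Sequence[Sequence[float]]


def chi_square_statistic(table: Table) -> float:
    """Pearson chi-square statistic of a contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row (group) and column are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # an all-zero row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_delta(table: Table, theoretical: Optional[float] = None) -> float:
    """delta = |chi2 - theoretical chi2| for one segment and one threshold.

    Assumption: without an explicit theoretical value, use the degrees of
    freedom (r - 1)(c - 1), the mean of chi-square under homogeneity.
    """
    if theoretical is None:
        theoretical = (len(table) - 1) * (len(table[0]) - 1)
    return abs(chi_square_statistic(table) - theoretical)


def pick_threshold(tables_by_threshold: Dict[float, Table],
                   theoretical: Optional[float] = None) -> float:
    """Return the candidate threshold whose filtered table minimizes delta."""
    return min(tables_by_threshold,
               key=lambda t: chi_square_delta(tables_by_threshold[t], theoretical))
```

For example, `pick_threshold({4000: [[10, 20], [20, 10]], 6000: [[10, 10], [10, 10]]})` returns `6000`: the first table gives χ² ≈ 6.67 and δ ≈ 5.67, while the second gives χ² = 0 and δ = 1. The chosen threshold would still need to pass the 0.01% traffic check.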
Case Study
For city‑level segmentation, thresholds {4k, 6k, 8k, 10k, 12k, 14k, 16k, 18k} were evaluated across city grades S, A, B, C, D. Using the chi‑square difference (Δχ²) and the 0.01% traffic rule, the optimal thresholds were determined as 10k for S, 6k for A and B, 4k for C and D. The flowchart below illustrates the selection process.
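Applying the case‑study result is then a per‑grade cutoff. The Python sketch below assumes that orders strictly above their city grade's threshold are the ones removed, that "k" means thousands in the source's currency, and that traffic is measured as an order count; the function name and the `(city_grade, amount)` order layout are hypothetical.

```python
from typing import List, Mapping, Tuple

# Per-grade thresholds reported in the case study (4k = 4,000, and so on).
CITY_GRADE_THRESHOLDS: Mapping[str, float] = {
    "S": 10_000, "A": 6_000, "B": 6_000, "C": 4_000, "D": 4_000,
}

Order = Tuple[str, float]  # (city grade, order amount)


def filter_extreme_orders(
    orders: List[Order],
    thresholds: Mapping[str, float] = CITY_GRADE_THRESHOLDS,
) -> Tuple[List[Order], float]:
    """Drop orders above their grade's threshold; return kept orders and removed share."""
    kept = [(grade, amount) for grade, amount in orders if amount <= thresholds[grade]]
    removed_share = (len(orders) - len(kept)) / len(orders) if orders else 0.0
    return kept, removed_share
```

The returned `removed_share` is the quantity that the 0.01% rule would be checked against.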
Summary
Extreme values commonly disrupt homogeneity in freight AB experiments and compromise metric reliability. The chi‑square‑based outlier removal method described here restores homogeneity between groups while filtering no more than 0.01% of traffic, making experimental conclusions more scientific and accurate.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
