How to Eliminate Extreme Values in Freight AB Tests Using Chi‑Square Calibration

This article explains a chi‑square‑based method for detecting and removing extreme‑value orders in freight AB experiments to preserve group homogeneity, improve metric reliability, and enhance the scientific validity of experimental conclusions.

Huolala Tech

Dec 22, 2023

Introduction

In freight AB experiments, orders are split into an experimental group, which receives the policy, and a control group, which does not. Data scientists then compare metrics such as the Gross Transaction Volume (GTV) pairing rate between the two groups to evaluate the policy's effectiveness.

Homogeneity Issue Description

Ensuring that the two groups are homogeneous—i.e., showing no significant differences in key metrics when no policy is applied—is essential. Extreme‑value orders (very high transaction amounts) can break this homogeneity and bias experimental conclusions.
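To see why a single extreme order matters, consider a minimal sketch in Python. The order amounts, the 10,000 cutoff, and the helper names `relative_gap` and `drop_above` are made up for illustration and do not come from the original experiments.

```python
from statistics import mean

# Hypothetical order amounts for two groups with no policy applied to either.
control = [120.0, 95.0, 150.0, 110.0, 130.0, 105.0, 140.0, 100.0]
# Same kind of traffic, plus one extreme-value order.
treatment = [115.0, 100.0, 145.0, 125.0, 135.0, 98.0, 138.0, 25000.0]


def relative_gap(a, b):
    """Relative difference in mean order amount between two groups."""
    return abs(mean(a) - mean(b)) / mean(b)


def drop_above(amounts, threshold):
    """Remove orders whose amount exceeds the threshold (assumed filter rule)."""
    return [x for x in amounts if x <= threshold]


raw_gap = relative_gap(treatment, control)
filtered_gap = relative_gap(drop_above(treatment, 10_000), drop_above(control, 10_000))

print(f"gap with the extreme order:    {raw_gap:.1%}")
print(f"gap without the extreme order: {filtered_gap:.1%}")
```

With the extreme order included, the two groups look very different even though no policy was applied; once it is filtered, the gap shrinks to a few percent.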

Chi‑Square Based Outlier Handling

The article proposes a chi‑square‑statistic method to identify and filter extreme‑value orders, thereby maintaining homogeneity of GTV pairing rates across groups.

Process Overview

Data preprocessing: prepare experiment data, define dimensions for traffic segmentation (e.g., city, vehicle type), and list candidate threshold values.

Computation: for each dimension, calculate the optimal threshold using the prescribed method.

Validation: ensure the filtered traffic does not exceed 0.01% of total traffic; if it does, return to the previous step and select a new threshold.

Output: the best threshold for each dimension.
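The validation step can be sketched as a simple share check. This assumes that "traffic" is counted in orders and that an order is filtered when its amount exceeds the threshold; the function and constant names are hypothetical.

```python
MAX_FILTERED_SHARE = 0.0001  # 0.01% of total traffic, per the validation rule


def filtered_share(amounts, threshold):
    """Fraction of orders that a threshold would remove."""
    if not amounts:
        return 0.0
    removed = sum(1 for x in amounts if x > threshold)
    return removed / len(amounts)


def passes_traffic_rule(amounts, threshold, max_share=MAX_FILTERED_SHARE):
    """True if the threshold filters no more than the allowed share of orders."""
    return filtered_share(amounts, threshold) <= max_share


# Illustrative data: 100,000 orders, 5 of them extreme.
amounts = [100.0] * 99_995 + [20_000.0] * 5
print(passes_traffic_rule(amounts, 10_000))  # removes 5 of 100,000 orders
print(passes_traffic_rule(amounts, 50))      # removes every order
```

A threshold that fails this check is rejected, and the computation step is repeated with another candidate.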

Detailed Steps

1. Separate data into experimental and control groups; identify metrics needing homogeneity (e.g., order amount).

2. Define a list of candidate thresholds (approximately ten values).

3. Segment traffic into k × n categories based on the chosen dimension and count metric values.

4. Compute the chi‑square statistic for each segment after applying a candidate threshold; calculate $\delta_{i,j} = |\chi^2_{i,j} - \chi^2_{\text{theoretical}}|$.

5. Select the threshold that minimizes $\delta_{i,j}$.

6. Validate that the filtered traffic proportion is ≤ 0.01%; if not, discard the threshold and repeat step 5.
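The steps above can be sketched in Python for a single segment. This is an illustration under several assumptions, not the authors' implementation: the contingency table is taken to be 2 × 2 (group by paired/unpaired), its cells hold summed order amounts, and the "theoretical" chi‑square value is supplied by the caller because the source does not define it. The `Order` type and all function names are hypothetical.

```python
from typing import NamedTuple


class Order(NamedTuple):
    group: str     # "treatment" or "control"
    amount: float  # order amount
    paired: bool   # whether the order was paired


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def gtv_table(orders):
    """Assumed table layout: rows are groups, columns are paired/unpaired amounts."""
    cells = {"treatment": [0.0, 0.0], "control": [0.0, 0.0]}
    for o in orders:
        cells[o.group][0 if o.paired else 1] += o.amount
    return [cells["treatment"], cells["control"]]


def select_threshold(orders, candidates, theoretical_chi2, max_share=0.0001):
    """Return the candidate with the smallest delta that passes the traffic rule."""
    best = None
    for threshold in candidates:
        kept = [o for o in orders if o.amount <= threshold]
        removed_share = (len(orders) - len(kept)) / len(orders)
        if removed_share > max_share:
            continue  # filters more than 0.01% of traffic: discard
        delta = abs(chi_square(gtv_table(kept)) - theoretical_chi2)
        if best is None or delta < best[0]:
            best = (delta, threshold)
    return None if best is None else best[1]


def demo_orders():
    """Synthetic, balanced traffic plus two extreme orders in one group."""
    orders = []
    for group in ("treatment", "control"):
        orders += [Order(group, 100.0, True)] * 16_000
        orders += [Order(group, 100.0, False)] * 4_000
        orders += [Order(group, 5_000.0, True)] * 8
        orders += [Order(group, 5_000.0, False)] * 2
    orders += [Order("treatment", 15_000.0, True)] * 2
    return orders


# theoretical_chi2=1.0 is an assumed value (the degrees of freedom of a 2 x 2 table).
best = select_threshold(demo_orders(), [4_000, 10_000, 20_000], theoretical_chi2=1.0)
print(best)
```

On this synthetic data the sketch selects 10,000: the 4,000 candidate removes more than 0.01% of orders, and the 20,000 candidate keeps the two extreme orders that unbalance the groups. Skipping invalid candidates inside the loop gives the same result as selecting the minimum first and re-selecting whenever validation fails.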

Case Study

For city‑level segmentation, the candidate thresholds {4k, 6k, 8k, 10k, 12k, 14k, 16k, 18k} were evaluated across city grades S, A, B, C, and D. Using the chi‑square difference δ and the 0.01% traffic rule, the optimal thresholds were 10k for grade S, 6k for grades A and B, and 4k for grades C and D. The original article includes a flowchart of this selection process.
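Once thresholds are chosen, applying them is a per-grade lookup. The sketch below uses the thresholds reported above, reading "k" as thousands; the currency unit is not stated in the source, and the function name and the "amount greater than threshold is filtered" rule are assumptions.

```python
# Thresholds from the case study, keyed by city grade ("k" read as thousands).
THRESHOLD_BY_GRADE = {"S": 10_000, "A": 6_000, "B": 6_000, "C": 4_000, "D": 4_000}


def keep_order(city_grade, amount, thresholds=THRESHOLD_BY_GRADE):
    """Return True if the order survives the extreme-value filter for its grade."""
    return amount <= thresholds[city_grade]


# The same order amount can be kept in one grade and filtered in another.
print(keep_order("S", 9_000))
print(keep_order("C", 9_000))
```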

Summary

Extreme values commonly disrupt homogeneity in freight AB experiments, which compromises metric reliability. The proposed chi‑square‑based outlier removal method restores homogeneity between the groups while filtering no more than 0.01% of traffic, making experimental conclusions more reliable.
