Comparing Distributions Between Groups: Visualization and Statistical Methods in Python
This article demonstrates how to compare the distribution of a variable across control and treatment groups using Python, covering data generation, visual techniques such as boxplots, histograms, KDE, CDF, QQ and ridgeline plots, and a suite of statistical tests including t‑test, SMD, Mann‑Whitney, permutation, chi‑square, Kolmogorov‑Smirnov and ANOVA for both two‑group and multi‑group scenarios.
When evaluating the causal effect of a strategy (e.g., a new feature, ad campaign, or drug), the gold standard is a randomized controlled trial (A/B test). After random assignment, it is essential to check whether observed covariates are balanced between the control and treatment groups.
In this tutorial we generate a synthetic dataset of 1,000 individuals with gender, age, and weekly income, randomly assigning them to a control group or one of several treatment arms.
<code># src.utils is expected to provide the aliases used below (np, pd, sns, plt)
from src.utils import *
from src.dgp import dgp_rnd_assignment

df = dgp_rnd_assignment().generate_data()
df.head()
</code>The dataset is visualized and analyzed using several methods.
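If you do not have the `src` package at hand, a minimal stand-in generator with the same shape is easy to sketch. All distributions and parameter values below are illustrative; the real `dgp_rnd_assignment` may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Illustrative stand-in for dgp_rnd_assignment().generate_data()
df = pd.DataFrame({
    'Gender': rng.choice(['male', 'female'], size=n),
    'Age': rng.integers(18, 65, size=n),
    'Income': np.round(rng.lognormal(mean=6.5, sigma=0.5, size=n), 2),
    'Arm': rng.choice(['control', 'arm 1', 'arm 2', 'arm 3'], size=n),
})
# Two-group label: control vs. any treatment arm
df['Group'] = np.where(df['Arm'] == 'control', 'control', 'treatment')
print(df.head())
```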
Visualization methods
Boxplot – shows median, quartiles, and outliers.
<code>sns.boxplot(data=df, x='Group', y='Income')
plt.title("Boxplot")
</code>Histogram – raw counts per bin (may be incomparable when group sizes differ).
<code>sns.histplot(data=df, x='Income', hue='Group', bins=50)
plt.title("Histogram")
</code>Using stat='density' and common_norm=False makes the histograms comparable.
<code>sns.histplot(data=df, x='Income', hue='Group', bins=50, stat='density', common_norm=False)
plt.title("Density Histogram")
</code>Kernel Density Estimate (KDE) provides a smooth approximation of the distribution.
<code>sns.kdeplot(x='Income', data=df, hue='Group', common_norm=False)
plt.title("Kernel Density Function")
</code>Cumulative Distribution Function (CDF) avoids bin choices and approximations.
<code>sns.histplot(x='Income', data=df, hue='Group', bins=len(df), stat="density", element="step", fill=False, cumulative=True, common_norm=False)
plt.title("Cumulative distribution function")
</code>QQ plot compares quantiles of the two groups.
<code>income = df['Income'].values
income_t = df.loc[df.Group=='treatment', 'Income'].values
income_c = df.loc[df.Group=='control', 'Income'].values
df_pct = pd.DataFrame()
df_pct['q_treatment'] = np.percentile(income_t, range(100))
df_pct['q_control'] = np.percentile(income_c, range(100))
plt.figure(figsize=(8,8))
plt.scatter(x='q_control', y='q_treatment', data=df_pct, label='Actual fit')
sns.lineplot(x='q_control', y='q_control', data=df_pct, color='r', label='Line of perfect fit')
plt.xlabel('Quantile of income, control group')
plt.ylabel('Quantile of income, treatment group')
plt.title('QQ plot')
</code>Statistical methods for two groups
Student t‑test compares means.
<code>from scipy.stats import ttest_ind

# By default ttest_ind assumes equal variances (Student's t);
# pass equal_var=False for Welch's t-test if that assumption is doubtful
stat, p_value = ttest_ind(income_c, income_t)
print(f"t-test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Standardized Mean Difference (SMD) provides a scale‑free measure of covariate imbalance.
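Concretely, the SMD for a single covariate is the difference in group means divided by the pooled standard deviation; a from-scratch sketch (the arrays here are illustrative stand-ins for the treatment and control incomes):

```python
import numpy as np

def smd(x, y):
    """Standardized mean difference: |mean(x) - mean(y)| / pooled std."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return np.abs(np.mean(x) - np.mean(y)) / pooled_sd

# Illustrative arrays; in the article these would be income_t and income_c
rng = np.random.default_rng(0)
a = rng.normal(500, 100, 500)
b = rng.normal(510, 100, 500)
print(f"SMD = {smd(a, b):.3f}")
```

A rule of thumb treats SMD values below 0.1 as evidence of balance; the causalml call that follows reports this measure for every covariate in a balance table.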
<code>from causalml.match import create_table_one
df['treatment'] = df['Group'] == 'treatment'
create_table_one(df, 'treatment', ['Gender', 'Age', 'Income'])
</code>Mann‑Whitney U test compares distributions without assuming normality.
<code>from scipy.stats import mannwhitneyu
stat, p_value = mannwhitneyu(income_t, income_c)
print(f"Mann–Whitney U Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Permutation test evaluates the null by randomly shuffling group labels.
<code># Observed difference in means
sample_stat = np.mean(income_t) - np.mean(income_c)

# Re-compute the statistic under 1,000 random relabelings of the groups
stats = np.zeros(1000)
for k in range(1000):
    labels = np.random.permutation((df['Group'] == 'treatment').values)
    stats[k] = np.mean(income[labels]) - np.mean(income[~labels])

# One-sided p-value: share of permutations with a larger difference than observed
p_value = np.mean(stats > sample_stat)
print(f"Permutation test: p-value={p_value:.4f}")
</code>Chi‑square test compares observed and expected frequencies across bins derived from the control group.
<code># Init dataframe
df_bins = pd.DataFrame()
# Generate decile bins from the control group
_, bins = pd.qcut(income_c, q=10, retbins=True)
# sort=False keeps the bins in interval order, so the three columns stay aligned
df_bins['bin'] = pd.cut(income_c, bins=bins).value_counts(sort=False).index
# Apply the same bins to both groups
df_bins['income_c_observed'] = pd.cut(income_c, bins=bins).value_counts(sort=False).values
df_bins['income_t_observed'] = pd.cut(income_t, bins=bins).value_counts(sort=False).values
# Expected treatment frequencies if treatment followed the control distribution
df_bins['income_t_expected'] = df_bins['income_c_observed'] / np.sum(df_bins['income_c_observed']) * np.sum(df_bins['income_t_observed'])

from scipy.stats import chisquare
stat, p_value = chisquare(df_bins['income_t_observed'], df_bins['income_t_expected'])
print(f"Chi-squared Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Kolmogorov‑Smirnov test measures the maximum absolute difference between the two CDFs.
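That definition is easy to compute directly from the two empirical CDFs; a sketch with illustrative arrays standing in for the two income samples:

```python
import numpy as np

def ks_statistic(x, y):
    """Max absolute difference between the two empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    # Fraction of each sample at or below every point on the combined grid
    cdf_x = np.searchsorted(np.sort(x), grid, side='right') / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side='right') / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 300)
y = rng.normal(0.5, 1, 300)
print(f"KS statistic = {ks_statistic(x, y):.4f}")
```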
<code>from scipy.stats import kstest

# With two array arguments, kstest runs the two-sample test
# (equivalent to scipy.stats.ks_2samp)
stat, p_value = kstest(income_t, income_c)
print(f"Kolmogorov-Smirnov Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Multi‑group visual comparison
Boxplot, violin plot, and ridgeline plot can be extended to several arms.
<code># Boxplot for multiple groups
sns.boxplot(x='Arm', y='Income', data=df.sort_values('Arm'))
plt.title("Boxplot, multiple groups")
</code> <code># Violin plot for multiple groups
sns.violinplot(x='Arm', y='Income', data=df.sort_values('Arm'))
plt.title("Violin Plot, multiple groups")
</code> <code># Ridgeline plot (requires joypy)
from joypy import joyplot
joyplot(df, by='Arm', column='Income', colormap=sns.color_palette('crest', as_cmap=True))
plt.xlabel('Income')
plt.title('Ridgeline Plot, multiple groups')
</code>Multi‑group statistical comparison
The one‑way ANOVA (F‑test) assesses whether at least one group mean differs.
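Under the hood, the F statistic is the ratio of between-group to within-group variance; a from-scratch sketch (the three groups here are illustrative):

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    all_x = np.concatenate(groups)
    grand_mean = all_x.mean()
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(2)
groups = [rng.normal(mu, 1, 100) for mu in (0.0, 0.2, 0.5)]
print(f"F = {f_statistic(groups):.4f}")
```

A large F means the group means spread out more than the within-group noise would predict, which is exactly what `f_oneway` tests below.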
<code>from scipy.stats import f_oneway
income_groups = [df.loc[df['Arm']==arm, 'Income'].values for arm in df['Arm'].dropna().unique()]
stat, p_value = f_oneway(*income_groups)
print(f"F Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Across all examples, visual methods give an intuitive sense of distributional differences, while statistical tests provide rigorous evidence about whether observed differences are systematic or due to random variation.
By combining both approaches, practitioners can make informed decisions in causal inference and A/B testing scenarios.
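As a practical wrap-up, the two-sample tests above can be bundled into a single helper that reports all p-values at once. The function name and the sample arrays below are illustrative:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, ks_2samp

def compare_groups(x, y):
    """Run the main two-sample tests and return their p-values."""
    return {
        't-test (Welch)': ttest_ind(x, y, equal_var=False).pvalue,
        'Mann-Whitney U': mannwhitneyu(x, y).pvalue,
        'Kolmogorov-Smirnov': ks_2samp(x, y).pvalue,
    }

rng = np.random.default_rng(3)
control = rng.normal(500, 100, 400)
treatment = rng.normal(520, 100, 400)
for name, p in compare_groups(control, treatment).items():
    print(f"{name}: p-value = {p:.4f}")
```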