Comparing Distributions Between Groups: Visualization and Statistical Methods in Python
This article demonstrates how to compare the distribution of a variable across control and treatment groups using Python, covering data generation, visual techniques such as boxplots, histograms, KDE, CDF, QQ and ridgeline plots, and a suite of statistical tests including t‑test, SMD, Mann‑Whitney, permutation, chi‑square, Kolmogorov‑Smirnov and ANOVA for both two‑group and multi‑group scenarios.
When evaluating the causal effect of a strategy (e.g., a new feature, ad campaign, or drug), the gold standard is a randomized controlled trial (A/B test). After random assignment, it is essential to check whether observed covariates are balanced between the control and treatment groups.
In this tutorial we generate a synthetic dataset of 1,000 individuals with gender, age, and weekly income, randomly assigning them to a control group or one of several treatment arms.
<code># src.utils is expected to provide the aliases used below (np, pd, sns, plt)
from src.utils import *
from src.dgp import dgp_rnd_assignment

df = dgp_rnd_assignment().generate_data()
df.head()
</code>The dataset is visualized and analyzed using several methods.
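If you do not have the `src` package at hand, a minimal stand-in generator with the same shape is easy to sketch. All distributions and parameter values below are illustrative; the real `dgp_rnd_assignment` may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Illustrative stand-in for dgp_rnd_assignment().generate_data()
df = pd.DataFrame({
    'Gender': rng.choice(['male', 'female'], size=n),
    'Age': rng.integers(18, 65, size=n),
    'Income': np.round(rng.lognormal(mean=6.5, sigma=0.5, size=n), 2),
    'Arm': rng.choice(['control', 'arm 1', 'arm 2', 'arm 3'], size=n),
})
# Two-group label: control vs. any treatment arm
df['Group'] = np.where(df['Arm'] == 'control', 'control', 'treatment')
print(df.head())
```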
Visualization methods
Boxplot – shows median, quartiles, and outliers.
<code>sns.boxplot(data=df, x='Group', y='Income')
plt.title("Boxplot")
</code>Histogram – raw counts per bin (may be incomparable when group sizes differ).
<code>sns.histplot(data=df, x='Income', hue='Group', bins=50)
plt.title("Histogram")
</code>Using stat='density' and common_norm=False makes the histograms comparable.
<code>sns.histplot(data=df, x='Income', hue='Group', bins=50, stat='density', common_norm=False)
plt.title("Density Histogram")
</code>Kernel Density Estimate (KDE) provides a smooth approximation of the distribution.
<code>sns.kdeplot(x='Income', data=df, hue='Group', common_norm=False)
plt.title("Kernel Density Function")
</code>Cumulative Distribution Function (CDF) avoids bin choices and approximations.
<code>sns.histplot(x='Income', data=df, hue='Group', bins=len(df), stat="density", element="step", fill=False, cumulative=True, common_norm=False)
plt.title("Cumulative distribution function")
</code>QQ plot compares quantiles of the two groups.
<code>income = df['Income'].values
income_t = df.loc[df.Group=='treatment', 'Income'].values
income_c = df.loc[df.Group=='control', 'Income'].values
df_pct = pd.DataFrame()
df_pct['q_treatment'] = np.percentile(income_t, range(100))
df_pct['q_control'] = np.percentile(income_c, range(100))
plt.figure(figsize=(8,8))
plt.scatter(x='q_control', y='q_treatment', data=df_pct, label='Actual fit')
sns.lineplot(x='q_control', y='q_control', data=df_pct, color='r', label='Line of perfect fit')
plt.xlabel('Quantile of income, control group')
plt.ylabel('Quantile of income, treatment group')
plt.title('QQ plot')
</code>Statistical methods for two groups
Student t‑test compares means.
<code>from scipy.stats import ttest_ind

# By default ttest_ind assumes equal variances (Student's t);
# pass equal_var=False for Welch's t-test if that assumption is doubtful
stat, p_value = ttest_ind(income_c, income_t)
print(f"t-test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Standardized Mean Difference (SMD) provides a scale‑free measure of covariate imbalance.
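Concretely, the SMD for a single covariate is the difference in group means divided by the pooled standard deviation; a from-scratch sketch (the arrays here are illustrative stand-ins for the treatment and control incomes):

```python
import numpy as np

def smd(x, y):
    """Standardized mean difference: |mean(x) - mean(y)| / pooled std."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return np.abs(np.mean(x) - np.mean(y)) / pooled_sd

# Illustrative arrays; in the article these would be income_t and income_c
rng = np.random.default_rng(0)
a = rng.normal(500, 100, 500)
b = rng.normal(510, 100, 500)
print(f"SMD = {smd(a, b):.3f}")
```

A rule of thumb treats SMD values below 0.1 as evidence of balance; the causalml call that follows reports this measure for every covariate in a balance table.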
<code>from causalml.match import create_table_one
df['treatment'] = df['Group'] == 'treatment'
create_table_one(df, 'treatment', ['Gender', 'Age', 'Income'])
</code>Mann‑Whitney U test compares distributions without assuming normality.
<code>from scipy.stats import mannwhitneyu
stat, p_value = mannwhitneyu(income_t, income_c)
print(f"Mann–Whitney U Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Permutation test evaluates the null by randomly shuffling group labels.
<code># Observed difference in means
sample_stat = np.mean(income_t) - np.mean(income_c)

# Re-compute the statistic under 1,000 random relabelings of the groups
stats = np.zeros(1000)
for k in range(1000):
    labels = np.random.permutation((df['Group'] == 'treatment').values)
    stats[k] = np.mean(income[labels]) - np.mean(income[~labels])

# One-sided p-value: share of permutations with a larger difference than observed
p_value = np.mean(stats > sample_stat)
print(f"Permutation test: p-value={p_value:.4f}")
</code>Chi‑square test compares observed and expected frequencies across bins derived from the control group.
<code># Init dataframe
df_bins = pd.DataFrame()
# Generate decile bins from the control group
_, bins = pd.qcut(income_c, q=10, retbins=True)
# sort=False keeps the bins in interval order, so the three columns stay aligned
df_bins['bin'] = pd.cut(income_c, bins=bins).value_counts(sort=False).index
# Apply the same bins to both groups
df_bins['income_c_observed'] = pd.cut(income_c, bins=bins).value_counts(sort=False).values
df_bins['income_t_observed'] = pd.cut(income_t, bins=bins).value_counts(sort=False).values
# Expected treatment frequencies if treatment followed the control distribution
df_bins['income_t_expected'] = df_bins['income_c_observed'] / np.sum(df_bins['income_c_observed']) * np.sum(df_bins['income_t_observed'])

from scipy.stats import chisquare
stat, p_value = chisquare(df_bins['income_t_observed'], df_bins['income_t_expected'])
print(f"Chi-squared Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Kolmogorov‑Smirnov test measures the maximum absolute difference between the two CDFs.
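That definition is easy to compute directly from the two empirical CDFs; a sketch with illustrative arrays standing in for the two income samples:

```python
import numpy as np

def ks_statistic(x, y):
    """Max absolute difference between the two empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    # Fraction of each sample at or below every point on the combined grid
    cdf_x = np.searchsorted(np.sort(x), grid, side='right') / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side='right') / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 300)
y = rng.normal(0.5, 1, 300)
print(f"KS statistic = {ks_statistic(x, y):.4f}")
```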
<code>from scipy.stats import kstest

# With two array arguments, kstest runs the two-sample test
# (equivalent to scipy.stats.ks_2samp)
stat, p_value = kstest(income_t, income_c)
print(f"Kolmogorov-Smirnov Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Multi‑group visual comparison
Boxplot, violin plot, and ridgeline plot can be extended to several arms.
<code># Boxplot for multiple groups
sns.boxplot(x='Arm', y='Income', data=df.sort_values('Arm'))
plt.title("Boxplot, multiple groups")
</code> <code># Violin plot for multiple groups
sns.violinplot(x='Arm', y='Income', data=df.sort_values('Arm'))
plt.title("Violin Plot, multiple groups")
</code> <code># Ridgeline plot (requires joypy)
from joypy import joyplot
joyplot(df, by='Arm', column='Income', colormap=sns.color_palette('crest', as_cmap=True))
plt.xlabel('Income')
plt.title('Ridgeline Plot, multiple groups')
</code>Multi‑group statistical comparison
The one‑way ANOVA (F‑test) assesses whether at least one group mean differs.
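Under the hood, the F statistic is the ratio of between-group to within-group variance; a from-scratch sketch (the three groups here are illustrative):

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    all_x = np.concatenate(groups)
    grand_mean = all_x.mean()
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(2)
groups = [rng.normal(mu, 1, 100) for mu in (0.0, 0.2, 0.5)]
print(f"F = {f_statistic(groups):.4f}")
```

A large F means the group means spread out more than the within-group noise would predict, which is exactly what `f_oneway` tests below.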
<code>from scipy.stats import f_oneway
income_groups = [df.loc[df['Arm']==arm, 'Income'].values for arm in df['Arm'].dropna().unique()]
stat, p_value = f_oneway(*income_groups)
print(f"F Test: statistic={stat:.4f}, p-value={p_value:.4f}")
</code>Across all examples, visual methods give an intuitive sense of distributional differences, while statistical tests provide rigorous evidence about whether observed differences are systematic or due to random variation.
By combining both approaches, practitioners can make informed decisions in causal inference and A/B testing scenarios.
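As a practical wrap-up, the two-sample tests above can be bundled into a single helper that reports all p-values at once. The function name and the sample arrays below are illustrative:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, ks_2samp

def compare_groups(x, y):
    """Run the main two-sample tests and return their p-values."""
    return {
        't-test (Welch)': ttest_ind(x, y, equal_var=False).pvalue,
        'Mann-Whitney U': mannwhitneyu(x, y).pvalue,
        'Kolmogorov-Smirnov': ks_2samp(x, y).pvalue,
    }

rng = np.random.default_rng(3)
control = rng.normal(500, 100, 400)
treatment = rng.normal(520, 100, 400)
for name, p in compare_groups(control, treatment).items():
    print(f"{name}: p-value = {p:.4f}")
```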