
Hypothesis Testing, Confidence Intervals, and Effect Size with Python

This tutorial explains how to perform hypothesis testing, chi‑square, t‑tests, confidence‑interval calculation, and effect‑size measurement in Python, covering data preparation, statistical assumptions, code implementation, and interpretation of results for real‑world datasets.

Python Programming Learning Circle

The article introduces statistical analysis for data science, focusing on hypothesis testing, confidence intervals, and effect size, and demonstrates how to implement each method using Python libraries such as pandas, scipy, and seaborn.

It starts by defining hypothesis testing, describing null (H0) and alternative (H1) hypotheses, and provides simple examples such as comparing view counts across Zhihu accounts or checking whether gas emissions meet regulatory standards.

For the chi‑square test, the author shows how to create a cross‑tabulation with pandas.crosstab, then runs scipy.stats.chi2_contingency to obtain the test statistic, p‑value, and degrees of freedom, interpreting a p‑value smaller than 0.05 as evidence against H0.

Code example for the chi‑square preparation:

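    import pandas as pd

    data = pd.read_excel("path_to_file.xlsx")  # read the data file
    # Cross-tabulate 平台 (platform) against 菜系 (cuisine).
    # margins=True adds "All" totals for display only; build the table
    # without margins before passing it to chi2_contingency.
    a = pd.crosstab(index=data["平台"], columns=data["菜系"], margins=True)
    print(a)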
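The test step described above can be sketched as follows. This is a minimal illustration rather than the author's code: the data are made up, and the English column names `platform` and `cuisine` are hypothetical stand-ins for the original spreadsheet columns. The table is built without margins because `chi2_contingency` expects observed counts only.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for the article's spreadsheet.
data = pd.DataFrame({
    "platform": ["A"] * 50 + ["B"] * 50,
    "cuisine": ["noodles"] * 30 + ["rice"] * 20 + ["noodles"] * 15 + ["rice"] * 35,
})

# Observed counts only: no margins, so row/column totals are not tested.
observed = pd.crosstab(index=data["platform"], columns=data["cuisine"])

# Returns the statistic, p-value, degrees of freedom, and expected counts.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the two variables appear to be associated.")
else:
    print("Fail to reject H0: no evidence of an association.")
```

For a 2x2 table SciPy applies Yates' continuity correction by default; pass `correction=False` to turn it off.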

Next, the tutorial covers the one‑sample t‑test. After checking normality with the Shapiro‑Wilk test (stats.shapiro), the author uses stats.ttest_1samp to compare the sample mean against a target value, adjusts the p‑value for a one‑tailed test, and concludes based on the 0.05 significance level.

Normality test code:

    from scipy import stats

    stats.shapiro(data)  # returns the test statistic and p-value
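As a rough sketch of how the Shapiro‑Wilk result might be read, the example below runs the test on simulated data. The sample, the seed, and the 0.05 threshold are assumptions for illustration; the article's own dataset is not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 30 simulated readings (not the article's data).
rng = np.random.default_rng(0)
data = rng.normal(loc=18.0, scale=2.0, size=30)

# H0 of the Shapiro-Wilk test: the sample comes from a normal distribution.
statistic, p_value = stats.shapiro(data)
print(f"W = {statistic:.3f}, p = {p_value:.3f}")

alpha = 0.05
if p_value > alpha:
    print("No evidence against normality; a t-test is reasonable.")
else:
    print("Normality is doubtful; consider a non-parametric test.")
```

A large p‑value here does not prove normality; it only means the test found no evidence against it.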

One‑sample t‑test code:

    from scipy import stats

    stats.ttest_1samp(data, 20)  # compare the sample mean against 20 ppm
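The one‑tailed adjustment mentioned above can be sketched like this. The readings are hypothetical, and the direction of the alternative (mean below the 20 ppm target) is an assumption based on the article's emissions example.

```python
import numpy as np
from scipy import stats

# Hypothetical emission readings in ppm (illustrative, not the article's data).
data = np.array([17.2, 19.1, 18.4, 20.3, 16.8, 18.9, 19.5, 17.7, 18.2, 19.0])
target = 20  # target value used in the article

# ttest_1samp returns a two-sided p-value by default.
t_stat, p_two_sided = stats.ttest_1samp(data, target)

# One-tailed test of H1: mean < target.
# Halve the two-sided p-value when t points in the hypothesised direction.
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2

# SciPy 1.6 and later can compute the same value directly.
p_less = stats.ttest_1samp(data, target, alternative="less").pvalue

print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}")
```

Halving the p‑value is only valid when the sign of the t statistic matches the alternative hypothesis, which is why the sketch checks `t_stat < 0` first.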

The article then explains how to compute a confidence interval: it looks up the t‑value for 95% confidence (df = n‑1), calculates the standard error with stats.sem, and builds the interval around the sample mean.

Standard error code:

    from scipy import stats

    stats.sem(data)  # standard error of the mean
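The interval construction described above (critical t‑value, standard error, interval around the mean) might look like the following. The readings are hypothetical and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical emission readings in ppm (illustrative, not the article's data).
data = np.array([17.2, 19.1, 18.4, 20.3, 16.8, 18.9, 19.5, 17.7, 18.2, 19.0])

n = data.size
mean = data.mean()
se = stats.sem(data)  # sample standard deviation (ddof=1) divided by sqrt(n)

# Two-sided 95% interval: 2.5% in each tail, df = n - 1.
t_crit = stats.t.ppf(0.975, df=n - 1)

ci_low = mean - t_crit * se
ci_high = mean + t_crit * se
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# Cross-check with SciPy's built-in interval helper.
check_low, check_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)
```

If the target value falls outside the interval, that agrees with rejecting H0 in the two‑sided t‑test at the same significance level.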

Effect size (Cohen's d) is introduced to assess practical significance. The author computes d as the difference between the sample mean and the target divided by the sample standard deviation.

Effect size code:

    # pandas Series.std() uses the sample standard deviation (ddof=1);
    # for a NumPy array, pass ddof=1 explicitly.
    d = (data.mean() - 20) / data.std()  # d ≈ -0.94

Finally, the tutorial presents the two‑sample t‑test, outlining its assumptions (normality, independence, equal variances). It demonstrates variance homogeneity testing with Levene’s test (stats.levene) and then runs the two‑sample t‑test, interpreting a p‑value greater than 0.05 as no statistically significant difference between the two group means.

Levene’s test and two‑sample t‑test code:

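    from scipy import stats

    # Levene's test: H0 is that both samples have equal variances.
    levene_result = stats.levene(sample1, sample2)
    print(levene_result)

    # Pooled-variance t-test, appropriate when Levene's test does not reject H0.
    t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=True)
    print(t_stat, p_val)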
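One way to connect the two steps is to let the Levene result decide whether the pooled‑variance test is used. The sketch below is an assumption about workflow, not the author's code: the samples are simulated, and falling back to Welch's t‑test (`equal_var=False`) when variances differ is a common choice that the article does not discuss.

```python
import numpy as np
from scipy import stats

# Hypothetical samples (simulated, not the article's data).
rng = np.random.default_rng(42)
sample1 = rng.normal(loc=50.0, scale=5.0, size=40)
sample2 = rng.normal(loc=52.0, scale=5.0, size=40)

# Levene's test: H0 is that the two samples have equal variances.
levene_stat, levene_p = stats.levene(sample1, sample2)

# Use the pooled-variance test only if equal variances are not rejected;
# otherwise fall back to Welch's t-test.
equal_var = bool(levene_p > 0.05)
t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=equal_var)

print(f"Levene p = {levene_p:.3f}, equal_var = {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_val:.3f}")

if p_val < 0.05:
    print("Reject H0: the group means differ significantly.")
else:
    print("Fail to reject H0: no significant difference detected.")
```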

Throughout the article, images of data tables, test results, and plots are included to illustrate each step.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Python, statistics, data analysis, confidence interval, hypothesis testing, effect size
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
