How to Perform One-Way ANOVA in Python: Theory, Example, and Code
This article explains the concept of one‑factor (one‑way) ANOVA, walks through a lamp‑life example with four manufacturing processes, derives the within‑ and between‑group sum‑of‑squares formulas, and shows how to execute the test in Python using statsmodels.
One‑Way ANOVA
In a single‑factor experiment only factor A varies while all other factors remain constant; the aim is to determine whether factor A has a significant effect on the response by testing equality of the means across its levels.
Example
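Four manufacturing processes (A1–A4) produce light bulbs, and the lifetimes of a sample of bulbs from each process are measured (the 16 measurements are listed in the code below: 5 for A1, 4 for A2, 4 for A3, and 3 for A4). The question is whether the processes lead to significantly different lifetimes.
Each process corresponds to one level of the factor. Assuming the lifetimes at each level are independent, normally distributed, and share a common variance, the problem reduces to testing whether the four population means are equal, H0: μ1 = μ2 = μ3 = μ4, against the alternative that at least two of them differ.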
The total sum of squares is decomposed into within‑group and between‑group sums of squares. The within‑group sum reflects random variation within each level, while the between‑group sum reflects variation due to different factor levels. ANOVA compares these two quantities.
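The decomposition described above can be checked numerically. The following is a minimal sketch, assuming the lamp-life data are grouped as in the article's code (first 5 values for A1, next 4 for A2, next 4 for A3, last 3 for A4); the variable names are illustrative and do not come from the original.

```python
import numpy as np

# Lifetimes grouped by process (grouping assumed from the labels in the article's code)
groups = [
    np.array([1620, 1670, 1700, 1750, 1800]),  # A1
    np.array([1580, 1600, 1640, 1720]),        # A2
    np.array([1460, 1540, 1620, 1680]),        # A3
    np.array([1500, 1550, 1610]),              # A4
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total sum of squares: every observation against the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Within-group (error) sum of squares: every observation against its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Between-group (treatment) sum of squares: group means against the grand mean,
# weighted by the number of observations in each group
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print("total:  ", ss_total)
print("within: ", ss_within)
print("between:", ss_between)
print("within + between equals total:", np.isclose(ss_total, ss_within + ss_between))
```

For these data the total is 124700, which splits into roughly 64547 within groups and 60153 between groups.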
Theorem 1 provides the decomposition of the total sum of squares into the within‑group (error) sum of squares and the between‑group (treatment) sum of squares.
Theorem 2 shows that, under the null hypothesis of equal means, the ratio of the between‑group mean square to the within‑group mean square follows an F distribution, which is used as the test statistic.
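The formulas behind the two theorems are not reproduced in this text, so the block below restates them in the standard textbook form. The notation is an assumption: s is the number of levels, n_j the number of observations at level j, n the total number of observations, X_{ij} the i-th observation at level j, \bar{X}_{\cdot j} the mean of level j, and \bar{X} the grand mean.

```latex
% Theorem 1: decomposition of the total sum of squares
S_T = \sum_{j=1}^{s}\sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}\right)^2, \qquad
S_E = \sum_{j=1}^{s}\sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_{\cdot j}\right)^2, \qquad
S_A = \sum_{j=1}^{s} n_j \left(\bar{X}_{\cdot j} - \bar{X}\right)^2,
\qquad S_T = S_E + S_A .

% Theorem 2: distribution of the test statistic under H_0
F = \frac{S_A / (s-1)}{S_E / (n-s)} \sim F(s-1,\; n-s)
\quad \text{when } H_0:\ \mu_1 = \mu_2 = \cdots = \mu_s \text{ holds.}
```

Here S_E is the within-group (error) sum of squares and S_A is the between-group (treatment) sum of squares, matching the wording of the theorems above.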
Based on Theorem 2, the test statistic F is constructed. At a chosen significance level α, if the computed p‑value is less than α, the null hypothesis of equal means is rejected, indicating a significant effect of the factor.
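The test can also be carried out by hand as a check on the reasoning. This is a sketch only: it assumes SciPy is available (the original article does not use it), relies on `scipy.stats.f.sf` for the upper-tail probability, and uses `scipy.stats.f_oneway` purely as a cross-check. The variable names and the choice of α = 0.05 are illustrative.

```python
import numpy as np
from scipy import stats  # assumption: SciPy is installed; not used in the original article

# Lifetimes grouped by process (grouping assumed from the labels in the article's code)
groups = [
    np.array([1620, 1670, 1700, 1750, 1800]),  # A1
    np.array([1580, 1600, 1640, 1720]),        # A2
    np.array([1460, 1540, 1620, 1680]),        # A3
    np.array([1500, 1550, 1610]),              # A4
]

k = len(groups)                   # number of factor levels
n = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)

# F statistic and its upper-tail probability under F(k - 1, n - k)
f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, k - 1, n - k)

alpha = 0.05  # illustrative significance level
reject_h0 = p_value < alpha

# Cross-check against SciPy's built-in one-way ANOVA
f_check, p_check = stats.f_oneway(*groups)

print(f"F = {f_stat:.2f}, p = {p_value:.3f}, reject H0: {reject_h0}")
print("matches scipy.stats.f_oneway:", np.isclose(f_stat, f_check) and np.isclose(p_value, p_check))
```

With 3 and 12 degrees of freedom, the hand computation should agree with the F statistic of 3.73 and p-value of 0.042 reported for the statsmodels run.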
In practice, the ANOVA can be performed with Python’s statsmodels library:
<code>import numpy as np
import statsmodels.api as sm

# Bulb lifetimes, one row per process
y = np.array([1620, 1670, 1700, 1750, 1800,  # A1 (5 bulbs)
              1580, 1600, 1640, 1720,        # A2 (4 bulbs)
              1460, 1540, 1620, 1680,        # A3 (4 bulbs)
              1500, 1550, 1610])             # A4 (3 bulbs)
# Process label (1 to 4) for each observation
x = np.hstack([np.ones(5), np.full(4, 2), np.full(4, 3), np.full(3, 4)])
d = {'x': x, 'y': y}                          # data as a dictionary
model = sm.formula.ols("y~C(x)", d).fit()     # fit the model with x as a categorical factor
anovat = sm.stats.anova_lm(model)             # one-way ANOVA table
print(anovat)
</code>
The output shows an F statistic of 3.73 and a p‑value of 0.042; therefore, at the 5% significance level the null hypothesis is rejected, and the manufacturing process is judged to have a significant effect on bulb lifetime.
The ANOVA summary table includes degrees of freedom, sum of squares, mean squares, the F value, and the p‑value (PR(>F)).
Reference: Si Shoukui and Sun Xijing, Python数学实验与建模 (Python Mathematical Experiments and Modeling).
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".