Mastering One‑Way and Multi‑Factor ANOVA in R: From Theory to Code
This article explains the principles of one‑way, two‑way and multi‑factor ANOVA, demonstrates how to perform them in R using aov(), lm() and anova(), and shows post‑hoc comparisons with TukeyHSD as well as correlation testing with cor() and cor.test().
One‑Way ANOVA
When comparing the means of several groups, the null hypothesis states that all groups come from the same population with equal means, while the alternative hypothesis claims that at least one mean differs. One‑way ANOVA tests whether the between‑group variance is large enough, relative to the within‑group variance, to reject the null hypothesis.
The linear model for a one‑factor design is Y_{ij} = μ + τ_i + ε_{ij}, where Y_{ij} is the j‑th observation in group i, μ is the overall mean, τ_i is the effect of the i‑th level, and the ε_{ij} are independent normal errors with mean zero and a common variance. The null hypothesis can be expressed as τ_i = 0 for all i.
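To make the variance comparison concrete, the following sketch computes the F statistic by hand from the between-group and within-group sums of squares and checks it against aov(). The data, the seed and all variable names (g, y, ss_between, ss_within, and so on) are made up for illustration and are not part of the original example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: 3 groups of 6 observations with different true means
g <- factor(rep(c("a", "b", "c"), each = 6))
y <- rnorm(18, mean = c(0, 1, 2)[as.integer(g)])

k <- nlevels(g)                       # number of groups
n <- length(y)                        # total number of observations
grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)  # ave() returns each observation's group mean

# F statistic: ratio of the between-group and within-group mean squares
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

# Compare with the table produced by aov()
aov_tab <- summary(aov(y ~ g))[[1]]
c(manual = f_stat, aov = aov_tab[["F value"]][1])
```

The two values should agree, since aov() performs the same decomposition of the total sum of squares.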
In R the function aov() fits this model, internally calling lm(). The basic syntax is aov(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, ...), where formula specifies the model in the form response ~ factor and data is the data frame that contains the variables.
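Because aov() is a wrapper around lm(), the same ANOVA table can also be obtained by fitting the model with lm() and passing it to anova(). A minimal sketch on simulated data (the data frame d and its columns are hypothetical):

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical data: 3 groups of 4 observations
d <- data.frame(
  y  = rnorm(12, mean = rep(c(0, 1, 3), each = 4)),
  FA = factor(rep(1:3, each = 4))
)

fit_aov <- aov(y ~ FA, data = d)
fit_lm  <- lm(y ~ FA, data = d)

summary(fit_aov)   # classical ANOVA table
anova(fit_lm)      # same sums of squares, F value and p-value

# An aov object inherits from lm, so lm tools such as coef() also work on it
class(fit_aov)
coef(fit_aov)
```

Which interface to use is largely a matter of taste: summary() on an aov object prints the ANOVA table, whereas summary() on an lm object prints coefficient estimates and t tests.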
Example: generate four groups of random numbers with different means using rnorm() and compare their means.
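<code>set.seed(123)  # arbitrary seed so the example is reproducible
# Four groups of 5 draws with true means 0, 2, 3 and 1 (sd = 1)
data <- round(c(rnorm(5), rnorm(5,2), rnorm(5,3), rnorm(5,1)),2)
# FA labels the group of each observation
V1 <- data.frame(data, FA=factor(rep(1:4, each=5)))
V1.aov <- aov(data~FA, data=V1)
summary(V1.aov)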
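</code>With true means this far apart, the F test is typically significant, indicating that the group means are not all equal; the exact values depend on the random draw. Post‑hoc analysis with TukeyHSD(V1.aov) gives the estimated difference, a confidence interval and an adjusted p‑value for each pairwise comparison, revealing which specific groups differ.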
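The post-hoc step can be sketched as follows. The data are regenerated here so that the block runs on its own; the names V, value, fit and tk, and the seed, are illustrative choices rather than part of the original example.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

# Same design as the example above: 4 groups of 5, true means 0, 2, 3, 1
V <- data.frame(
  value = c(rnorm(5), rnorm(5, 2), rnorm(5, 3), rnorm(5, 1)),
  FA    = factor(rep(1:4, each = 5))
)

fit <- aov(value ~ FA, data = V)
tk  <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of levels (e.g. "2-1"); columns: diff, lwr, upr, p adj
tk$FA

# Pairs whose interval excludes zero differ at the 5% family-wise level
rownames(tk$FA)[tk$FA[, "lwr"] > 0 | tk$FA[, "upr"] < 0]
```

Calling plot(tk) draws the same intervals, which is often easier to read when there are many pairs.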
Two‑Factor and Multi‑Factor ANOVA
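When more than one factor is considered, the factors may interact: the effect of one factor can depend on the level of another. The between‑group sum of squares is then partitioned into the main effect of each factor and their interactions. The principle is the same as in one‑way ANOVA, with each mean square compared against the residual mean square, but the model has more terms. In an R formula, A*B is shorthand for A + B + A:B, that is, both main effects plus their interaction.
Example: assess whether newborn weight is related to the mother's race, smoking status, hypertension history and other risk factors.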
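As an illustration of main effects and an interaction, the sketch below simulates a balanced 2 x 3 design and fits it with and without the interaction term. The factors A and B, the effect sizes and the seed are invented for this example.

```r
set.seed(4)  # arbitrary seed, only for reproducibility

# Hypothetical balanced 2 x 3 design with 5 replicates per cell;
# expand.grid() turns the character vectors into factors
d <- expand.grid(rep = 1:5, A = c("a1", "a2"), B = c("b1", "b2", "b3"))

# Simulated response: main effects of A and B plus an extra shift
# that only occurs in cell (a2, b3), i.e. an interaction
d$y <- rnorm(nrow(d),
             mean = 1 * (d$A == "a2") + 0.5 * as.integer(d$B) +
                    2 * (d$A == "a2" & d$B == "b3"))

fit_full <- aov(y ~ A * B, data = d)        # shorthand for A + B + A:B
fit_long <- aov(y ~ A + B + A:B, data = d)  # the same model written out
summary(fit_full)                           # rows: A, B, A:B, Residuals

# Dropping the interaction gives the additive model;
# anova() compares the two nested fits with an F test
fit_add <- aov(y ~ A + B, data = d)
anova(fit_add, fit_full)
```

In a balanced design like this one the sums of squares for A, B and A:B do not depend on the order of the terms, which is not true in general for unbalanced data.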
Using the birthwt dataset from the MASS package (189 observations, 10 variables), a multi‑factor ANOVA can be performed:
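<code>library(MASS)   # provides the birthwt dataset
data(birthwt)
# race and smoke are stored as integer codes; convert them to factors
birthwt$race  <- factor(birthwt$race, labels = c("white", "black", "other"))
birthwt$smoke <- factor(birthwt$smoke, labels = c("no", "yes"))
birthwt.anova <- aov(bwt ~ race*smoke + ht + ptl*ui, data=birthwt)
summary(birthwt.anova)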
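</code>The summary table lists the degrees of freedom, sum of squares, mean square, F value and p‑value for each main effect and interaction, indicating which terms are significantly associated with birth weight. Because the design is unbalanced, aov() reports sequential (Type I) sums of squares, so each term is tested after the terms listed before it.
Correlation Analysis
R also provides functions for correlation analysis. cor() computes correlation coefficients; its use argument controls how missing values are handled, and its method argument selects Pearson, Kendall or Spearman. cor.test() tests the null hypothesis that the correlation is zero and reports a p‑value, together with a confidence interval for the Pearson coefficient.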
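One point to keep in mind when reading such a table: for unbalanced data like birthwt, aov() uses sequential sums of squares, so the result for a term can change with its position in the formula. The sketch below illustrates this with a deliberately simplified model (race and smoke only, not the model fitted above); the object names bw, fit_rs and fit_sr are illustrative.

```r
library(MASS)   # birthwt ships with MASS

bw <- birthwt                  # work on a copy
bw$race  <- factor(bw$race)    # integer codes -> factor
bw$smoke <- factor(bw$smoke)

fit_rs <- aov(bwt ~ race + smoke, data = bw)   # race entered first
fit_sr <- aov(bwt ~ smoke + race, data = bw)   # smoke entered first

summary(fit_rs)
summary(fit_sr)   # same residual line, different sums of squares per term

# Marginal tests: each term adjusted for all the others,
# so the result does not depend on the order in the formula
drop1(fit_rs, test = "F")
```

drop1() is one way to obtain order-independent tests for a model without interactions; whether sequential or marginal tests are more appropriate depends on the question being asked.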
Example: generate two correlated normal variables and compute their Pearson correlation.
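<code>set.seed(42)  # arbitrary seed so the example is reproducible
x <- rnorm(20,4,1)       # 20 draws from N(4, 1)
y <- 2*x + rnorm(20)     # linear in x plus standard normal noise
cor(x,y)                 # Pearson correlation (the default method)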
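</code>The result (about 0.95 in the original run; the population value for this setup is 2/√5 ≈ 0.89) confirms a strong positive linear relationship between the variables. With only 20 observations, the sample estimate varies noticeably from one draw to the next.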
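The hypothesis test and the other options of cor() can be sketched in the same way. The data are regenerated so the block is self-contained; the seed and the names ct and x_na are illustrative.

```r
set.seed(5)  # arbitrary seed, only for reproducibility
x <- rnorm(20, 4, 1)
y <- 2 * x + rnorm(20)

ct <- cor.test(x, y)   # Pearson by default
ct$estimate            # same value as cor(x, y)
ct$conf.int            # 95% confidence interval for the correlation
ct$p.value             # test of the null hypothesis of zero correlation

# Rank-based alternatives
cor(x, y, method = "spearman")
cor(x, y, method = "kendall")

# Missing values: cor() returns NA unless `use` says how to handle them
x_na <- x
x_na[3] <- NA
cor(x_na, y)                         # NA
cor(x_na, y, use = "complete.obs")   # drops the incomplete pair
```

The confidence interval is computed for the Pearson coefficient only; for method = "kendall" or "spearman", cor.test() reports the estimate and p-value without an interval.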
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".