
Mastering One‑Way and Multi‑Factor ANOVA in R: From Theory to Code

This article explains the principles of one‑way, two‑way and multi‑factor ANOVA, demonstrates how to perform them in R using aov(), lm() and anova(), and shows post‑hoc comparisons with TukeyHSD as well as correlation testing with cor() and cor.test().


One‑Way ANOVA

When comparing the means of several groups, the null hypothesis states that all groups come from the same population with equal means, while the alternative hypothesis claims that at least one mean differs. One‑way ANOVA tests whether the between‑group variance is large enough to reject the null.

The linear model for a one‑factor design is Y_{ij} = μ + τ_i + ε_{ij}, where μ is the overall mean, τ_i is the effect of the i‑th level, and ε_{ij} are independent normal errors. The null hypothesis can be expressed as τ_i = 0 for all i.

In R the function aov() fits this model, internally calling lm() . The basic syntax is aov(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, ...) , where formula specifies the model and data provides the dataset.

Example: generate four groups of random numbers with different means using rnorm() and compare their means.
<code>set.seed(1)  # for reproducibility
# four groups of 5 observations with true means 0, 2, 3 and 1
data <- round(c(rnorm(5), rnorm(5, 2), rnorm(5, 3), rnorm(5, 1)), 2)
V1 <- data.frame(data, FA = factor(rep(1:4, each = 5)))
V1.aov <- aov(data ~ FA, data = V1)
summary(V1.aov)
</code>

The output shows a significant F‑value, indicating that the group means are not all equal. Post‑hoc analysis with TukeyHSD(V1.aov) provides confidence intervals for each pairwise comparison, revealing which specific groups differ.
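Continuing the example above, a minimal sketch of the post‑hoc step (the data are regenerated here so the snippet runs on its own):

```r
set.seed(1)
data <- c(rnorm(5), rnorm(5, 2), rnorm(5, 3), rnorm(5, 1))
V1 <- data.frame(data, FA = factor(rep(1:4, each = 5)))
V1.aov <- aov(data ~ FA, data = V1)

# all 6 pairwise comparisons with family-wise 95% confidence intervals
TukeyHSD(V1.aov)
# plot(TukeyHSD(V1.aov))  # intervals that exclude 0 indicate a real difference
```

An interval whose lower and upper bounds have the same sign (equivalently, an adjusted p‑value below 0.05) marks a pair of groups whose means differ.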

Two‑Factor and Multi‑Factor ANOVA

When more than one factor is considered, interaction effects may also exist. The between‑group sum of squares then decomposes into the main effect of each factor plus their interaction terms. The same principle as in one‑way ANOVA applies, but the model becomes more complex.

Example: assess whether newborn weight is related to mother’s age, race, smoking status, hypertension history, etc.

Using the birthwt dataset from the MASS package (189 observations, 10 variables), a multi‑factor ANOVA can be performed:

<code>library(MASS)
data(birthwt)
# code the categorical predictors as factors before fitting
birthwt$race  <- factor(birthwt$race, labels = c("white", "black", "other"))
birthwt$smoke <- factor(birthwt$smoke)
birthwt$ht    <- factor(birthwt$ht)
birthwt$ui    <- factor(birthwt$ui)
birthwt.anova <- aov(bwt ~ race*smoke + ht + ptl*ui, data = birthwt)
summary(birthwt.anova)
</code>

The summary table lists degrees of freedom, sum of squares, mean squares, F values and p‑values for each main effect and interaction, indicating which factors significantly affect birth weight.
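Since aov() is a wrapper around lm(), the same table can be produced by fitting the model with lm() and passing the fit to anova(); both report sequential (Type I) sums of squares. A minimal sketch with the same birthwt model:

```r
library(MASS)
data(birthwt)
# code the categorical predictors as factors
birthwt$race  <- factor(birthwt$race)
birthwt$smoke <- factor(birthwt$smoke)
birthwt$ht    <- factor(birthwt$ht)
birthwt$ui    <- factor(birthwt$ui)

# anova() on an lm fit yields the same sequential ANOVA table
fit <- lm(bwt ~ race*smoke + ht + ptl*ui, data = birthwt)
anova(fit)
```

Because the sums of squares are sequential, the order of terms in the formula matters when the design is unbalanced, as it is here.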

R also provides functions for correlation analysis. cor() computes correlation coefficients with options for handling missing data and choosing the method (Pearson, Kendall, Spearman). cor.test() performs hypothesis testing on the correlation, giving confidence intervals and p‑values.
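A quick illustration of those options, using cor()'s `use` and `method` arguments (the simulated variables here are illustrative):

```r
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
x[3] <- NA  # introduce a missing value

cor(x, y)                                # NA: the default use = "everything"
cor(x, y, use = "complete.obs")          # drop incomplete pairs first
cor(x, y, use = "complete.obs",
    method = "spearman")                 # rank-based (Spearman) correlation
```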

Example: generate two correlated normal variables and compute their Pearson correlation.
<code>set.seed(1)  # for reproducibility
x <- rnorm(20, 4, 1)
y <- 2*x + rnorm(20)
cor(x, y)
</code>

The result is close to the theoretical correlation 2/√5 ≈ 0.89 (since Var(y) = 4·Var(x) + Var(ε) = 5 and Cov(x, y) = 2), confirming a strong positive linear relationship between the variables.
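To test whether this correlation is statistically distinguishable from zero, the same data can be passed to cor.test(), which returns the estimate, a p‑value and a confidence interval:

```r
set.seed(1)
x <- rnorm(20, 4, 1)
y <- 2*x + rnorm(20)

# H0: the true correlation is zero; Pearson is the default method
ct <- cor.test(x, y)
ct$estimate  # sample correlation coefficient
ct$p.value   # small p-value: reject H0
ct$conf.int  # 95% confidence interval for the correlation
```

For non‑normal data, passing method = "spearman" or method = "kendall" gives the corresponding rank‑based tests.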

Tags: correlation, statistical analysis, R, linear models, ANOVA, multiple comparison, Tukey test
Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
