How Logistic Regression Predicts Titanic Survival: A Step-by-Step R Guide
This article explains logistic regression for binary outcomes, demonstrates its implementation in R with the TitanicSurvival dataset, and interprets the model coefficients showing how gender, age, and passenger class significantly affect survival probability.
When the dependent variable is binary (“yes/no”), logistic regression (also called logit regression) models the log-odds of “yes” as a linear function of the independent variables.
In this model, p is the probability of “yes”, 1 − p is the probability of “no”, and the log-odds satisfy ln(p / (1 − p)) = β0 + β1x1 + … + βkxk. Equivalently, p is the sigmoid function of the linear predictor: p = 1 / (1 + exp(−(β0 + β1x1 + … + βkxk))). The generalized linear model function glm() in R fits logistic regression when called with family = binomial, and fits other regression models with other families.
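To make the link between log-odds and probability concrete, here is a minimal sketch in base R. The helper names sigmoid and logit are illustrative and do not come from the original article; base R already provides the same functions as plogis() and qlogis().

```r
# Sigmoid (inverse logit): maps a linear predictor eta to a probability in (0, 1)
sigmoid <- function(eta) 1 / (1 + exp(-eta))

# Logit: maps a probability p to its log-odds
logit <- function(p) log(p / (1 - p))

eta <- c(-4, -2, 0, 2, 4)   # example values of the linear predictor
p   <- sigmoid(eta)
print(round(p, 3))

# The hand-written helpers agree with base R, and logit undoes sigmoid
stopifnot(isTRUE(all.equal(p, plogis(eta))))
stopifnot(isTRUE(all.equal(logit(p), qlogis(p))))
stopifnot(isTRUE(all.equal(logit(p), eta)))
```

A linear predictor of 0 corresponds to a probability of exactly 0.5; large positive values push the probability toward 1 and large negative values toward 0.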
The film Titanic depicts the choices people made in the face of disaster, with women and children given priority. The example uses real data on 1,309 passengers of the 1912 voyage, including age, sex, passenger class, and survival status, taken from the TitanicSurvival dataset in the carData package.
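Before fitting anything, it can help to look at the data. The following is a small sketch, assuming the carData package is installed; the variable names n_total, n_missing, and n_complete are illustrative.

```r
library(carData)  # provides the TitanicSurvival data frame

# Variable types and factor levels
str(TitanicSurvival)

n_total    <- nrow(TitanicSurvival)                  # number of passengers
n_missing  <- colSums(is.na(TitanicSurvival))        # missing values per column
n_complete <- sum(complete.cases(TitanicSurvival))   # rows with no missing values

print(n_missing)
cat("rows:", n_total, " complete rows:", n_complete, "\n")
```

Rows with missing values matter here because glm() drops them by default, so the model is fitted on fewer passengers than the dataset contains.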
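<code>library(carData)  # provides the TitanicSurvival data frame
# Logistic regression: family = binomial uses the logit link by default
logit.TS <- glm(survived ~ sex + age + factor(passengerClass), family = binomial, data = TitanicSurvival)
summary(logit.TS)  # coefficient table, deviances, and AIC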
</code>Result
<code>Call:
glm(formula = survived ~ sex + age + factor(passengerClass),
    family = binomial, data = TitanicSurvival)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6399  -0.6979  -0.4336   0.6688   2.3964
Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                3.522074   0.326702  10.781  < 2e-16 ***
sexmale                   -2.497845   0.166037 -15.044  < 2e-16 ***
age                       -0.034393   0.006331  -5.433 5.56e-08 ***
factor(passengerClass)2nd -1.280570   0.225538  -5.678 1.36e-08 ***
factor(passengerClass)3rd -2.289661   0.225802 -10.140  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 1414.62  on 1045  degrees of freedom
Residual deviance:  982.45  on 1041  degrees of freedom
  (263 observations deleted due to missingness)
AIC: 992.45
Number of Fisher Scoring iterations: 4
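</code>In this example, the model formula is survived ~ sex + age + factor(passengerClass), which relates survival status to sex, age, and passenger class. Both survived and sex are binary variables, age ranges from about 0 to 80 years, and passenger class has three levels that enter the model as two dummy variables, with 1st class as the reference level. Specifying the binomial family gives a logit model, which suits the binary outcome. glm() dropped 263 rows with missing values, leaving 1,046 passengers for the fit. All four slope coefficients are negative and highly significant (p < 0.001): holding the other variables fixed, being male, being older, and travelling in 2nd or 3rd class rather than 1st each lower the log-odds of survival. For instance, the sexmale coefficient of −2.50 corresponds to an odds ratio of exp(−2.50) ≈ 0.08, so a man's odds of survival were roughly one twelfth of those of a woman of the same age and class. This pattern is consistent with women and children having been given priority.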
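The coefficients are on the log-odds scale, so exponentiating them gives odds ratios, and passing a linear predictor through the sigmoid gives a probability. The sketch below works directly from the estimates printed in the summary table above, using base R only. The shortened names class2nd and class3rd, the helper predict_survival, and the two example passengers are hypothetical and chosen for illustration.

```r
# Estimates copied from the coefficient table printed above
b <- c("(Intercept)" = 3.522074, sexmale = -2.497845, age = -0.034393,
       class2nd = -1.280570, class3rd = -2.289661)

# Odds ratios: multiplicative change in the odds of survival
# for a one-unit change in each predictor, others held fixed
odds_ratios <- exp(b[-1])
print(round(odds_ratios, 3))

# Hypothetical helper: predicted survival probability for one passenger
predict_survival <- function(male, age, class = c("1st", "2nd", "3rd")) {
  class <- match.arg(class)
  eta <- b[["(Intercept)"]] +
    b[["sexmale"]]  * male +
    b[["age"]]      * age +
    b[["class2nd"]] * (class == "2nd") +
    b[["class3rd"]] * (class == "3rd")
  plogis(eta)  # sigmoid of the linear predictor
}

p_woman_1st <- predict_survival(male = 0, age = 30, class = "1st")
p_man_3rd   <- predict_survival(male = 1, age = 30, class = "3rd")
print(c(woman_1st = p_woman_1st, man_3rd = p_man_3rd))
```

With the fitted object itself available, exp(coef(logit.TS)) and predict(logit.TS, newdata = ..., type = "response") should give the same quantities without copying numbers by hand.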
Source: Liu Hongde, Sun Xiao, Xie Jianming, Biological Data Analysis and Practice
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".