Master Logistic Regression: Binary, Multiclass, and Ordered Extensions with Python
This article explains logistic regression and its extensions (binary, multiclass softmax, and ordered logistic regression), covering mathematical foundations, optimization objectives, real-world applications, and Python implementations using scikit-learn with worked code examples.
Classification is one of the most common tasks in machine learning and statistics. Whether predicting disease presence, loan default, or ad clicks, classification methods are crucial, and logistic regression remains a classic, widely used technique despite its name containing “regression.” This article explores logistic regression and its two extensions: multiclass logistic regression and ordered logistic regression, which enable handling more complex classification problems.
1 Binary Logistic Regression
Logistic regression is a statistical method widely used for binary classification tasks.
1.1 Mathematical Model
Linear function
First, we build a linear function describing the relationship between the features and the target variable, just as in linear regression:

z = w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n

where the coefficients w_0, w_1, …, w_n are the model parameters and x_1, …, x_n are the features.
Logistic (sigmoid) function
We then transform the linear output z into a probability between 0 and 1 using the sigmoid function, turning a continuous score into a classification output:

σ(z) = 1 / (1 + e^(−z))

where e is the base of the natural logarithm. As z approaches positive infinity, σ(z) approaches 1; as z approaches negative infinity, σ(z) approaches 0.
Decision boundary
To classify, we set a decision threshold (e.g., 0.5). If the predicted probability exceeds this value, the observation is assigned to class 1; otherwise, to class 0.
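As a minimal sketch, the full pipeline from linear score to sigmoid probability to thresholded label looks like the following (the coefficient values here are hypothetical, not fitted from data):

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a model with two features
w = np.array([0.8, -1.2])
b = 0.5

x = np.array([1.0, 0.3])   # one observation
z = w @ x + b              # linear score: 0.8 - 0.36 + 0.5 = 0.94
p = sigmoid(z)             # predicted probability of class 1 (~0.72)
label = int(p > 0.5)       # decision threshold at 0.5 -> class 1
```

Lowering or raising the 0.5 threshold trades recall against precision, which matters when the two kinds of misclassification carry different costs.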
1.2 Optimization Objective
The goal of logistic regression is to maximize the likelihood function (or equivalently, minimize the log-loss). Writing p_i = σ(z_i) for the predicted probability that sample i belongs to class 1, the likelihood of the dataset is:

L(w) = ∏_i p_i^(y_i) (1 − p_i)^(1 − y_i)

For a single sample, the likelihood contribution is the probability the model assigns to the observed label; the overall likelihood is the product over all samples. In practice we work with the log-likelihood for mathematical convenience:

log L(w) = Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ]

Maximizing the log-likelihood (or minimizing the negative log-likelihood) is the optimization objective.
Parameters are typically estimated using gradient descent or other optimization algorithms.
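To make the estimation step concrete, here is a plain gradient-descent sketch on a small synthetic dataset (the data, learning rate, and iteration count are assumptions for illustration; scikit-learn's solvers do this far more robustly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic, linearly separable data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(1000):
    p = sigmoid(X @ w + b)
    # Gradient of the average negative log-likelihood
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

Because the data here are linearly separable, plain gradient descent recovers a near-perfect decision boundary; on real data, regularization and a convergence criterion would also be needed.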
1.3 Applications
Logistic regression is applied in many domains, including:
Medicine – predicting whether a patient has a disease (yes/no).
Finance – predicting whether a loan applicant will default.
Marketing – predicting whether a customer will purchase a product.
Social media – predicting whether a user will click an ad or link.
2 Multiclass Logistic Regression
Multiclass logistic regression (also called Softmax regression) extends logistic regression to handle more than two classes. While binary logistic regression uses the sigmoid function, multiclass problems use the softmax function to produce a probability distribution over all classes.
2.1 Mathematical Model
Assume there are K classes. For a sample with feature vector x, the model learns a weight matrix W and bias vector b, where each class k has its own weight vector w_k and bias b_k. The unnormalized score (logit) for class k is:

z_k = w_k · x + b_k

Applying the softmax function converts these scores into class probabilities:

P(y = k | x) = e^(z_k) / Σ_{j=1}^{K} e^(z_j)

where y denotes the class label; the probabilities over all K classes sum to 1.
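A numerically stable softmax can be sketched in a few lines (the logit values below are hypothetical):

```python
import numpy as np

def softmax(z):
    # Subtracting the max before exponentiating avoids overflow
    # without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical logits z_k for K = 3 classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)  # probabilities sum to 1; class 0 is most likely
```

With K = 2, softmax reduces to the sigmoid applied to the difference of the two logits, which is why binary logistic regression is a special case.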
2.2 Optimization Objective
As with binary logistic regression, the objective is to minimize the cross-entropy loss (the negative log-likelihood). For a single sample with true class c, the loss is:

L = −log P(y = c | x)

For the whole dataset of N samples, we minimize the average loss:

J = −(1/N) Σ_{i=1}^{N} log P(y_i = c_i | x_i)
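For example, the average cross-entropy over three samples (with hypothetical predicted probabilities) can be computed directly:

```python
import numpy as np

# Hypothetical predicted probabilities for 3 samples over K = 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])
y_true = np.array([0, 1, 2])  # true class index of each sample

# Average negative log-probability assigned to the true class
loss = -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
# loss = -(ln 0.7 + ln 0.8 + ln 0.5) / 3, about 0.424
```

Note that only the probability assigned to the true class enters the loss; the model is penalized heavily when that probability is close to zero.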
2.3 Applications
Multiclass logistic regression can be used for various multi‑category tasks, such as:
Handwritten digit recognition – classifying digits 0‑9.
News article categorization – labeling articles as politics, sports, entertainment, etc.
Image classification – assigning images to predefined categories.
3 Ordered Logistic Regression
Ordered logistic regression (also known as the ordinal logit or proportional-odds model) extends logistic regression to handle ordered categorical variables, where the categories have a natural ranking. For example, rating a product as poor, fair, good, or excellent is an ordered classification task. Unlike standard multiclass logistic regression, ordered logistic regression incorporates the order relationship between categories into the prediction.
3.1 Mathematical Model
Assume there are J ordered categories. The model defines a series of thresholds θ_1 < θ_2 < … < θ_{J−1} and a linear function η = Xβ. The cumulative probability of falling at or below category j is P(y ≤ j) = σ(θ_j − η), so the probability of belonging to category j is the difference of two sigmoid functions evaluated at adjacent thresholds:

P(y = j) = σ(θ_j − η) − σ(θ_{j−1} − η)

where σ is the sigmoid function, with the conventions θ_0 = −∞ and θ_J = +∞.
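With hypothetical thresholds and a single linear score η, the category probabilities can be computed as differences of cumulative sigmoids:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical thresholds theta_1 < theta_2 for J = 3 ordered categories
thetas = np.array([-1.0, 1.0])
eta = 0.4  # linear score x . beta for one observation

# Cumulative probabilities P(y <= j) = sigmoid(theta_j - eta)
cum = sigmoid(thetas - eta)

# P(y = j) as differences, using theta_0 = -inf and theta_J = +inf
probs = np.diff(np.concatenate(([0.0], cum, [1.0])))
# probs has one entry per category and sums to 1
```

Because all categories share the single coefficient vector β and differ only in their thresholds, increasing η shifts probability mass monotonically toward higher categories, which is exactly the ordering constraint the model encodes.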
3.2 Optimization Objective
As with standard logistic regression, the goal is to minimize the negative log‑likelihood loss.
3.3 Applications
Ordered logistic regression is useful in scenarios such as:
Product rating – predicting ordered star ratings.
Health assessment – classifying health status as poor, moderate, or good.
Economic assessment – predicting income level as low, medium, or high.
4 Python Implementation of Multiclass and Ordered Regression
We demonstrate the concepts using scikit-learn datasets. For multiclass logistic regression we use the digits dataset (handwritten digit images 0-9). For ordered logistic regression we use the wine dataset; its three target classes are actually cultivars rather than quality grades, but we treat them as ordered categories here for demonstration purposes.
4.1 Handwritten Digit Recognition (Multiclass)
Steps: load data, split into training and test sets, train a multinomial logistic regression model, evaluate performance.
<code>from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# 1. Load data
digits = datasets.load_digits()
# 2. Split data
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=823)
# 3. Train multinomial logistic regression
logistic_model = LogisticRegression(max_iter=10000)  # multinomial (softmax) handling is the default with the lbfgs solver
logistic_model.fit(X_train, y_train)
# 4. Evaluate
y_pred = logistic_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
accuracy, classification_rep
</code>The model achieves 96% accuracy on the test set, with high precision, recall, and F1 scores for each digit.
4.2 Wine Quality Rating (Ordered)
scikit‑learn does not provide a direct ordered logistic regression implementation, but we can simulate it by training a binary logistic regression model for each threshold and combining their predictions.
<code># 1. Load data
wine = datasets.load_wine()
# 2. Data already has three ordered classes
# 3. Split data
X_train_wine, X_test_wine, y_train_wine, y_test_wine = train_test_split(wine.data, wine.target, test_size=0.2, random_state=823)
# 4. Train binary models for each threshold
models = []
thresholds = [0, 1] # thresholds for ordered categories
for thresh in thresholds:
    y_train_binary = (y_train_wine > thresh).astype(int)
    model = LogisticRegression(max_iter=10000)
    model.fit(X_train_wine, y_train_binary)
    models.append(model)
# 5. Predict ordered categories
def predict_ordered(models, X):
    probs = [model.predict_proba(X)[:, 1] for model in models]
    predictions = []
    for prob_tuple in zip(*probs):
        if prob_tuple[0] <= 0.5:
            predictions.append(0)
        elif prob_tuple[1] <= 0.5:
            predictions.append(1)
        else:
            predictions.append(2)
    return predictions
y_pred_wine = predict_ordered(models, X_test_wine)
accuracy_wine = accuracy_score(y_test_wine, y_pred_wine)
classification_rep_wine = classification_report(y_test_wine, y_pred_wine)
accuracy_wine, classification_rep_wine
</code>The ordered logistic regression approach achieves over 88% accuracy on the test set, demonstrating good performance with room for further improvement.
Conclusion
Logistic regression, despite its simplicity, remains popular due to its strong classification ability and intuitive interpretability. When problems extend beyond binary classification to multiclass or ordered categories, multiclass and ordered logistic regression provide effective tools. Selecting the appropriate model and optimization method still depends on the specific data and business requirements. This article aims to deepen readers' understanding of these techniques and help them apply the methods in their own projects.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".