
How to Build and Estimate a Logistic Regression Model for Grouped Data

This article explains the construction of logistic regression models, the use of the sigmoid function, maximum likelihood estimation, and least‑squares estimation for grouped data, illustrated with a housing‑purchase case study and complete Python code for fitting and predicting probabilities.

Model Perspective

Revisiting the Logistic Regression Model

Model Construction

Logistic regression differs from ordinary regression in that its response is a discrete class label, typically 0 or 1 for binary classification. The idea is to construct a hyperplane that separates the feature space into two regions, assigning each region to one class.

Since the linear predictor yields a real‑valued output, it must be transformed to the interval (0,1). The step function is ideal but discontinuous at zero, so its smooth approximation, the sigmoid function, is used:

σ(z) = 1 / (1 + e^(−z))

The sigmoid compresses inputs to a probability between 0 and 1, allowing us to interpret the output as the probability of belonging to class 1.
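As a quick sanity check, the sigmoid's behavior can be sketched in a few lines of Python (a minimal illustration, not part of the article's case-study code):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 maps exactly to 0.5 -- the decision boundary.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```
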

Applying the sigmoid to the linear predictor gives the logistic model

P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1·x)))

Because this model is nonlinear in the parameters, it cannot be fitted by ordinary least squares; maximum likelihood estimation is employed instead, and rewriting the likelihood in log form makes the optimization easier.

Parameter Estimation for the Logistic Model

To build the likelihood function, the two cases y = 1 and y = 0 are first written in the unified form P(y | x) = p^y · (1 − p)^(1−y). Assuming independent samples, the joint probability equals the product of the individual probabilities, giving the likelihood

L(β) = ∏ p_i^{y_i} · (1 − p_i)^{1 − y_i}

Taking the logarithm yields the log-likelihood

ln L(β) = Σ [ y_i · ln p_i + (1 − y_i) · ln(1 − p_i) ]

whose maximization provides the parameter estimates. No closed-form solution exists, so numerical methods such as gradient descent on the negative log-likelihood are used. For grouped data, however, a least-squares estimate of the parameters can be derived.
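The numerical approach above can be sketched as a plain gradient-ascent loop on the mean log-likelihood. This is an illustrative implementation on made-up toy data, not the article's method or data; `fit_logistic_mle` and its defaults are hypothetical names chosen here:

```python
import numpy as np

def fit_logistic_mle(X, y, lr=0.1, n_iter=5000):
    """Fit beta by gradient ascent on the mean log-likelihood.

    X: (n, d) design matrix with an intercept column; y: 0/1 labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted probabilities
        beta += lr * X.T @ (y - p) / len(y)    # gradient of mean log-likelihood
    return beta

# Toy data: the outcome tends to 1 as x grows (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
beta = fit_logistic_mle(X, y)
```

With perfectly separable toy data like this the coefficients keep growing with more iterations; the point is only that the loop recovers a positive slope and sensible probabilities.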

Least‑Squares Estimation for Grouped Data

When multiple observations share the same predictor value, the data are considered grouped. Let the number of groups be G; within group g with n_g observations and m_g successes, the proportion p̂_g = m_g / n_g estimates the success probability for that group.

Applying the logit transform z_g = ln(p̂_g / (1 − p̂_g)) to each group proportion turns the model into the linear relationship z_g ≈ β0 + β1·x_g, so standard linear regression applies and the least-squares estimate is

β̂ = (XᵀX)⁻¹ Xᵀ z
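The grouped-data recipe can be demonstrated end to end in pure NumPy. The numbers below are hypothetical grouped data invented for illustration (they are not the article's housing data):

```python
import numpy as np

# Hypothetical grouped data: for each group g, x[g] is the predictor,
# n[g] the number of trials, m[g] the number of successes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = np.array([50, 50, 50, 50, 50])
m = np.array([5, 12, 25, 38, 45])

p_hat = m / n                        # empirical success proportion per group
z = np.log(p_hat / (1 - p_hat))      # empirical logit: linear in x under the model

# Least-squares estimate beta_hat = (X^T X)^{-1} X^T z
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, z, rcond=None)[0]
```

Note that the logit is undefined for groups where p̂_g is exactly 0 or 1, so this transform only works when every group has at least one success and one failure.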

Case Study

Problem

At a housing exhibition, a number of customers signed a preliminary purchase intention. Over the next three months, only some actually bought a house. The outcome is coded as 1 for purchase and 0 otherwise. The household annual income (divided into nine groups) is the predictor. The developer wants to model the probability of purchase as a function of income.

Because the response is a Bernoulli variable, logistic regression is appropriate. The grouped observations give an estimated purchase probability for each income group, which is then logit-transformed so a linear model can be fitted.

Using the data, the linear regression equation is obtained, and the corresponding logistic regression equation is derived. The model shows that higher income leads to a higher probability of purchase. For a household income of 90,000, the predicted purchase probability is computed.

Code

<code>import numpy as np
import statsmodels.api as sm

a = np.loadtxt("data/house.txt")   # 9 rows: income x, group size ni, purchases mi
x = a[:, 0]
pi = a[:, 2] / a[:, 1]             # empirical purchase proportion per group
yi = np.log(pi / (1 - pi))         # logit transform: linear in x under the model
X = sm.add_constant(x)             # add intercept column
md = sm.OLS(yi, X).fit()           # fit the linear model by least squares
print(md.summary())                # full regression output
b = md.params                      # [intercept, slope]
p0 = 1 / (1 + np.exp(-np.dot(b, [1, 9])))   # predicted probability at x = 9
print("Predicted probability p0 = %.4f" % p0)
np.savetxt("data/house2", b)       # save the coefficients
</code>

References

Si Shoukui, Sun Xijing, Python数学实验与建模 (Python Mathematical Experiments and Modeling)

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
