
Why OLS Fails for Discrete Choices: Understanding Probit and Logit Models

Discrete dependent variables, such as binary choices and count data, call for specialized models rather than OLS. This article focuses on the binary Probit and Logit models and explains their foundations, link functions, marginal effects, goodness‑of‑fit measures, heteroskedasticity issues, and practical interpretation for policy analysis.

Model Perspective

Jul 31, 2022


Examples of Discrete Dependent Variables

Binary choices: graduate school or not; employment or unemployment; buying a house or not; buying insurance or not; loan approval or rejection; going abroad or staying; returning home or not; war or peace; life or death. Multiple choices: transportation mode (walk, bike, car), occupation selection.

These models are called “discrete choice models” or “qualitative response models.” Sometimes the dependent variable can only take non‑negative integers: number of patents a firm obtains in a period; number of hospital visits by a person; number of mining accidents in a province per year. Such data are “count data,” also discrete. Because of the discrete nature, ordinary least squares (OLS) regression is generally inappropriate.

Assume an individual has two alternatives, e.g., y = 1 (graduate school) or y = 0 (no graduate school), and collect all explanatory variables in a vector x. The Linear Probability Model (LPM) simply regresses y on x by OLS, so that P(y = 1 | x) = E(y | x) = x′β. It is computationally simple and its coefficients are directly the marginal effects, but it suffers from several drawbacks:

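(1) Because x′β is unbounded while a probability must lie in [0,1], the linear specification is generally wrong; the error term then absorbs the neglected nonlinearity and is correlated with x, a form of endogeneity that makes the OLS estimates inconsistent.

(2) Given x, the error ε = y − x′β can take only two values, 1 − x′β or −x′β, so it follows a two‑point distribution rather than a normal one.

(3) Heteroskedasticity: Var(ε | x) = x′β(1 − x′β) depends on x, so robust standard errors are required.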

(4) Predicted probabilities can fall outside the [0,1] range.
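Drawbacks (2) to (4) can be seen on a tiny example. The sketch below fits an LPM by simple-regression OLS on a hypothetical ten-observation sample (the data are made up for illustration) and prints the fitted values that fall outside [0,1], the two possible error values at each x, and the implied error variance p(1 − p).

```python
# Hypothetical toy data: one regressor x and a binary outcome y
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_line(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


b0, b1 = ols_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

# Drawback (4): fitted "probabilities" outside [0, 1]
outside = [(xi, round(p, 3)) for xi, p in zip(x, fitted) if p < 0 or p > 1]
print("intercept, slope:", round(b0, 4), round(b1, 4))
print("fitted values outside [0,1]:", outside)

# Drawbacks (2)-(3): given x the error takes only two values, and its
# variance p(1 - p) changes with x (it is even negative where p is outside
# [0, 1], which no true variance can be)
for xi, p in zip(x, fitted):
    print(xi, "errors:", (round(1 - p, 3), round(-p, 3)), "var:", round(p * (1 - p), 3))
```

The extreme observations (x = 0 and x = 9) are the ones pushed outside the unit interval, which is typical: the straight line cannot flatten out near 0 and 1.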

The function F that maps the linear index x′β into the probability, P(y = 1 | x) = F(x′β), is called the “link function.” Choosing F as a cumulative distribution function keeps the predicted probability inside [0,1]. Using the standard normal CDF Φ yields the Probit model; using the logistic CDF Λ(z) = exp(z)/(1 + exp(z)) yields the Logit model. The logistic density is symmetric about zero with mean zero, but its variance is π²/3 ≈ 3.29 (versus 1 for the standard normal) and its tails are heavier.
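A minimal sketch comparing the two CDFs, using only Python's standard library: Λ is computed directly from its closed form, while Φ has no elementary closed form and is evaluated here through `math.erf`. The grid of z values is arbitrary.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z)), written in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def normal_cdf(z):
    """Phi(z) via the error function (no closed form in elementary functions)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-4, -2, -1, 0, 1, 2, 4):
    print(f"z={z:+d}  Logit={logistic_cdf(z):.5f}  Probit={normal_cdf(z):.5f}")

# Logistic variance is pi^2/3, so its standard deviation is about 1.81
print("logistic sd:", math.pi / math.sqrt(3))
# Heavier tails: probability mass above z = 4
print("tail beyond 4:", 1 - logistic_cdf(4), 1 - normal_cdf(4))
```

Both curves pass through 0.5 at z = 0 and satisfy F(−z) = 1 − F(z); the logistic curve is flatter because of its larger variance, which is why Logit coefficients are typically larger in magnitude than Probit coefficients fitted to the same data.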

Because the logistic CDF has a closed‑form expression (unlike the normal CDF), Logit is computationally more convenient. Both models are estimated by maximum likelihood (MLE). For observation i, the probability of the observed outcome can be written compactly as f(yᵢ | xᵢ) = F(xᵢ′β)^yᵢ · [1 − F(xᵢ′β)]^(1 − yᵢ), with F = Λ for Logit and F = Φ for Probit. Taking logs gives ln f(yᵢ | xᵢ) = yᵢ ln F(xᵢ′β) + (1 − yᵢ) ln[1 − F(xᵢ′β)], and, assuming independent observations, the sample log‑likelihood is the sum of these terms over i.
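To make the estimation step concrete, here is a minimal sketch of Logit MLE by Newton–Raphson for one regressor plus an intercept, written in plain Python. The data are hypothetical; the two outcome groups overlap, so the maximum exists. The score is Σ(yᵢ − Λᵢ)(1, xᵢ) and the negative Hessian is ΣΛᵢ(1 − Λᵢ)(1, xᵢ)(1, xᵢ)′, both standard Logit results.

```python
import math

# Hypothetical toy data; the two classes overlap, so the MLE exists
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(b0, b1):
    """Sum over i of y_i ln Lambda(z_i) + (1 - y_i) ln(1 - Lambda(z_i))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def fit_logit(max_iter=50, tol=1e-10):
    """Newton-Raphson on the (globally concave) Logit log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # score: sum (y_i - p_i) * (1, x_i)
        h00 = h01 = h11 = 0.0   # negative Hessian: sum p_i(1-p_i) * (1,x_i)(1,x_i)'
        for xi, yi in zip(x, y):
            p = logistic_cdf(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = (negative Hessian)^{-1} @ score, via the 2x2 inverse
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


b0_hat, b1_hat = fit_logit()
print("beta:", b0_hat, b1_hat)
print("log-likelihood:", log_likelihood(b0_hat, b1_hat))
```

Newton–Raphson is used here because the Logit log‑likelihood is globally concave, so the iterations settle quickly from a zero start; statistical packages use the same idea with extra safeguards.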

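In nonlinear models, estimated coefficients are not marginal effects. For both Probit and Logit, the marginal effect of a continuous regressor xₖ is ∂P(y = 1 | x)/∂xₖ = f(x′β)·βₖ, where f is the density corresponding to F. It varies with x and must be computed separately; the coefficient βₖ alone gives only its sign. Common marginal‑effect concepts include: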

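Average marginal effect (AME): the marginal effect computed for each observation and then averaged over the sample.

Marginal effect at the mean (MEM): the marginal effect evaluated at the sample means of the covariates.

Marginal effect at a representative value (MER): the marginal effect evaluated at chosen values of the covariates (e.g., a typical individual).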
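The three concepts differ only in where the density f(x′β) is evaluated. The sketch below computes them for a single-regressor model with hypothetical coefficients b0 = −2.25, b1 = 0.5 and a hypothetical sample of x values; the same coefficients are plugged into both densities purely to show the mechanics, since fitted Logit and Probit coefficients would differ in scale.

```python
import math


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)          # Lambda'(z) = Lambda(z)(1 - Lambda(z))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical single-regressor model: P(y=1|x) = F(b0 + b1*x)
b0, b1 = -2.25, 0.5
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical sample of the regressor


def marginal_effect(xi, pdf):
    """dP/dx = f(b0 + b1*x) * b1 for a continuous regressor."""
    return pdf(b0 + b1 * xi) * b1


def ame(pdf):
    # average of the individual marginal effects
    return sum(marginal_effect(xi, pdf) for xi in x) / len(x)


def mem(pdf):
    # marginal effect at the sample mean of x
    return marginal_effect(sum(x) / len(x), pdf)


def mer(pdf, x_rep):
    # marginal effect at a chosen representative value of x
    return marginal_effect(x_rep, pdf)


for name, pdf in (("Logit", logistic_pdf), ("Probit", normal_pdf)):
    print(name, "AME:", round(ame(pdf), 4), "MEM:", round(mem(pdf), 4),
          "MER(x=8):", round(mer(pdf, 8), 4))
```

Here the index is zero at the sample mean, where the Logit density peaks at 0.25, so the MEM (0.125) exceeds the AME: averaging over observations in the flat tails pulls the AME down.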

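For the Logit model, the odds p/(1 − p), where p = P(y = 1 | x), satisfy ln[p/(1 − p)] = x′β. A one‑unit increase in xₖ, holding the other regressors fixed, therefore multiplies the odds by exp(βₖ); this factor is the odds ratio. For example, an odds ratio of 2 means the odds of the outcome double when the explanatory variable increases by one unit. The odds ratio should not be confused with the relative risk (the ratio of probabilities), which it approximates only when p is small.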
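A quick numerical check of the odds-ratio property, assuming hypothetical Logit coefficients b0 = −2.25 and b1 = 0.5: the ratio of odds after and before a one-unit increase in x equals exp(b1) wherever x starts, while the ratio of probabilities (relative risk) changes with x.

```python
import math


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds(p):
    return p / (1.0 - p)


b0, b1 = -2.25, 0.5          # hypothetical Logit coefficients
odds_ratio = math.exp(b1)

for xi in (0, 3, 6):
    p0 = logistic_cdf(b0 + b1 * xi)          # probability at x
    p1 = logistic_cdf(b0 + b1 * (xi + 1))    # probability at x + 1
    print(f"x={xi}: odds ratio={odds(p1) / odds(p0):.4f}  "
          f"exp(b1)={odds_ratio:.4f}  relative risk={p1 / p0:.4f}")
```

The constant odds ratio follows from ln[p/(1 − p)] being linear in x; the relative risk has no such property.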

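Goodness‑of‑fit for binary models cannot rely on the usual sum‑of‑squares decomposition, so the conventional R² is not meaningful. Stata reports a “pseudo‑R²” proposed by McFadden (1974):

pseudo‑R² = 1 − ln L₁ / ln L₀,

where ln L₁ is the log‑likelihood of the fitted model and ln L₀ that of the intercept‑only model. Because both log‑likelihoods are negative and ln L₁ ≥ ln L₀, the measure lies between 0 and 1.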
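A minimal sketch of McFadden's measure, assuming you already have fitted probabilities from some binary model (the outcomes and probabilities below are hypothetical). The intercept‑only model's MLE predicts the sample share of ones for everyone, which gives ln L₀.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p_fitted):
    """1 - lnL1 / lnL0; the intercept-only MLE predicts the sample share ybar.
    Assumes 0 < ybar < 1 (otherwise lnL0 is undefined)."""
    ybar = sum(y) / len(y)
    ll_null = log_likelihood(y, [ybar] * len(y))
    ll_fit = log_likelihood(y, p_fitted)
    return 1.0 - ll_fit / ll_null


# Hypothetical outcomes and fitted probabilities from some binary model
y = [0, 0, 1, 1]
p_hat = [0.2, 0.3, 0.7, 0.8]
print("pseudo-R2:", round(mcfadden_r2(y, p_hat), 4))
```

Predicting the sample share for every observation reproduces the null model and gives a pseudo‑R² of exactly zero, which is a convenient sanity check.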

Another fit measure is the percent correctly predicted: if the predicted probability exceeds a threshold (e.g., 0.5), the observation is classified as a positive outcome; otherwise as negative. Comparing predicted classifications with actual outcomes yields the correct‑prediction percentage.
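The classification rule translates directly into code. In this sketch the outcomes and predicted probabilities are hypothetical, and the threshold defaults to 0.5; the per-class rates are included because an overall rate can look good simply because one outcome is rare.

```python
def percent_correct(y, p_hat, threshold=0.5):
    """Share of observations whose 0/1 classification matches the outcome."""
    hits = sum(int(pi > threshold) == yi for yi, pi in zip(y, p_hat))
    return hits / len(y)


def rate_by_class(y, p_hat, cls, threshold=0.5):
    """Share correctly classified among observations with y == cls."""
    pairs = [(yi, pi) for yi, pi in zip(y, p_hat) if yi == cls]
    return sum(int(pi > threshold) == yi for yi, pi in pairs) / len(pairs)


# Hypothetical outcomes and predicted probabilities
y = [0, 0, 1, 1, 1]
p_hat = [0.2, 0.6, 0.7, 0.4, 0.9]
print(percent_correct(y, p_hat))        # 0.6
print(rate_by_class(y, p_hat, 1), rate_by_class(y, p_hat, 0))
```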

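If the distributional assumption is wrong, the estimator becomes a quasi‑maximum‑likelihood estimator (QMLE). Under correct specification, the MLE is consistent. In binary models, however, a wrong F or heteroskedastic latent errors change the response probability itself, so the QMLE is generally inconsistent for β; robust (sandwich) standard errors make inference valid for whatever the QMLE converges to, but they do not restore consistency. When data are not i.i.d. (e.g., clustered samples), cluster‑robust standard errors should be used.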
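For reference, the robust and cluster‑robust variance estimators take the familiar sandwich form below, with score sᵢ(β) = ∂ ln f(yᵢ | xᵢ; β)/∂β. For Logit, sᵢ = (yᵢ − Λ(xᵢ′β)) xᵢ and Â = Σ Λᵢ(1 − Λᵢ) xᵢ xᵢ′. This is the textbook formula without the small‑sample correction factor that statistical software typically applies.

```latex
\hat A = -\sum_{i=1}^{n} \frac{\partial^2 \ln f(y_i \mid x_i;\beta)}{\partial\beta\,\partial\beta'}\Big|_{\beta=\hat\beta},
\qquad
\hat V_{\text{robust}} = \hat A^{-1}\Big(\sum_{i=1}^{n} \hat s_i \hat s_i'\Big)\hat A^{-1},

\hat V_{\text{cluster}} = \hat A^{-1}\Big(\sum_{g=1}^{G} \hat S_g \hat S_g'\Big)\hat A^{-1},
\qquad
\hat S_g = \sum_{i \in g} \hat s_i .
```

Summing scores within each cluster g before forming the outer product allows arbitrary correlation among observations in the same cluster, while still assuming independence across clusters.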

Microfoundations of Binary Choice Models

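Binary choice can be modeled with a latent variable y* = x′β + ε representing net benefit (benefit minus cost). The latent net benefit is unobservable; only the decision is observed: y = 1 if y* > 0 and y = 0 otherwise. Hence P(y = 1 | x) = P(ε > −x′β) = F(x′β) whenever the distribution F of ε is symmetric about zero.

Assuming the error term follows a standard logistic distribution yields the Logit model; assuming a standard normal error yields the Probit model. The error variance has to be fixed (to 1 for Probit) because multiplying y* by any positive constant leaves the observed choice unchanged, so the scale of ε is not identified.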
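The latent-variable argument can be checked by simulation. This sketch assumes a hypothetical index value x′β = 0.5 for one individual, draws many errors, applies the rule y = 1 if y* > 0, and compares the simulated share with Λ(0.5) and Φ(0.5). Logistic errors are drawn by inverting the CDF; normal errors come from `random.Random.gauss`.

```python
import math
import random


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_logistic(rng):
    """Standard logistic draw by inverting the CDF: ln(u / (1 - u))."""
    u = rng.random()
    while u == 0.0:            # random() can return 0.0; avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_share(index, error_draw, n=200_000, seed=42):
    """Share of y = 1 when y* = index + error and y = 1 iff y* > 0."""
    rng = random.Random(seed)
    return sum(index + error_draw(rng) > 0 for _ in range(n)) / n


index = 0.5   # hypothetical value of x'beta for one individual
logit_share = simulate_share(index, draw_logistic)
probit_share = simulate_share(index, lambda rng: rng.gauss(0.0, 1.0))
print("Logit :", logit_share, "vs Lambda(0.5) =", logistic_cdf(index))
print("Probit:", probit_share, "vs Phi(0.5)    =", normal_cdf(index))
```

With 200,000 draws the simulated shares should agree with the closed-form probabilities to about two or three decimal places.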

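Another microfoundation is the Random Utility Maximization (RUM) model. An individual derives utility U₁ from alternative 1 and U₂ from alternative 2 and chooses the alternative with the higher utility. Each utility consists of a deterministic component plus a random disturbance, Uⱼ = x′βⱼ + εⱼ, so alternative 1 is chosen when U₁ − U₂ = x′(β₁ − β₂) + (ε₁ − ε₂) > 0. This is exactly the latent‑variable model with y* = U₁ − U₂; if ε₁ and ε₂ are independent type I extreme value (Gumbel) draws, their difference is logistic and the Logit model follows.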
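A small simulation illustrates the RUM route to Logit. The deterministic utilities V1 = 1.0 and V2 = 0.2 are hypothetical; each utility gets an independent standard Gumbel shock (drawn as −ln(−ln u)), and the share choosing alternative 1 should match Λ(V1 − V2).

```python
import math
import random


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def draw_gumbel(rng):
    """Standard type I extreme value (Gumbel) draw: -ln(-ln u)."""
    u = rng.random()
    while u == 0.0:            # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


V1, V2 = 1.0, 0.2   # hypothetical deterministic utilities x'beta_1, x'beta_2
rng = random.Random(7)
n = 200_000
# Alternative 1 is chosen whenever U1 = V1 + e1 exceeds U2 = V2 + e2
chose_1 = sum(V1 + draw_gumbel(rng) > V2 + draw_gumbel(rng) for _ in range(n))
share = chose_1 / n
print("simulated P(choose 1):", share)
print("Logit formula Lambda(V1 - V2):", logistic_cdf(V1 - V2))
```

Only the utility difference V1 − V2 matters for the choice probability, which is why the RUM and net-benefit formulations lead to the same binary model.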

Heteroskedasticity in Binary Choice Models

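The standard Probit or Logit model assumes homoskedastic latent errors. Under heteroskedasticity, the error's standard deviation is modeled as a function of exogenous variables z (excluding a constant), e.g., σ = exp(z′γ), so that in the heteroskedastic Probit P(y = 1 | x, z) = Φ(x′β / exp(z′γ)). The log‑likelihood is adjusted accordingly, allowing simultaneous estimation of the mean (β) and variance (γ) equations. Homoskedasticity corresponds to H₀: γ = 0, which a likelihood‑ratio test can assess by comparing the restricted and unrestricted log‑likelihoods.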
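A minimal sketch of the two ingredients, assuming the exponential variance form σ = exp(z′γ) with a single variance regressor. The function names are illustrative only. With one restriction (γ = 0) the LR statistic is compared with a χ²(1) distribution, whose upper tail has the closed form P(χ²₁ > c) = erfc(√(c/2)).

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def het_probit_prob(xb, zg):
    """P(y=1|x,z) = Phi(x'beta / exp(z'gamma)); xb = x'beta, zg = z'gamma."""
    return normal_cdf(xb / math.exp(zg))


def lr_stat(ll_unrestricted, ll_restricted):
    """LR = 2 (lnL_u - lnL_r); non-negative when the models are nested."""
    return max(2.0 * (ll_unrestricted - ll_restricted), 0.0)


def lr_pvalue_df1(stat):
    """Upper-tail chi-square(1) probability: P(chi2_1 > stat) = erfc(sqrt(stat/2))."""
    return math.erfc(math.sqrt(stat / 2.0))


# gamma = 0 reproduces the ordinary Probit probability
print(het_probit_prob(0.7, 0.0), normal_cdf(0.7))
# A larger error scale pulls the probability toward 0.5
print(het_probit_prob(1.0, math.log(2.0)), normal_cdf(0.5))
# The familiar 5% critical value of chi-square(1)
print(lr_pvalue_df1(3.841458820694124))
```

In practice lr_stat would be fed the maximized log‑likelihoods of the heteroskedastic model and of the ordinary Probit; the sketch only handles a single restriction, so a test with several variance regressors needs a general χ² tail function.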


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".

