
Unpacking Gender Wage Gaps: Oaxaca‑Blinder, Regression & Simulated Data

This article reviews Claudia Goldin’s Nobel‑winning research on gender wage disparities, explaining the Oaxaca‑Blinder decomposition, multiple linear regression, and mean‑difference models, and demonstrates their application with a synthetic dataset and Python code to illustrate how education, experience, and gender affect wages.

Model Perspective
Claudia Goldin is a renowned labor economist who received the 2023 Nobel Prize in Economic Sciences for her pioneering work identifying the drivers of gender differences in the labor market. Her research covers topics such as women’s career and family choices, co‑education, the impact of contraception on occupational and marital decisions, and the modern lifecycle of female employment. This article explores some of the mathematical models used in her key studies.

1. Exploring the Gender Wage Gap: Oaxaca‑Blinder Decomposition

In labor economics, the Oaxaca‑Blinder decomposition model is used to analyze the factors behind wage differentials. Goldin applies this model to investigate the wage gap between men and women, aiming to identify the contributing factors.

The Oaxaca‑Blinder decomposition separates the wage gap into two components: one attributed to differences in average characteristics (such as education and experience) between groups, and another attributed to differences in the wage structure (i.e., how characteristics affect wages).

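The basic expression of the model, in its twofold form, is:

$$\bar{W}_m - \bar{W}_f = (\bar{X}_m - \bar{X}_f)'\beta^* + \left[\bar{X}_m'(\beta_m - \beta^*) + \bar{X}_f'(\beta^* - \beta_f)\right]$$

Here $\bar{W}_m$ and $\bar{W}_f$ are the average wages of men and women. $\bar{X}_m$ and $\bar{X}_f$ are their average characteristics (e.g., education, experience, and a constant for the intercept). $\beta_m$ and $\beta_f$ are the coefficients of the male and female wage equations, and $\beta^*$ is the coefficient vector of a nondiscriminatory wage equation. The first term is the part of the gap explained by differences in characteristics. The bracketed term is the unexplained part, due to differences in the wage structure. Common choices for $\beta^*$ are $\beta_m$ or $\beta_f$.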

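Using this model, we can split the wage gap into two parts. The explained part is attributable to differences in observable characteristics such as education and experience. The unexplained part is attributable to differences in how those characteristics are rewarded. The unexplained part may reflect gender discrimination, but it also absorbs the effect of any factors the model does not observe.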
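Once the group means and coefficients are known, the decomposition is plain arithmetic. The sketch below uses made-up values for a model with an intercept, education, and experience. None of these numbers come from Goldin's work or from the article's dataset. The helper name `oaxaca_blinder` is hypothetical, and its `beta_ref` argument plays the role of the nondiscriminatory coefficients $\beta^*$.

```python
import numpy as np


def oaxaca_blinder(mean_m, mean_f, beta_m, beta_f, beta_ref):
    """Twofold Oaxaca-Blinder decomposition of the mean wage gap (men minus women).

    mean_m, mean_f: mean characteristics; the first entry is 1.0 for the intercept
    beta_m, beta_f: coefficients of the male and female wage equations
    beta_ref:       reference ("nondiscriminatory") coefficients
    """
    mean_m, mean_f = np.asarray(mean_m, float), np.asarray(mean_f, float)
    beta_m, beta_f = np.asarray(beta_m, float), np.asarray(beta_f, float)
    beta_ref = np.asarray(beta_ref, float)

    gap = mean_m @ beta_m - mean_f @ beta_f
    # Explained: differences in characteristics valued at the reference returns
    explained = (mean_m - mean_f) @ beta_ref
    # Unexplained: each group's deviation of returns from the reference
    unexplained = mean_m @ (beta_m - beta_ref) + mean_f @ (beta_ref - beta_f)
    return gap, explained, unexplained


# Hypothetical numbers, ordered as [intercept, education, experience]
mean_m = [1.0, 14.0, 20.0]
mean_f = [1.0, 15.0, 18.0]
beta_m = [10.0, 2.0, 0.5]
beta_f = [5.0, 2.0, 0.5]

gap, explained, unexplained = oaxaca_blinder(mean_m, mean_f, beta_m, beta_f, beta_ref=beta_m)
print(gap, explained, unexplained)
```

With these numbers, the raw gap of 4 splits into -1 explained and 5 unexplained. Passing `beta_ref=beta_f` instead can change the split when the slope coefficients differ between groups. Either way, the two parts always sum to the gap.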

2. Relationship Between Education and Wages: Multiple Linear Regression

Goldin’s research frequently employs multiple linear regression to examine how education influences wages.

Multiple linear regression analyzes how several independent variables affect a dependent variable. In labor economics, it is often used to assess the impact of education, experience, gender, and other factors on wages.

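The basic model can be expressed as:

$$W_i = \beta_0 + \beta_1\,\text{Education}_i + \beta_2\,\text{Experience}_i + \beta_3\,\text{Gender}_i + \varepsilon_i$$

The variables are defined as follows:

- $W_i$ is the wage of individual $i$.
- $\text{Education}_i$ is the education level.
- $\text{Experience}_i$ is work experience.
- $\text{Gender}_i$ is gender, typically coded as 1 for female and 0 for male.
- $\varepsilon_i$ is the error term.

With this coding, $\beta_3$ measures the average wage difference between women and men who have the same education and experience.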

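By estimating the parameters, Goldin quantifies the effects of education, experience, and gender on wages. She also investigates how these effects interact and change over time. That requires extending the basic model, for example with interaction terms or with separate estimates for different periods.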
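The basic model has no interaction terms. One common extension adds the product Education × Gender, which lets the return to education differ between men and women. The NumPy sketch below assumes a hypothetical data-generating process in which every coefficient is made up for illustration. It fits the extended model by least squares with `np.linalg.lstsq` and compares the estimates with the values used to simulate the data.

```python
import numpy as np

# Hypothetical data-generating process; every coefficient here is made up
rng = np.random.default_rng(0)
n = 5_000
education = rng.integers(12, 21, size=n)   # 12..20 years
experience = rng.integers(1, 41, size=n)   # 1..40 years
gender = rng.integers(0, 2, size=n)        # 1 = female, 0 = male

# Design matrix: intercept, Education, Experience, Gender, Education x Gender
X = np.column_stack([np.ones(n), education, experience, gender, education * gender])
true_beta = np.array([5.0, 2.0, 0.5, -3.0, -0.2])
wage = X @ true_beta + rng.normal(0.0, 2.0, size=n)

# OLS: choose beta_hat to minimise the sum of squared residuals ||wage - X @ beta_hat||^2
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

names = ["const", "Education", "Experience", "Gender", "Education x Gender"]
for name, true_b, est_b in zip(names, true_beta, beta_hat):
    print(f"{name:>18}: true {true_b:6.2f}, estimated {est_b:6.3f}")
```

The interaction coefficient estimates how much less (or more) a year of education pays women than men. Once the interaction is included, the Gender coefficient on its own is the gap at zero years of education. That is an extrapolation far outside the 12 to 20 range, so it is best read together with the interaction term.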

3. Evolution of Gender Differences: Mean Difference Model

The mean difference model examines the difference in a variable (e.g., wage) between two groups while holding other conditions constant. It provides an intuitive way to analyze differences across gender, age groups, education levels, etc.

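The mathematical expression is:

$$\Delta = \bar{Y}_1 - \bar{Y}_2$$

Here $\Delta$ is the difference in the mean outcome $Y$ (e.g., wage) under controlled conditions. $\bar{Y}_1$ and $\bar{Y}_2$ are the group means computed among individuals with comparable characteristics, and the subscripts denote the groups (e.g., men and women).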

This model allows Goldin to analyze the average gender differences in wages after controlling for other factors such as education and experience.
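A small, entirely made-up example shows why controlling for other factors matters. In the hypothetical table below, women are concentrated at the higher education level. As a result, the raw mean difference and the mean difference within education levels tell different stories. The sketch uses pandas `groupby`. Weighting each education cell by its number of observations is one reasonable choice among several.

```python
import pandas as pd

# Tiny hypothetical dataset (Gender: 0 = male, 1 = female)
df = pd.DataFrame({
    "Education": [12, 12, 16, 12, 16, 16],
    "Gender":    [0,  0,  0,  1,  1,  1],
    "Wage":      [30, 32, 50, 26, 44, 46],
})

# Raw mean difference (men minus women), ignoring education
means = df.groupby("Gender")["Wage"].mean()
raw_gap = means.loc[0] - means.loc[1]

# Controlled mean difference: compare men and women within each education level
cell_means = df.groupby(["Education", "Gender"])["Wage"].mean().unstack("Gender")
cell_gaps = cell_means[0] - cell_means[1]

# Average the within-cell gaps, weighting each cell by its number of observations
weights = df.groupby("Education").size()
controlled_gap = (cell_gaps * weights).sum() / weights.sum()

print(f"raw gap: {raw_gap:.2f}")
print(cell_gaps)
print(f"controlled gap: {controlled_gap:.2f}")
```

Here the raw difference is about -1.33, so women earn slightly more on average. Within each education level, however, men earn 5 more, so the controlled difference is 5.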

4. Synthetic Case and Model Application

A synthetic dataset (not real, for demonstration only) is created with the following variables:

Education: years of completed education (integer between 12 and 20).

Experience: years of work experience (integer between 1 and 40).

Gender: 0 for male, 1 for female.

Year: calendar year (between 2000 and 2021).

Wage: generated as a linear function of Education, Experience, and Gender with random noise.
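The article does not show how its synthetic dataset was generated. The sketch below is one way to build a DataFrame with the listed variables using NumPy and pandas. The function name `make_synthetic_wages`, the seed, the sample size, and the wage coefficients are all hypothetical, so its output will not reproduce the results reported below.

```python
import numpy as np
import pandas as pd


def make_synthetic_wages(n=1_000, seed=42):
    """Simulate a wage dataset with the variables listed above.

    The coefficients are hypothetical; the article does not report
    the values used to generate its data.
    """
    rng = np.random.default_rng(seed)
    education = rng.integers(12, 21, size=n)    # 12..20 years
    experience = rng.integers(1, 41, size=n)    # 1..40 years
    gender = rng.integers(0, 2, size=n)         # 0 = male, 1 = female
    year = rng.integers(2000, 2022, size=n)     # 2000..2021
    noise = rng.normal(0.0, 5.0, size=n)
    # Linear in Education, Experience and Gender plus noise (illustrative coefficients)
    wage = 10.0 + 2.5 * education + 0.8 * experience - 5.0 * gender + noise
    return pd.DataFrame({
        "Education": education,
        "Experience": experience,
        "Gender": gender,
        "Year": year,
        "Wage": wage,
    })


data = make_synthetic_wages()
print(data.head())
```

The resulting `data` uses the column names expected by the code in 4.1 and 4.2.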

The models are applied to this data: Oaxaca‑Blinder decomposition to explore the sources of the wage gap, multiple linear regression to study the relationship between education and wages, and the mean difference model to observe how the gender wage gap evolves over time.

4.1 Oaxaca‑Blinder Decomposition

<code>import numpy as np
import statsmodels.api as sm

# `data` is the synthetic DataFrame described above, with columns
# Education, Experience, Gender (0 = male, 1 = female) and Wage
covariates = ['Education', 'Experience']

# Separate the data into male and female subsets
male_data = data[data['Gender'] == 0]
female_data = data[data['Gender'] == 1]

# Compute the average wage for male and female
average_wage_male = male_data['Wage'].mean()
average_wage_female = female_data['Wage'].mean()

# Estimate separate wage equations for male and female by OLS (Ordinary Least Squares)
X_male = sm.add_constant(male_data[covariates])  # adds a 'const' column for the intercept
X_female = sm.add_constant(female_data[covariates])

model_male = sm.OLS(male_data['Wage'], X_male).fit()
model_female = sm.OLS(female_data['Wage'], X_female).fit()

# Mean characteristics, including the constant (its mean is 1)
mean_male = X_male.mean()
mean_female = X_female.mean()

# Oaxaca-Blinder decomposition, using the male coefficients as the reference
# 1. Explained part: differences in endowments, valued at male returns
explained_part = np.sum((mean_male - mean_female) * model_male.params)

# 2. Unexplained part: differences in returns (incl. the intercept), evaluated at female endowments
unexplained_part = np.sum(mean_female * (model_male.params - model_female.params))

# Results: the two parts add up to the raw gap (up to floating-point error)
print(average_wage_male, average_wage_female, average_wage_male - average_wage_female)
print(explained_part, unexplained_part, explained_part + unexplained_part)
</code>

Results are as follows:

Average wages – male: ..., female: ..., difference: ...

Oaxaca‑Blinder decomposition – explained part (due to average characteristic differences): ...; unexplained part (due to wage structure differences): ... The explained part is negative. Women's average education and experience are, if anything, slightly more favorable than men's, so differences in characteristics alone would put women's wages slightly above men's. The unexplained part is positive: at women's average education and experience, paying women according to the men's wage equation would raise their average wage. Together, the two parts account for the raw wage gap.

4.2 Multiple Linear Regression Analysis

<code># Multiple linear regression on the pooled sample (Gender: 1 = female, 0 = male)
X = sm.add_constant(data[['Education', 'Experience', 'Gender']])  # adds a 'const' column for the intercept
model_all = sm.OLS(data['Wage'], X).fit()

# Results: coefficients, standard errors, p-values, R-squared
print(model_all.summary())
</code>

The estimated coefficients on Education, Experience, and Gender are ... and -5.26, respectively, and all are statistically significant. The R-squared of 0.903 means the model explains about 90.3% of the variance in wages. Holding the other variables constant, each additional year of education and each additional year of experience is associated with a higher wage. Women earn about 5.26 less than men on average, in the dataset's wage units. Because the data are synthetic, these estimates describe the simulated data-generating process rather than real labor markets.
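Section 4 also proposes using the mean difference model to follow the gender wage gap over time, but no code for that step appears above. The sketch below assumes pandas and a hypothetical dataset in which the gap narrows by 0.2 per year. It computes the men-minus-women mean wage for each `Year` and fits a linear trend with `np.polyfit`. The helper name `gap_by_year` is hypothetical. The article's `Wage` is generated without a `Year` term, so running `gap_by_year` on that dataset should show a roughly flat gap apart from noise.

```python
import numpy as np
import pandas as pd


def gap_by_year(df: pd.DataFrame) -> pd.Series:
    """Mean wage of men minus mean wage of women within each Year."""
    means = df.groupby(["Year", "Gender"])["Wage"].mean().unstack("Gender")
    return (means[0] - means[1]).rename("gap")


# Hypothetical data in which the gender gap shrinks by 0.2 per year
rng = np.random.default_rng(1)
n = 20_000
year = rng.integers(2000, 2022, size=n)   # 2000..2021
gender = rng.integers(0, 2, size=n)       # 0 = male, 1 = female
true_gap = 8.0 - 0.2 * (year - 2000)
wage = 40.0 - true_gap * gender + rng.normal(0.0, 5.0, size=n)
df = pd.DataFrame({"Year": year, "Gender": gender, "Wage": wage})

gaps = gap_by_year(df)
print(gaps.round(2))

# Linear trend in the yearly gap; polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(gaps.index.to_numpy(), gaps.to_numpy(), deg=1)
print(f"estimated change in the gap per year: {slope:.3f}")
```

This is a raw mean difference for each year. To hold education and experience constant, as the mean difference model intends, one option is to fit the regression from 4.2 separately for each year and track the Gender coefficient.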

Claudia Goldin’s research, through rigorous empirical analysis and sophisticated mathematical modeling, reveals the complex factors behind gender differences in the labor market and provides a framework for investigating these issues.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
