
Propensity Score Matching: Principles, Implementation, and Evaluation

This article explains Propensity Score Matching (PSM) as a causal inference method. It covers treatment effect concepts, the required assumptions, propensity score estimation, matching algorithms, a SQL implementation, matching-quality metrics, and estimation of the ATT with Difference‑in‑Differences. It also outlines the workflow steps, trade-offs, and alternatives.

DaTaobao Tech

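This article introduces Propensity Score Matching (PSM), a causal inference technique that estimates treatment effects from observational data. It first explains the concepts of Treatment Effect and Average Treatment Effect on the Treated (ATT), then presents the two key assumptions required for valid PSM: the Conditional Independence Assumption (CIA), under which treatment assignment is independent of the potential outcomes once the observed covariates are taken into account, and Common Support, under which treated and control units overlap in their propensity scores.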
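In potential-outcome notation, the quantities above can be sketched as follows. The symbols are assumptions introduced here for illustration and do not come from the original article: D is the treatment indicator, Y(1) and Y(0) are the potential outcomes with and without treatment, and X is the vector of observed covariates.

```latex
\begin{align*}
\text{ATT} &= \mathbb{E}\left[\, Y(1) - Y(0) \mid D = 1 \,\right] \\
\text{CIA:} &\quad \bigl(Y(0),\, Y(1)\bigr) \perp D \mid X \\
\text{Common support:} &\quad 0 < P(D = 1 \mid X) < 1
\end{align*}
```

Because Y(0) is never observed for treated units, matching uses comparable control units as stand-ins for it.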

The propensity score is defined as the probability of receiving treatment given the observed covariates. Estimating it is a binary classification problem, so any standard classifier that outputs probabilities, such as logistic regression, can be used. Feature selection should include variables that affect both treatment assignment and outcomes, and exclude variables that are themselves affected by the treatment.
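As a minimal sketch of the estimation step, the snippet below fits a logistic regression by plain gradient descent using only the Python standard library. The data is synthetic, and the covariate names (activity, gender), the function names, and the hyperparameters are all assumptions made for illustration; in practice a library classifier would normally be used instead.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, d, lr=0.5, epochs=500):
    """Fit P(D=1 | X) by batch gradient descent on the log-loss.

    Returns [bias, w1, ..., wk]. Hypothetical helper, not a library API.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            pred = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = pred - di
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    # Estimated probability of treatment for one covariate vector.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic observational data: treatment is more likely for active users.
rng = random.Random(0)
X, d = [], []
for _ in range(300):
    activity = rng.gauss(0, 1)            # assumed pre-treatment covariate
    gender = float(rng.random() < 0.5)    # assumed binary covariate
    p_true = sigmoid(-0.5 + 1.2 * activity + 0.4 * gender)
    X.append([activity, gender])
    d.append(1 if rng.random() < p_true else 0)

weights = fit_logistic(X, d)
scores = [propensity(weights, x) for x in X]
```

Each entry of scores is the estimated propensity score for the corresponding unit and is what the matching step consumes.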

Several matching algorithms are described:

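Nearest‑Neighbour Matching: pair each treated unit with the control unit(s) closest in score, with or without replacement, using one or several neighbours.

Caliper and Radius Matching: accept a match only within a maximum score distance (the caliper); radius matching uses every control unit inside that distance.

Stratification/Interval Matching: divide the score range into bins and compare treated and control units within the same bin.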
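The first two algorithms can be sketched together in a few lines. This is an illustrative greedy one-to-one matcher, not the article's implementation; the function name, the unit ids, the scores, and the choice to process treated units in descending score order are all assumptions.

```python
def nearest_neighbour_match(treated, control, caliper=0.05, replacement=True):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated / control: dict mapping unit id -> propensity score.
    Returns {treated_id: control_id}; treated units with no control
    inside the caliper are left unmatched.
    """
    available = dict(control)
    pairs = {}
    # Process treated units from highest to lowest score (one common heuristic).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by id for determinism.
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            if not replacement:
                del available[c_id]  # each control is used at most once
    return pairs

# Toy scores, invented for illustration.
treated = {"t1": 0.62, "t2": 0.60, "t3": 0.95}
control = {"c1": 0.61, "c2": 0.57, "c3": 0.30}

with_repl = nearest_neighbour_match(treated, control)
without_repl = nearest_neighbour_match(treated, control, replacement=False)
```

With replacement, t1 and t2 both take c1, which lowers bias but reuses one control. Without replacement, t2 falls back to c2. In both cases t3 stays unmatched because no control lies within the caliper, which is the common-support problem in miniature.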

The following SQL combines stratified, caliper, and nearest‑neighbour matching in a single query:

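with matching_detail as (
    select t1.user_id as treatment_userid,
           t1.score   as treatment_pscore,
           t2.user_id as control_userid,
           t2.score   as control_pscore,
           -- rank each treated user's candidate controls by score distance (ties broken by user_id)
           row_number() over (
               partition by t1.user_id
               order by abs(t1.score - t2.score) asc, t2.user_id asc
           ) as rn
    from propensity_score_treatment t1
    -- inner join: treated users without any candidate control are dropped
    inner join propensity_score_control t2
        -- stratified matching: same gender and same score bin (score rounded to one decimal)
        on t1.gender = t2.gender
       and round(t1.score, 1) = round(t2.score, 1)
    -- caliper matching: keep only pairs whose scores differ by at most 0.05
    where abs(t1.score - t2.score) <= 0.05
)
-- rn = 1 keeps the single nearest neighbour; a control can serve several treated users (matching with replacement).
-- Use rn <= k for k nearest neighbours, or drop the filter to keep every control within the caliper (radius matching).
select * from matching_detail where rn = 1;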

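After matching, covariate balance is assessed with the standardized bias (SB) and two‑sample t‑tests on each covariate. An absolute SB below 5% is generally considered acceptable, and the t‑tests should show no significant difference in covariate means between the matched groups. Additional checks include QQ‑plots, variance ratios, and joint F‑tests.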
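The standardized bias check can be sketched as below. The formula used here, the difference in means divided by the square root of the average of the two group variances and expressed in percent, is one common definition and is an assumption about what the article intends; some references fix the variances at their pre-matching values. The covariate values are invented.

```python
import math
import statistics

def standardized_bias(treated_values, control_values):
    """Standardized bias of one covariate, in percent.

    SB = 100 * (mean_t - mean_c) / sqrt((var_t + var_c) / 2)
    Each group needs at least two values for the sample variance.
    """
    mean_t = statistics.fmean(treated_values)
    mean_c = statistics.fmean(control_values)
    var_t = statistics.variance(treated_values)
    var_c = statistics.variance(control_values)
    return 100.0 * (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

# Hypothetical covariate (age) for the treated group,
# the full control pool, and the matched controls.
age_treated = [30, 35, 40, 45]
age_control_all = [20, 25, 30, 35]
age_control_matched = [30, 34, 40, 45]

sb_before = standardized_bias(age_treated, age_control_all)
sb_after = standardized_bias(age_treated, age_control_matched)
```

Comparing sb_before with sb_after for every covariate shows whether matching actually reduced the imbalance; here the matched sample would pass a 5% threshold while the raw sample would not.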

When the assumptions hold, the ATT can be estimated on the matched sample with a Difference‑in‑Differences (DID) approach: the change in the treated group's outcome minus the change in the matched control group's outcome gives the incremental impact of the intervention. The article also discusses the distinction between ATT and ATE, the bias‑variance trade‑offs of different matching choices, and the importance of sensitivity analysis to gauge robustness.
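A minimal sketch of the DID calculation on matched pairs follows. The tuple layout, the function name, and the numbers are assumptions made for illustration; the estimate is only meaningful if the two groups would have followed parallel trends without the intervention.

```python
from statistics import fmean

def did_att(pairs):
    """Difference-in-Differences estimate of the ATT on matched pairs.

    pairs: list of (treated_pre, treated_post, control_pre, control_post).
    """
    treated_change = fmean([t_post - t_pre for t_pre, t_post, _, _ in pairs])
    control_change = fmean([c_post - c_pre for _, _, c_pre, c_post in pairs])
    return treated_change - control_change

# Invented outcome values (for example, orders per user) before and after the intervention.
matched_pairs = [
    (10, 15, 9, 11),
    (20, 26, 21, 23),
    (5, 9, 6, 8),
]

att = did_att(matched_pairs)
```

In this toy data the treated users improve by 5 on average and their matched controls by 2, so the estimated incremental effect is 3.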

Finally, the article summarizes the end‑to‑end workflow: (1) select covariates, (2) estimate propensity scores, (3) perform matching, (4) validate matching quality, (5) verify parallel trends, and (6) compute incremental effects with DID. Advantages, limitations, and alternative methods such as inverse‑probability weighting are also mentioned.
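For comparison, the inverse‑probability weighting alternative can be sketched as follows. Instead of pairing units, it reweights each control unit by e / (1 - e), where e is its propensity score, so that the weighted control group resembles the treated group. The function name and the data are assumptions, and this is the ATT form of the weights rather than anything taken from the article.

```python
def ipw_att(outcomes, treated_flags, scores):
    """ATT via inverse-probability weighting.

    Treated units get weight 1; control units get weight e / (1 - e).
    Scores must be strictly below 1 for control units.
    """
    treated_y = [y for y, d in zip(outcomes, treated_flags) if d == 1]
    control = [(y, e / (1.0 - e))
               for y, d, e in zip(outcomes, treated_flags, scores) if d == 0]
    treated_mean = sum(treated_y) / len(treated_y)
    weighted_control_mean = (sum(y * w for y, w in control)
                             / sum(w for _, w in control))
    return treated_mean - weighted_control_mean

# Invented example: two treated units followed by three control units.
outcomes = [10, 12, 7, 8, 6]
treated_flags = [1, 1, 0, 0, 0]
scores = [0.8, 0.6, 0.5, 0.5, 0.2]

att_ipw = ipw_att(outcomes, treated_flags, scores)
```

Controls with scores close to 1 receive very large weights, which is one reason trimming or a common-support check is usually applied before weighting.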

Tags: SQL, causal inference, propensity score matching, matching algorithms, treatment effect