Uncovering Road Freight Accident Causes with DoWhy & EconML: A Causal Inference Walkthrough
This article explains why causal inference is essential for decision‑making, contrasts it with pure prediction, outlines the four DoWhy steps (modeling, identification, estimation, refutation), and demonstrates a case study on road freight accidents using DoWhy and EconML with code examples and results.
1. Why Causal Inference?
Decision‑making often requires answering "what‑if" questions, which demand understanding the cause of events and how actions can change future outcomes.
1.1 Causal Effect Definition
To measure the effect of action A on outcome Y, compare two worlds: the real world where A is taken and the counterfactual world where it is not, keeping everything else unchanged. The difference in Y between these worlds is the causal effect.
Formally, an intervention is written with Pearl's do-operator, do(A); the causal effect is the contrast E[Y | do(A = 1)] - E[Y | do(A = 0)].
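Because both worlds are never observed for the same unit in practice, a small simulation (with a hypothetical data-generating process and an assumed effect of +0.5) makes the definition concrete: generate both potential outcomes, then take the contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: both potential outcomes per unit.
y0 = rng.normal(0.0, 1.0, n)  # outcome if action A is NOT taken
y1 = y0 + 0.5                 # outcome if action A IS taken (effect +0.5)

# The causal effect is the contrast between the two worlds:
# ATE = E[Y | do(A=1)] - E[Y | do(A=0)]
ate = (y1 - y0).mean()
print(round(ate, 2))  # 0.5 by construction
```

In real data only one of `y0`, `y1` is observed per unit, which is exactly why the estimation machinery below is needed.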
1.2 Prediction vs. Causal Inference
Prediction assumes training and test data come from the same distribution and optimizes a loss function under that assumption; causal inference instead seeks the underlying data-generating model, because an intervention changes the distribution and breaks that assumption.
1.3 Two Core Challenges
We cannot observe the counterfactual world directly, so we must estimate it.
Multiple causal mechanisms can be entangled in a single data distribution, requiring domain knowledge and assumptions.
2. The Four Steps of Causal Inference
2.1 Modeling
Encode domain knowledge into a causal graph, identifying confounders and instrumental variables.
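In DoWhy, this domain knowledge is typically written down as a graph and passed to `CausalModel`. A minimal sketch follows; the variable names `W`, `Z`, `A`, `Y` are illustrative placeholders, and the DoWhy call is shown in comments so the snippet runs standalone:

```python
# Encode domain knowledge as a causal graph (DOT syntax):
# W is a common cause (confounder) of treatment A and outcome Y;
# Z is an instrumental variable that affects Y only through A.
graph = """
digraph {
    W -> A; W -> Y;
    Z -> A;
    A -> Y;
}
"""

# With DoWhy (assuming a DataFrame df with columns W, Z, A, Y):
# from dowhy import CausalModel
# model = CausalModel(data=df, treatment="A", outcome="Y", graph=graph)

print("W -> Y" in graph)  # True: the confounding edge is part of the model
```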
2.2 Identification
Check whether the causal effect can be identified from observed variables using the back‑door criterion or instrumental variables.
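The back-door criterion says: if a set W blocks every back-door path from A to Y, the interventional quantity is identified as E[Y | do(A=a)] = Σ_w E[Y | A=a, W=w] P(W=w). A numeric sketch with hypothetical coefficients shows why this matters, contrasting the naive comparison with the back-door adjusted one:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

w = rng.binomial(1, 0.5, n)                     # confounder (back-door path A <- W -> Y)
a = rng.binomial(1, 0.2 + 0.6 * w)              # treatment, influenced by w
y = 0.3 * a + 1.0 * w + rng.normal(0, 0.1, n)   # true causal effect of a on y is 0.3

# Naive contrast mixes the effect of a with the effect of w.
naive = y[a == 1].mean() - y[a == 0].mean()

# Back-door adjustment: E[Y|do(a)] = sum_w E[Y|a,w] P(w)
adjusted = sum(
    (y[(a == 1) & (w == v)].mean() - y[(a == 0) & (w == v)].mean()) * (w == v).mean()
    for v in (0, 1)
)
print(round(naive, 2), round(adjusted, 2))  # naive is badly biased; adjusted is near 0.3
```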
2.3 Estimation
Apply statistical estimators. DoWhy provides standard algorithms; EconML adds machine‑learning‑based estimators.
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_stratification",
    target_units="ate")
print(estimate)
"""
*** Causal Estimate ***
Estimand type: nonparametric-ate
Mean value: 0.005777346467839227
"""
2.4 Refutation
Validate the estimate with robustness tests such as random common cause, placebo treatment, and data‑subset refuters.
# Random common cause
refute1_results = model.refute_estimate(
    identified_estimand, estimate, method_name="random_common_cause")
print(refute1_results)
"""
Estimated effect:0.005777346467839227
New effect:0.002310689963420354
p value:0.13
"""
# Placebo treatment
refute2_results = model.refute_estimate(
    identified_estimand, estimate, method_name="placebo_treatment_refuter")
print(refute2_results)
"""
Estimated effect:0.005777346467839227
New effect:6.337625040244461e-05
p value:0.5
"""
3. Case Study: Causal Story Behind Road Freight Accidents
Data: 10,660 vehicles from a province in 2021, with features such as highway mileage ratio, night‑time driving ratio, curvature, industry tag, etc. Target y indicates whether an accident occurred.
Key hypotheses:
Reducing night‑time driving lowers accident risk.
Moderate highway driving reduces risk, while always high or always low speed increases risk.
3.1 Feature Engineering
# Binary treatment flags derived from the two hypotheses above
df['less_night_driving'] = df['night_time_ratio'] <= 0.1
df['middle_highway_driving'] = (df['highway_mileage_ratio'] >= 0.2) & (df['highway_mileage_ratio'] <= 0.7)
3.2 Causal Modeling
Assumptions encoded in the graph include:
Provincial road mileage influences accidents and highway mileage.
High curvature trips affect both highway mileage and accidents.
Industry tag affects highway usage and accident likelihood.
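These three assumptions translate directly into edges of a causal graph for DoWhy. The node names below are illustrative guesses at the dataset's column names, and the `CausalModel` call is commented so the sketch runs standalone:

```python
# Case-study assumptions as edges (treatment: middle_highway_driving):
# provincial road mileage, high curvature, and industry tag each
# influence both highway usage and accident occurrence.
graph = """
digraph {
    provincial_road_mileage -> middle_highway_driving;
    provincial_road_mileage -> accident;
    high_curvature_ratio -> middle_highway_driving;
    high_curvature_ratio -> accident;
    industry_tag -> middle_highway_driving;
    industry_tag -> accident;
    middle_highway_driving -> accident;
}
"""

# from dowhy import CausalModel
# model = CausalModel(data=df, treatment="middle_highway_driving",
#                     outcome="accident", graph=graph)
# identified_estimand = model.identify_effect()

print(graph.count("-> accident"))  # 4: three confounders plus the treatment
```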
3.3 Estimation Results
The estimated average treatment effect of moderate highway driving is about 0.006; the effect is small in magnitude, but it supports the hypothesis that moderate highway driving helps reduce accident risk.
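The machine-learning estimators that EconML contributes to this workflow are mostly built on a "partialling-out" (double machine learning) recipe: predict treatment and outcome from the confounders, then regress outcome residuals on treatment residuals. A numpy-only sketch with linear nuisance models and hypothetical numbers, not the article's data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

w = rng.normal(size=n)                      # confounder
t = 0.8 * w + rng.normal(size=n)            # treatment, influenced by w
y = 0.3 * t + 1.5 * w + rng.normal(size=n)  # true effect of t on y is 0.3

def residualize(x, target):
    """Residualize target on x with OLS (the 'nuisance model' step)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# Stage 1: partial the confounder out of treatment and outcome.
t_res = residualize(w, t)
y_res = residualize(w, y)

# Stage 2: effect = slope of y residuals on t residuals.
theta = (t_res @ y_res) / (t_res @ t_res)
print(round(theta, 2))  # close to the true 0.3
```

EconML replaces the linear nuisance models here with flexible ML models and adds cross-fitting; through DoWhy the same idea is reached via `estimate_effect` with an EconML method name.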
4. Summary
The article introduced causal inference concepts, the four‑step DoWhy workflow, and applied them to a real‑world road freight accident dataset. It emphasized that the first step—encoding domain knowledge into a causal graph—is the most critical and challenging, as data alone cannot provide causality.
5. References
DoWhy: https://github.com/microsoft/dowhy
EconML: https://github.com/microsoft/EconML
Books on causal inference and machine learning: http://causalinference.gitlab.io/
Judea Pearl’s causal analysis blog: http://causality.cs.ucla.edu/blog/
G7 EasyFlow Tech Circle