Uncovering Road Freight Accident Causes with DoWhy & EconML: A Causal Inference Walkthrough

This article explains why causal inference is essential for decision‑making, contrasts it with pure prediction, outlines the four DoWhy steps (modeling, identification, estimation, refutation), and demonstrates a case study on road freight accidents using DoWhy and EconML with code examples and results.

G7 EasyFlow Tech Circle

1. Why Causal Inference?

Decision‑making often requires answering "what‑if" questions, which demand understanding the cause of events and how actions can change future outcomes.

1.1 Causal Effect Definition

To measure the effect of action A on outcome Y, compare two worlds: the real world where A is taken and the counterfactual world where it is not, keeping everything else unchanged. The difference in Y between these worlds is the causal effect.

The formal expression uses Pearl's do‑operator: do(A) denotes an intervention that sets A regardless of its natural causes, and the causal effect is the contrast E[Y | do(A=1)] − E[Y | do(A=0)].
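The counterfactual definition can be made concrete with a toy simulation in which, unlike in real data, both potential outcomes are visible at once (all numbers and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: y0 = outcome if action A is NOT taken, y1 = if it is.
# In real data only one of the two is ever observed per unit.
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + 0.5  # taking action A shifts the outcome by +0.5 for everyone

# The causal effect of A is the difference in Y between the two "worlds",
# with everything else held fixed.
ate = (y1 - y0).mean()
print(f"true ATE = {ate:.3f}")  # 0.5 by construction
```

In practice y0 and y1 are never observed together for the same unit, which is exactly why the estimation machinery of the next sections is needed.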

1.2 Prediction vs. Causal Inference

Prediction assumes training and test data share the same distribution and optimizes a loss function under that assumption. Causal inference instead seeks the underlying data‑generating model, because an intervention changes the distribution: the post‑intervention world is, by design, not the one the training data came from.

1.3 Two Core Challenges

We cannot observe the counterfactual world directly, so we must estimate it.

Multiple causal mechanisms can be entangled in a single data distribution, requiring domain knowledge and assumptions.

2. The Four Steps of Causal Inference

2.1 Modeling

Encode domain knowledge into a causal graph, identifying confounders and instrumental variables.
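Once the graph is written down, confounders and instrument candidates can be read off its structure. A minimal sketch, using a hand-encoded parent list with hypothetical variable names (a real graph would have more nodes and need full d-separation checks):

```python
# A causal graph encoded as parent lists (hypothetical variable names).
graph = {
    "treatment": ["confounder", "instrument"],
    "outcome": ["treatment", "confounder"],
    "confounder": [],
    "instrument": [],
}

def parents(node):
    return set(graph.get(node, []))

# Confounders: common causes of both treatment and outcome.
confounders = parents("treatment") & parents("outcome")

# Instrument candidates: causes of the treatment that are not
# direct causes of the outcome (full IV validity needs more checks).
instruments = parents("treatment") - parents("outcome") - {"treatment"}

print(confounders)   # {'confounder'}
print(instruments)   # {'instrument'}
```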

2.2 Identification

Check whether the causal effect can be identified from observed variables using the back‑door criterion or instrumental variables.
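A minimal illustration of what the back‑door criterion buys us: with a valid adjustment set Z, the interventional mean is E[Y | do(T=t)] = Σ_z E[Y | T=t, Z=z] P(Z=z). The sketch below applies this formula to synthetic data with a single binary confounder (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

z = rng.binomial(1, 0.5, size=n)             # confounder
t = rng.binomial(1, 0.2 + 0.6 * z)           # Z raises P(T=1)
y = 2.0 * t + 3.0 * z + rng.normal(size=n)   # true effect of T is 2.0

# Naive comparison is biased because Z drives both T and Y.
naive = y[t == 1].mean() - y[t == 0].mean()

# Back-door adjustment: within-stratum contrasts, weighted by P(Z=z).
adjusted = 0.0
for zv in (0, 1):
    m = z == zv
    adjusted += (y[m & (t == 1)].mean() - y[m & (t == 0)].mean()) * m.mean()

print(f"naive:    {naive:.2f}")    # inflated well above 2.0
print(f"adjusted: {adjusted:.2f}") # close to the true effect 2.0
```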

2.3 Estimation

Apply statistical estimators. DoWhy provides standard algorithms; EconML adds machine‑learning‑based estimators.

# Estimate the average treatment effect (ATE) using
# propensity-score stratification on the identified back-door set
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_stratification",
    target_units="ate")
print(estimate)
"""
*** Causal Estimate ***
Estimand type: nonparametric-ate
Mean value: 0.005777346467839227
"""
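The idea behind propensity‑score stratification can be sketched by hand: estimate P(T=1 | X), bin units by that propensity, and average the within‑bin outcome contrasts. The sketch below uses synthetic data (not the article's dataset) and, to stay short, the true propensity rather than a fitted one:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 100_000

x = rng.normal(size=n)                        # confounder
p = 1 / (1 + np.exp(-x))                      # true propensity P(T=1|x)
t = rng.binomial(1, p)
y = 1.5 * t + 2.0 * x + rng.normal(size=n)    # true ATE is 1.5

df = pd.DataFrame({"p": p, "t": t, "y": y})
# In practice the propensity is itself estimated (e.g. by logistic
# regression); here we reuse the true one to keep the sketch short.
df["stratum"] = pd.qcut(df["p"], q=20, labels=False)

# Weighted average of treated-vs-control contrasts within each stratum.
ate = 0.0
for _, g in df.groupby("stratum"):
    contrast = g.loc[g.t == 1, "y"].mean() - g.loc[g.t == 0, "y"].mean()
    ate += contrast * len(g) / n

print(f"stratified ATE = {ate:.2f}")  # close to 1.5
```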

2.4 Refutation

Validate the estimate with robustness tests such as random common cause, placebo treatment, and data‑subset refuters.

# Random common cause
refute1_results = model.refute_estimate(
    identified_estimand, estimate, method_name="random_common_cause")
print(refute1_results)
"""
Estimated effect:0.005777346467839227
New effect:0.002310689963420354
p value:0.13
"""

# Placebo treatment
refute2_results = model.refute_estimate(
    identified_estimand, estimate, method_name="placebo_treatment_refuter")
print(refute2_results)
"""
Estimated effect:0.005777346467839227
New effect:6.337625040244461e-05
p value:0.5
"""
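The logic behind the placebo refuter can be sketched in a few lines: replace the real treatment with a randomly shuffled one and re‑run the estimator; if the pipeline is sound, the "effect" of the placebo should collapse toward zero, just as in the p‑value output above. The data below are synthetic and deliberately unconfounded, so a naive contrast stands in for the estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

t = rng.binomial(1, 0.5, size=n)
y = 0.8 * t + rng.normal(size=n)   # true effect 0.8, no confounding

def naive_effect(treat, outcome):
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()

real = naive_effect(t, y)
placebo = naive_effect(rng.permutation(t), y)  # shuffled "treatment"

print(f"real estimate:    {real:.3f}")    # near 0.8
print(f"placebo estimate: {placebo:.3f}") # near 0.0
```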

3. Case Study: Causal Story Behind Road Freight Accidents

Data: 10,660 vehicles from a province in 2021, with features such as highway mileage ratio, night‑time driving ratio, curvature, industry tag, etc. Target y indicates whether an accident occurred.

Key hypotheses:

Reducing night‑time driving lowers accident risk.

Moderate highway driving reduces risk, while always high or always low speed increases risk.

3.1 Feature Engineering

# Binary treatment flags derived from the two hypotheses above
df['less_night_driving'] = df['night_time_ratio'] <= 0.1
df['middle_highway_driving'] = (df['highway_mileage_ratio'] >= 0.2) & (df['highway_mileage_ratio'] <= 0.7)

3.2 Causal Modeling

Assumptions encoded in the graph include:

Provincial road mileage influences accidents and highway mileage.

High curvature trips affect both highway mileage and accidents.

Industry tag affects highway usage and accident likelihood.
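The assumptions above can be written down as the DOT graph string that DoWhy's CausalModel accepts via its graph argument. The node names below are illustrative renderings of the features described in the text, not the dataset's actual column names:

```python
# Edges encode the three stated assumptions, plus the
# treatment -> outcome edge whose strength we want to estimate.
edges = [
    ("provincial_road_mileage", "accident"),
    ("provincial_road_mileage", "middle_highway_driving"),
    ("high_curvature_ratio", "middle_highway_driving"),
    ("high_curvature_ratio", "accident"),
    ("industry_tag", "middle_highway_driving"),
    ("industry_tag", "accident"),
    ("middle_highway_driving", "accident"),
]

graph = "digraph {" + " ".join(f"{a} -> {b};" for a, b in edges) + "}"
print(graph)
# This string would then be passed as the graph= argument when
# constructing dowhy.CausalModel with the treatment and outcome columns.
```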

3.3 Estimation Results

The estimated average treatment effect of moderate highway driving is about 0.006, a small but positive effect, consistent with the hypothesis that keeping highway driving in a moderate range helps reduce accident risk.

4. Summary

The article introduced causal inference concepts, the four‑step DoWhy workflow, and applied them to a real‑world road freight accident dataset. It emphasized that the first step—encoding domain knowledge into a causal graph—is the most critical and challenging, as data alone cannot provide causality.

5. References

DoWhy: https://github.com/microsoft/dowhy

EconML: https://github.com/microsoft/EconML

Books on causal inference and machine learning: http://causalinference.gitlab.io/

Judea Pearl’s causal analysis blog: http://causality.cs.ucla.edu/blog/

Tags: machine learning, causal inference, DoWhy, EconML, road freight accidents
Written by G7 EasyFlow Tech Circle

Official G7 EasyFlow tech channel! All the hardcore tech, cutting‑edge innovations, and practical sharing you want are right here.