Which Python Causal Inference Library Wins? A Deep 5‑Minute Comparison
This in‑depth, five‑minute guide compares six popular Python causal inference libraries—Bnlearn, Pgmpy, CausalNex, DoWhy, PyAgrum, and CausalImpact—using the Census Income dataset to illustrate structure learning, parameter estimation, inference, and causal effect validation, and highlights each tool's strengths, limitations, and ideal use cases.
Bayesian Causal Model
In causal inference, variables are roughly divided into driver variables that directly affect the outcome and passenger variables that are associated with the outcome but do not cause it. Distinguishing these two types is crucial for any causal analysis, such as identifying the true drivers of equipment failures in predictive maintenance.
Compared to pure predictive models, causal inference answers "why" rather than "how much," enabling explanation of model behavior and effective interventions.
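The driver/passenger distinction can be made concrete with a small self-contained simulation (coefficients are made up for illustration): a driver variable z directly causes the outcome, while a passenger variable p merely shares that cause and is therefore correlated with the outcome without affecting it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)            # driver: directly causes the outcome
y = 2.0 * z + rng.normal(size=n)  # outcome
p = z + rng.normal(size=n)        # passenger: correlated with y only via z

# The passenger is strongly correlated with the outcome...
print(round(np.corrcoef(p, y)[0, 1], 2))
# ...but once the driver's contribution is removed, the association vanishes
print(round(np.corrcoef(p - z, y - 2.0 * z)[0, 1], 2))
```

A predictive model would happily use p as a feature, but intervening on p would not change y; only an intervention on the driver z would.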
Dataset and Experimental Design
All experiments use the same Census Income dataset, which contains 48,842 records and 14 mostly categorical features. The target question is whether having a postgraduate degree significantly increases the probability of earning more than $50K.
Data loading and preprocessing (removing continuous and sensitive features) are performed as follows:
# Install
pip install datazets
# Import libraries
import datazets as dz
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load and clean data
df = dz.get(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
print(df.head())

1. Bnlearn
Bnlearn provides a high‑level API that integrates the full causal analysis pipeline: structure learning, parameter learning, inference, and synthetic data generation. It supports discrete, continuous, and mixed data types.
Structure learning with Hill‑Climb Search and BIC scoring:
# Install
pip install bnlearn
# Load library
import bnlearn as bn
# Structure learning
model = bn.structure_learning.fit(df, methodtype='hillclimbsearch', scoretype='bic')
model = bn.independence_test(model, df, test="chi_square", alpha=0.05, prune=True)
model = bn.parameter_learning.fit(model, df)
# Plot
G = bn.plot(model, interactive=False)
bn.plot_graphviz(model).view(filename=r'c:/temp/bnlearn_plot')

The learned DAG for the Census Income data is shown below:
Inference example (probability of high income for doctorate holders):
query = bn.inference.fit(model, variables=['salary'], evidence={'education':'Doctorate'})
print(query)

Result: ≤50K = 29.1%, >50K = 70.9% – confirming that a doctorate substantially raises the probability of a high income.
2. Pgmpy
Pgmpy is a lower‑level toolbox that requires the user to manually handle data processing, model construction, parameter estimation, inference, and visualization.
# Install
pip install pgmpy
# Import functions
from pgmpy.estimators import HillClimbSearch, BicScore, BayesianEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
# Load data and drop columns
df = dz.import_example(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Structure learning
est = HillClimbSearch(df)
scoring = BicScore(df)
model = est.estimate(scoring_method=scoring)
print(model.edges())
# Parameter learning (wrap the learned DAG in a BayesianNetwork first)
model = BayesianNetwork(model.edges())
model.fit(df, estimator=BayesianEstimator, prior_type='BDeu', equivalent_sample_size=1000)
# Inference
infer = VariableElimination(model)
query = infer.query(variables=['salary'], evidence={'education':'Doctorate'})
print(query)

Results are consistent with Bnlearn but require more manual steps, making Pgmpy suitable for researchers who need full control.
3. CausalNex
CausalNex focuses on learning causal graphs from data and quantifying causal effects, but it only supports discrete distributions. Continuous variables must be discretized before modeling.
# Install
pip install causalnex
# Import libraries
from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork
import datazets as dz
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import networkx as nx
# Load and clean data
df = dz.get(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Encode categorical variables
le = LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])
# Structure learning with NOTEARS
sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)
# Plot
plt.figure(figsize=(15,10))
edge_width = [d['weight']*0.3 for (u,v,d) in sm.edges(data=True)]
nx.draw_networkx(sm, node_size=400, arrowsize=20, alpha=0.6, edge_color='b', width=edge_width)
plt.show()
# Fit Bayesian network
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
print(bn.cpds["education"])

The resulting graph (edges below the threshold are omitted) is shown below:
CausalNex offers good interpretability but requires careful preprocessing and is limited to Python 3.6–3.10.
4. DoWhy
DoWhy is a causal validation framework. Instead of learning a graph, it requires the user to explicitly define treatment, outcome, and common causes, then it tests the causal assumptions.
# Install
pip install dowhy
# Import libraries
from dowhy import CausalModel
import datazets as dz
from sklearn.preprocessing import LabelEncoder
import numpy as np
# Load data and drop columns
df = dz.get(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Binary treatment (Doctorate)
df['education'] = df['education'] == 'Doctorate'
# Encode all variables
le = LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])
# Define causal model
model = CausalModel(
    data=df,
    treatment='education',
    outcome='salary',
    common_causes=list(df.columns[~np.isin(df.columns, ['education', 'salary'])])
)
model.view_model()
# Identify effect
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
# Estimate effect (backdoor with propensity score stratification)
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_stratification")
print(estimate)
# Refute estimate
refute_results = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(refute_results)

The estimated average treatment effect (ATE) is about 0.47, indicating a strong positive causal impact of a doctorate on high income. DoWhy excels in transparent hypothesis testing but requires a binary treatment and extensive preprocessing.
5. PyAgrum
PyAgrum provides a complete probabilistic graphical model suite (Bayesian networks, Markov networks, decision graphs). It supports structure learning, parameter learning, inference, and visualization, but all variables must be discrete.
# Install
pip install pyagrum
pip install setgraphviz # optional for visualization
import datazets as dz
import pyagrum as gum
from setgraphviz import setgraphviz
setgraphviz()
# Load and clean data
df = dz.get(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
df2 = df.copy()
# PyAgrum does not handle missing values automatically; encode them explicitly
df2 = df2.fillna('missing').replace('?', 'missing')
for col in df2.columns:
    df2[col] = df2[col].astype('category')
# Learn structure with BIC and hill climbing
learner = gum.BNLearner(df2)
learner.useScoreBIC()
learner.useGreedyHillClimbing()
bn = learner.learnBN()
# Learn parameters
bn2 = learner.learnParameters(bn.dag())
import pyagrum.lib.notebook as gnb
gnb.showBN(bn2)

The learned network (arrows indicate potential causal direction) is displayed below:
PyAgrum offers transparent learning and supports constraint learning, but data preparation is heavy and it lacks automatic handling of missing values.
6. CausalImpact
CausalImpact, originally from Google, uses Bayesian structural time‑series models to evaluate the effect of an intervention on a time‑series outcome.
# Install
pip install causalimpact
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
# Simulate data
# AR(1) process (statsmodels convention: lag polynomials start with 1)
x1 = arma_generate_sample(ar=[1, -0.9], ma=[1], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[70:] += 10  # introduce intervention at t=70
data = pd.DataFrame(np.column_stack([y, x1]), columns=["y", "x1"])
# Define pre‑ and post‑intervention periods
impact = CausalImpact(data, pre_period=[0, 69], post_period=[70, 99])
impact.run()
impact.plot()
impact.summary()

The plot shows the observed series, the model's counterfactual prediction, the pointwise causal effect, and the cumulative effect. The summary reports a statistically significant uplift with a p‑value near 0, confirming the intervention's impact.
Overall Comparison
All six libraries together cover the full causal analysis pipeline—from discovering causal structure to estimating causal effects. They can be grouped into two categories:
Structure‑learning libraries : Bnlearn, Pgmpy, CausalNex, PyAgrum. These focus on building causal graphs from data and are ideal for exploratory analysis.
Causal‑effect libraries : DoWhy, CausalImpact. These assume a known structure or time‑series and quantify the impact of a treatment or intervention.
From an engineering perspective: Bnlearn offers the highest usability with a friendly API. Pgmpy provides the greatest flexibility for custom algorithms. CausalNex adds a visual interface but is limited to specific Python versions. DoWhy is widely used in academia for rigorous causal validation. PyAgrum is stable and transparent but requires extensive data preparation. CausalImpact specializes in time‑series causal evaluation.
Choosing the right tool depends on the project’s stage: quick prototyping and validation favor Bnlearn; deep custom research may benefit from Pgmpy; production‑grade causal effect estimation in time‑series contexts is best served by CausalImpact.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.