Which Bayesian Causal Inference Library Best Uncovers Hidden Relationships?
This article systematically compares six popular Python causal inference libraries—Bnlearn, Pgmpy, CausalNex, DoWhy, PyAgrum, and CausalImpact—using the U.S. Census income dataset to demonstrate how each tool discovers causal effects of education on salary, highlighting their core features, strengths, weaknesses, and suitable scenarios.
Evaluation dataset
The US Census income dataset is loaded with the datazets package. Continuous and sensitive columns (age, fnlwgt, education-num, capital-gain, capital-loss, hours-per-week, race, sex) are dropped, leaving categorical features and the target salary. The causal question is whether holding a graduate degree increases the probability of earning > $50K.
pip install datazets
import datazets as dz
import pandas as pd
df = dz.get(data='census_income')
drop_cols = ['age','fnlwgt','education-num','capital-gain','capital-loss','hours-per-week','race','sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
print(df.head())
1. Bnlearn
Bnlearn provides a full Bayesian‑network toolbox (structure learning, parameter estimation, inference, synthetic data generation, discretisation, model evaluation, visualisation). Supported search methods include hill‑climb, exhaustive, constraint, Chow‑Liu, Naïve Bayes and TAN; scoring functions include BIC, K2 and BDEU.
import bnlearn as bn
model = bn.structure_learning.fit(df, methodtype='hillclimbsearch', scoretype='bic')
model = bn.independence_test(model, df, test="chi_square", alpha=0.05, prune=True)
model = bn.parameter_learning.fit(model, df)
bn.plot(model, interactive=True)
print(model['model_edges'])
Inference queries:
# Doctorate degree
query = bn.inference.fit(model, variables=['salary'], evidence={'education':'Doctorate'})
print(query)
# HS‑grad
query = bn.inference.fit(model, variables=['salary'], evidence={'education':'HS-grad'})
print(query)
Results: P(>50K | Doctorate) = 70.9%; P(>50K | HS‑grad) = 16.2%.
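Inference results like these can be sanity-checked against the raw conditional frequencies in the data. The sketch below uses a tiny hypothetical stand-in for `df` (the values are illustrative, not the census figures) and computes the empirical P(salary = '>50K' | education) with plain pandas:

```python
import pandas as pd

# Hypothetical miniature sample standing in for the census dataframe `df`;
# column names match the article, values are illustrative only.
df = pd.DataFrame({
    'education': ['Doctorate', 'Doctorate', 'HS-grad', 'HS-grad', 'HS-grad'],
    'salary':    ['>50K',      '<=50K',     '<=50K',   '>50K',    '<=50K'],
})

# Empirical P(salary = '>50K' | education) -- the quantity the Bayesian
# network's inference step estimates after smoothing.
p = (df.assign(high=df['salary'].eq('>50K'))
       .groupby('education')['high'].mean())
print(p['Doctorate'])  # 0.5 on this toy sample
```

On the real dataframe the same groupby should land near the smoothed inference output, which makes it a quick cross-check on the learned network.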
Advantages: end‑to‑end pipeline, easy start, good visualisations, handles discrete, continuous or mixed data.
Input data: discrete, continuous or mixed.
2. Pgmpy
Pgmpy offers low‑level building blocks for probabilistic graphical models, requiring explicit model construction, learning and inference.
!pip install pgmpy
from pgmpy.estimators import HillClimbSearch, BicScore
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
est = HillClimbSearch(df)
scoring = BicScore(df)
model = est.estimate(scoring_method=scoring)
print('Discovered edges:', model.edges())
# Parameter learning and inference (example for Doctorate)
from pgmpy.estimators import BayesianEstimator
bayesian_model = BayesianNetwork(model.edges())
bayesian_model.fit(df, estimator=BayesianEstimator, prior_type='BDeu', equivalent_sample_size=1000)
infer = VariableElimination(bayesian_model)
result = infer.query(variables=['salary'], evidence={'education':'Doctorate'})
print(result)
Advantages: extremely flexible, suitable for custom algorithms and research.
Disadvantages: steep learning curve; many steps must be coded manually.
Input data: requires discrete data.
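Because pgmpy's discrete estimators expect every variable to be categorical, any continuous column must be binned before fitting. A minimal sketch using a hypothetical age column and arbitrary bin edges:

```python
import pandas as pd

# Hypothetical continuous column; pgmpy's discrete estimators need
# categorical values, so bin it first with pd.cut.
ages = pd.Series([22, 35, 48, 61, 29])
age_binned = pd.cut(ages, bins=[0, 30, 50, 100],
                    labels=['young', 'middle', 'senior'])
print(list(age_binned))  # ['young', 'middle', 'middle', 'senior', 'young']
```

The bin edges and labels here are placeholders; in practice the cut points should reflect domain knowledge or a principled discretisation scheme.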
3. CausalNex
CausalNex implements the NOTEARS algorithm for structure learning. It works only with numeric (integer‑encoded) discrete data.
!pip install causalnex
from causalnex.structure.notears import from_pandas
from sklearn.preprocessing import LabelEncoder
import networkx as nx
le = LabelEncoder()
df_num = df.apply(le.fit_transform)
sm = from_pandas(df_num)
sm.remove_edges_below_threshold(0.8) # filter weak edges
nx.draw_networkx(sm, with_labels=True)
Advantages: integrates the state‑of‑the‑art NOTEARS algorithm.
Disadvantages: requires numeric discretisation; compatibility with newer library versions can be problematic.
Input data: numeric discrete.
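Since NOTEARS only sees integer codes, it helps to keep the code-to-label mapping so the discovered edges can be read back in terms of the original categories. One lightweight alternative to `LabelEncoder` is `pd.factorize`, which returns both codes and labels; a sketch on a hypothetical column:

```python
import pandas as pd

# CausalNex's NOTEARS needs integer-coded columns; pandas' factorize
# keeps the code-to-label mapping so learned edges can be interpreted
# in terms of the original category names.
col = pd.Series(['HS-grad', 'Doctorate', 'HS-grad', 'Masters'])
codes, labels = pd.factorize(col)
print(codes.tolist())   # [0, 1, 0, 2]
print(labels.tolist())  # ['HS-grad', 'Doctorate', 'Masters']
```

Applying this per column (and storing each `labels` index) makes the numeric graph reversible, which `LabelEncoder` also supports via `inverse_transform` if you keep one encoder per column.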
4. DoWhy
DoWhy focuses on causal effect estimation. The user must specify treatment, outcome and covariates, then provide or learn a causal graph.
!pip install dowhy
from dowhy import CausalModel
from sklearn.preprocessing import LabelEncoder
import numpy as np
le = LabelEncoder()
# Encode binary treatment (Doctorate) and outcome (salary)
df['education'] = (df['education'] == 'Doctorate')
df_num = df.apply(le.fit_transform)
treatment = 'education'
outcome = 'salary'
common_causes = [c for c in df.columns if c not in [treatment, outcome]]
model = CausalModel(data=df_num, treatment=treatment, outcome=outcome,
                    common_causes=common_causes)
model.view_model()
identified_estimand = model.identify_effect()
estimate = model.estimate_effect(identified_estimand, method_name='backdoor.propensity_score_stratification')
print(estimate)
# Robustness check
refute = model.refute_estimate(identified_estimand, estimate, method_name='random_common_cause')
print(refute)
The estimated average treatment effect (ATE) is approximately 0.47 (the mean increase in the probability of earning > $50K when holding a doctorate). Robustness checks are built in.
Advantages: rigorous ATE estimation with multiple robustness checks.
Disadvantages: cannot learn the causal graph; requires statistical expertise to interpret results.
Input data: binary treatment, binary/continuous outcome, plus covariates.
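To see what the backdoor adjustment buys you, it is useful to compare the ATE against the naive, unadjusted difference in means. The sketch below uses a small hypothetical binary-coded frame mirroring the DoWhy setup (treatment = holds a doctorate, outcome = earns > $50K):

```python
import pandas as pd

# Hypothetical binary-coded frame mirroring the DoWhy setup:
# treatment 'education' (0/1 = doctorate), outcome 'salary' (0/1 = >50K).
df_num = pd.DataFrame({
    'education': [1, 1, 0, 0, 0, 1],
    'salary':    [1, 1, 0, 1, 0, 1],
})

# Naive (unadjusted) difference in means: what the effect estimate would
# be with no confounding. DoWhy's backdoor estimators adjust this using
# the listed common causes.
naive = (df_num.loc[df_num['education'] == 1, 'salary'].mean()
         - df_num.loc[df_num['education'] == 0, 'salary'].mean())
print(naive)  # 2/3 on this toy sample
```

A large gap between this naive difference and the propensity-score ATE is itself informative: it indicates how much of the raw association is explained by the covariates.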
5. PyAgrum
PyAgrum supports Bayesian networks, Markov networks and various learning algorithms. Installation requires setgraphviz for visualisation.
pip install pyagrum setgraphviz
import datazets as dz
import pyagrum as gum
from setgraphviz import setgraphviz
df = dz.get(data='census_income')
# Drop the same columns as before
drop_cols = ['age','fnlwgt','education-num','capital-gain','capital-loss','hours-per-week','race','sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Ensure categorical type for all columns
for col in df.columns:
df[col] = df[col].astype('category')
learner = gum.BNLearner(df)
learner.useScoreBIC()
learner.useGreedyHillClimbing()
bn = learner.learnBN()
# Parameter learning
bn2 = learner.learnParameters(bn.dag())
# Visualise
setgraphviz()
# (visualisation code omitted for brevity)
Advantages: full suite of graph models and learning algorithms.
Disadvantages: strict preprocessing; visualisation depends on Graphviz; smaller community.
Input data: complete discrete dataset.
6. CausalImpact
CausalImpact is specialised for time‑series interventions. It fits a Bayesian structural time‑series model to predict the counterfactual and compares it with observed post‑intervention data.
from causalimpact import CausalImpact
import pandas as pd
import numpy as np
# Simulated example: y (traffic) tracks a control series x1; intervention at day 70
np.random.seed(1)
x1_data = 100 + np.random.randn(100).cumsum()
y_data = 1.2 * x1_data + np.random.randn(100)
y_data[70:] += 10  # lift after the intervention
data = pd.DataFrame({'y': y_data, 'x1': x1_data})
impact = CausalImpact(data, pre_period=[0, 69], post_period=[70, 99])
impact.run()
impact.plot()
print(impact.summary())
Summary output (example):
Actual 130 (cumulative 3773)
Predicted 120 (cumulative 3501)
Absolute Effect 9 (cumulative 272)
Relative Effect 7.8%
P‑value 0.0%
Prob. of Causal Effect 100.0%
Advantages: dedicated tool for time‑series causal analysis with intuitive visual output.
Disadvantages: limited to time‑series data; cannot build general causal graphs.
Input data: time‑series with a clearly defined intervention point.
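The pre/post periods passed to CausalImpact are just index positions either side of the intervention, so with a dated series they can be derived from the event date. A sketch with a hypothetical 100-day series and launch date:

```python
import pandas as pd

# CausalImpact needs an unambiguous intervention point; with a dated
# series the pre/post periods are index positions either side of it.
idx = pd.date_range('2024-01-01', periods=100, freq='D')
intervention = pd.Timestamp('2024-03-11')   # hypothetical launch date
cut = idx.get_loc(intervention)             # positional index of the event
pre_period, post_period = [0, cut - 1], [cut, len(idx) - 1]
print(pre_period, post_period)  # [0, 69] [70, 99]
```

Deriving the split this way avoids off-by-one errors when the series length or start date changes.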
Comparison of libraries
Bnlearn – full‑process Bayesian network; supports discrete/continuous/mixed data; gentle learning curve; best for general causal discovery and inference.
Pgmpy – low‑level probabilistic‑graph building blocks; requires discrete data; steep learning curve; best for custom model research.
CausalNex – NOTEARS‑based structure learning; numeric discrete input; medium learning curve; best when advanced structure learning is needed.
DoWhy – causal effect estimation (ATE) with robustness checks; binary treatment + outcome + covariates; medium learning curve; best for A/B testing or policy evaluation.
PyAgrum – multiple graph models (Bayesian, Markov); discrete input; medium learning curve; best for academic research requiring diverse models.
CausalImpact – Bayesian structural time‑series for interventions; time‑series input; gentle learning curve; best for marketing or product‑change impact assessment.
How to choose
If you need an out‑of‑the‑box solution that automatically discovers causal structure, use Bnlearn.
If you require full control over every modeling step, choose Pgmpy.
When the primary task is estimating the causal effect of a treatment on an outcome, DoWhy provides the most rigorous framework.
For pure time‑series interventions, CausalImpact is the dedicated tool.