
Building a Decision Tree Model in Python Using Entropy and Gini Impurity

This tutorial walks through creating, visualizing, and exporting two Python decision‑tree classifiers—one using entropy and the other using Gini impurity—by installing required packages, preparing a simple dataset, training the models with scikit‑learn, and rendering the trees with Graphviz.

Python Programming Learning Circle

Decision trees remain a timeless topic. This article demonstrates how to build two decision-tree classifiers in Python, one that splits on entropy and one that splits on Gini impurity, and how to visualize each tree with Graphviz.

Decision trees are also the building blocks of ensemble methods, which can rival far more complex models on tabular data while remaining relatively simple, and are widely used in Kaggle competitions for variable selection and prediction.

1. Jupyter Notebook

This tutorial assumes you are following along in a Jupyter notebook, which makes it easy to run the install commands below and render the trees inline.

2. Install Packages

The required packages are pydot and graphviz. The installation commands run directly in the notebook:

<code>!pip install --upgrade pydot
!pip install --upgrade graphviz</code>

On Windows, you can also install the Graphviz Python bindings with !conda install python-graphviz if needed.

3. Import Packages

The script imports pandas, scikit-learn's DecisionTreeClassifier and export_graphviz, as well as pydot and graphviz for visualization:

<code>import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import pydot
import graphviz</code>

4. Create Dataset

A list of dictionaries, each recording whether an animal is a "Best Friend" and its species, is defined and then converted into a pandas DataFrame.

<code>instances = [
    {'Best Friend': False, 'Species': 'Dog'},
    {'Best Friend': True, 'Species': 'Dog'},
    {'Best Friend': True, 'Species': 'Cat'},
    ...
]

df = pd.DataFrame(instances)</code>

5. Split Data

Using list comprehensions, the feature column Best Friend is encoded as binary input X_train, and the target Species is encoded as 1 for Dog and 0 for Cat in y_train:

<code>X_train = [[1] if a else [0] for a in df['Best Friend']]
y_train = [1 if d == 'Dog' else 0 for d in df['Species']]
labels = ['Best Friend']</code>
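With the three sample rows shown above (the full list of instances is elided in the snippet), the comprehensions produce the following encodings. This is a sketch using only those three rows:

```python
import pandas as pd

# Only the three instances shown above; the original dataset may contain more rows.
instances = [
    {'Best Friend': False, 'Species': 'Dog'},
    {'Best Friend': True, 'Species': 'Dog'},
    {'Best Friend': True, 'Species': 'Cat'},
]
df = pd.DataFrame(instances)

# Encode the feature and target exactly as in the tutorial.
X_train = [[1] if a else [0] for a in df['Best Friend']]
y_train = [1 if d == 'Dog' else 0 for d in df['Species']]

print(X_train)  # [[0], [1], [1]]
print(y_train)  # [1, 1, 0]
```

Note that X_train is a list of single-element lists because scikit-learn expects a 2-D feature matrix, even with one feature.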

6. Build Entropy-Based Tree

A DecisionTreeClassifier is instantiated with criterion='entropy' and fitted on the training data:

<code>model_v1 = DecisionTreeClassifier(
    max_depth=None,
    max_features=None,
    criterion='entropy',
    min_samples_leaf=1,
    min_samples_split=2)
model_v1.fit(X_train, y_train)</code>
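The entropy criterion measures how mixed the class labels are at each node: Shannon entropy is H = -Σ pᵢ·log₂(pᵢ), where pᵢ is the proportion of each class. A minimal sketch of the calculation (the helper function `entropy` is illustrative, not part of scikit-learn's public API):

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, the quantity behind criterion='entropy'."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

# Example: two dogs (1) and one cat (0), as in the sample rows above.
print(round(entropy([1, 1, 0]), 3))  # 0.918
```

A pure node (all one class) has entropy 0; a 50/50 binary split has entropy 1. The tree chooses splits that reduce this value the most.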

The tree is exported to a Graphviz .dot file and rendered:

<code>file = '/Doc/MachineLearning/Python/DecisionTree/tree_model_v1.dot'
export_graphviz(model_v1, out_file=file, feature_names=labels)
with open(file) as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)</code>

Optionally, the .dot file can be converted to PNG with:

<code>!dot -Tpng tree_model_v1.dot -o tree_model_v1.png</code>
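If Graphviz is not installed, scikit-learn can also print a plain-text view of the same tree with export_text (available since scikit-learn 0.21). A sketch using the toy encoding from earlier, where the tiny dataset is an assumption:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data matching the tutorial's encoding: Best Friend -> 1 (Dog) / 0 (Cat).
X_train = [[0], [1], [1]]
y_train = [1, 1, 0]

model_v1 = DecisionTreeClassifier(criterion='entropy')
model_v1.fit(X_train, y_train)

# Print an indented, text-only rendering of the fitted tree.
print(export_text(model_v1, feature_names=['Best Friend']))
```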

7. Build Gini-Based Tree

A second classifier is created without specifying the criterion, so scikit-learn defaults to Gini impurity. The same training data are used:

<code>model_v2 = DecisionTreeClassifier(
    max_depth=None,
    max_features=None,
    min_samples_leaf=1,
    min_samples_split=2)
model_v2.fit(X_train, y_train)</code>
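Gini impurity plays the same role as entropy but is cheaper to compute: G = 1 - Σ pᵢ², the probability of misclassifying a randomly drawn sample if it were labeled at random according to the node's class distribution. A minimal sketch (the helper function `gini` is illustrative, not scikit-learn's internal implementation):

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_i^2), scikit-learn's default split criterion."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Two dogs (1) and one cat (0): 1 - (2/3)^2 - (1/3)^2
print(round(gini([1, 1, 0]), 3))  # 0.444
```

Like entropy, Gini impurity is 0 for a pure node; for binary labels it peaks at 0.5 for a 50/50 split, rather than at 1.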

The Gini tree is exported and visualized in the same way, and its PNG can be generated with:

<code>!dot -Tpng tree_model_v2.dot -o tree_model_v2.png</code>

The two trees differ only in their splitting criterion (entropy vs. Gini). Neither is inherently superior; the right choice depends on the data and the business context, so building both versions and comparing their performance is recommended.
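One quick way to compare the two versions is to fit both on the same data and check their predictions side by side. A sketch using the toy encoding from earlier (the tiny dataset is an assumption; with a single binary feature both criteria end up choosing the same split):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: Best Friend -> 1 (Dog) / 0 (Cat).
X_train = [[0], [1], [1]]
y_train = [1, 1, 0]

# Same data, two criteria.
model_v1 = DecisionTreeClassifier(criterion='entropy').fit(X_train, y_train)
model_v2 = DecisionTreeClassifier().fit(X_train, y_train)  # default: Gini

# On this data the two trees make identical predictions.
print(model_v1.predict([[0], [1]]))
print(model_v2.predict([[0], [1]]))
```

On larger, noisier datasets the two criteria can produce different trees, which is exactly when evaluating both on held-out data pays off.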

Tags: Machine Learning, Python, decision tree, entropy, Graphviz, Gini Impurity, scikit-learn
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
