
How Decision Trees Work: From Entropy to Gini Index Explained

This article introduces decision tree algorithms, explains their role in supervised learning for classification and regression, details the construction process, compares information gain and Gini index for attribute selection, and reviews popular tree methods such as ID3, C4.5, and CART with illustrative examples.


1 Decision Tree

Decision tree algorithms belong to the supervised learning category and can be used for both regression and classification problems. They represent decisions as a tree where each leaf corresponds to a class label and internal nodes represent attribute tests, allowing any Boolean function on discrete attributes to be expressed.

When using decision trees we typically perform the following steps:

Treat the entire training set as the root.

Prefer categorical feature values; if values are continuous, discretize them before model building.

Distribute the records recursively on the basis of attribute values.

Use statistical methods to rank attributes for root or internal nodes.

The main challenge in decision trees is selecting the attribute for each node, known as attribute selection. Two popular methods are Information Gain and Gini Index.

2 Information Gain

When a node splits training instances into smaller subsets, the entropy (a measure of disorder) changes. Information gain quantifies this change in entropy.

Definition: Let S be a set of instances and a an attribute. Let S_v be the subset of S where a = v, and Values(a) the set of all possible values of a. Then

Gain(S, a) = Entropy(S) − Σ_{v ∈ Values(a)} (|S_v| / |S|) · Entropy(S_v)

2.1 Entropy

Entropy measures the uncertainty of a random variable and reflects the impurity of a sample set; higher entropy indicates greater impurity, i.e. a more mixed distribution of classes.

Definition: For a set S whose instances belong to classes 1, …, c, with p_i the proportion of instances in S belonging to class i,

Entropy(S) = − Σ_{i=1}^{c} p_i log₂ p_i
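These two definitions translate directly into code. The following is a small sketch (the function and variable names are my own, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy Entropy(S) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, a) = Entropy(S) - sum over v in Values(a) of |S_v|/|S| * Entropy(S_v)."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)  # S_v, keyed by value v
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

On a balanced two-class set, an attribute that separates the classes perfectly yields a gain of one full bit, while an uninformative attribute yields a gain of zero.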

2.2 Building a Tree with Information Gain

Starting from all training instances at the root, use information gain to choose the attribute that labels each node (ensuring the same discrete attribute does not appear twice along a root‑to‑leaf path), then recursively construct sub‑trees on the subsets.
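The recursive procedure described above can be sketched in the style of ID3. This is a hypothetical illustration, not the article's code; attributes are represented as column indices, and the tree is a nested structure of (attribute, branches) pairs with class labels at the leaves:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Recursively build a tree: leaves are class labels, internal
    nodes are (attribute_index, {value: subtree}) pairs."""
    if len(set(labels)) == 1:              # pure subset -> leaf
        return labels[0]
    if not attrs:                          # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                           # information gain of attribute a
        subsets = {}
        for r, y in zip(rows, labels):
            subsets.setdefault(r[a], []).append(y)
        rem = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - rem

    best = max(attrs, key=gain)            # highest-gain attribute labels this node
    branches = {}
    for v in set(r[best] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        srows, slabels = zip(*sub)
        # the chosen attribute is removed, so it cannot reappear on this path
        branches[v] = id3(list(srows), list(slabels),
                          [a for a in attrs if a != best])
    return (best, branches)
```

Removing the chosen attribute from the candidate list at each recursive call is what guarantees that the same discrete attribute never appears twice along a root-to-leaf path.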

Now we illustrate building a decision tree using information gain on a small dataset (3 features, 2 classes).

We compute the information gain for each feature. Feature Y yields the highest information gain, so Y is chosen as the first splitting criterion.

[Figure: the final decision tree for this dataset, with Y at the root.]

3 Gini Index

The Gini index measures how often a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the subset; attributes with lower Gini values are therefore preferred.

Formula for the Gini index:

Gini(S) = 1 − Σ_i p_i², where p_i is the proportion of instances in S belonging to class i.
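In code, the formula is a one-liner (a sketch; the names are my own):

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum of p_i^2 over the classes i present in S."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

A pure subset has Gini 0, and a balanced two-class subset has Gini 0.5, the maximum for two classes.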

Consider the following dataset and use the Gini index to build a decision tree:

In this dataset there are five attributes; attribute E is the target feature with two equally sized classes. When using the Gini index on continuous attributes we must choose a split threshold for each attribute and, as with information gain, prefer the attribute (and threshold) whose split yields the lowest weighted Gini value.
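The threshold search just described can be sketched as follows (hypothetical helper functions, assuming a numeric attribute; a common choice of candidate thresholds is the midpoint between adjacent sorted values):

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(values, labels, threshold):
    """Weighted Gini of the two subsets produced by splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Evaluate the midpoint between each pair of adjacent distinct values
    and return the threshold with the lowest weighted Gini."""
    cand = sorted(set(values))
    mids = [(a + b) / 2 for a, b in zip(cand, cand[1:])]
    return min(mids, key=lambda t: weighted_gini(values, labels, t))
```

Repeating this search for every continuous attribute, and then taking the attribute with the overall lowest weighted Gini, gives the split for the current node.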

4 Decision Tree Algorithms

Well‑known decision tree algorithms include:

ID3: uses information gain to decide which attribute to split on at each node, recursively computing information gain for remaining data.

C4.5: successor of ID3, uses the gain ratio (information gain normalized by split information) and can handle continuous and missing attribute values.

CART (Classification and Regression Tree): a versatile algorithm that can generate both regression and classification trees based on the dependent variable.
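As a usage sketch, scikit-learn's CART-based classifier exposes both splitting criteria through its criterion parameter (this assumes scikit-learn is installed; the toy data below is made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: two binary features; the label depends only on the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    print(criterion, clf.predict([[0, 1], [1, 1]]))
```

On this perfectly separable data both criteria recover the same single split on the first feature.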


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
