How Decision Trees Work: From Entropy to Gini Index Explained
This article introduces decision tree algorithms, explains their role in supervised learning for classification and regression, details the construction process, compares information gain and Gini index for attribute selection, and reviews popular tree methods such as ID3, C4.5, and CART with illustrative examples.
1 Decision Tree
Decision tree algorithms belong to the supervised learning category and can be used for both regression and classification problems. They represent decisions as a tree where each leaf corresponds to a class label and internal nodes represent attribute tests, allowing any Boolean function on discrete attributes to be expressed.
When using decision trees we typically perform the following steps:
Treat the entire training set as the root.
Prefer categorical feature values; if values are continuous, discretize them before model building.
Distribute records recursively on the basis of attribute values.
Use statistical methods to rank attributes for root or internal nodes.
The main challenge in decision trees is selecting the attribute for each node, known as attribute selection. Two popular methods are Information Gain and Gini Index.
2 Information Gain
When a node splits training instances into smaller subsets, the entropy (a measure of disorder) changes. Information gain quantifies this change in entropy.
Definition: Let S be a set of instances, a an attribute, S_v the subset of S for which a = v, and Values(a) the set of all possible values of a. The information gain of attribute a on S is then
Gain(S, a) = Entropy(S) − Σ_{v ∈ Values(a)} (|S_v| / |S|) · Entropy(S_v)
2.1 Entropy
Entropy measures the uncertainty of a random variable and reflects the impurity of a sample set; higher entropy indicates greater impurity and uncertainty.
Definition: For a set S containing instances from c classes, where p_i is the proportion of instances in S belonging to class i, the entropy of S is
Entropy(S) = − Σ_{i=1}^{c} p_i log2(p_i)
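As a concrete illustration, entropy can be computed directly from class counts. The sketch below is plain Python with a made-up two-class label list, not code from any particular library:

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    # Sum -p_i * log2(p_i) over the class proportions p_i.
    return -sum((c / n) * log2(c / n) for c in counts.values())

# An evenly mixed two-class set is maximally impure: 1 bit of entropy.
print(entropy(["yes", "yes", "no", "no"]))  # 1.0
```

A pure set (all labels identical) has entropy 0, the other extreme of impurity.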
2.2 Building a Tree with Information Gain
Starting from all training instances at the root, use information gain to choose the attribute that labels each node (ensuring the same discrete attribute does not appear twice along a root‑to‑leaf path), then recursively construct sub‑trees on the subsets.
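This selection step can be sketched in plain Python. The dict-of-rows layout and the toy dataset below are made up for illustration (three hypothetical features X, Y, Z and two classes, echoing the shape of the example that follows):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting `rows` on `attr`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def best_attribute(rows, attrs, target):
    """Pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy data: Y separates the classes perfectly, X and Z do not.
data = [
    {"X": 1, "Y": 1, "Z": 1, "class": "I"},
    {"X": 1, "Y": 1, "Z": 0, "class": "I"},
    {"X": 0, "Y": 0, "Z": 1, "class": "II"},
    {"X": 1, "Y": 0, "Z": 0, "class": "II"},
]
print(best_attribute(data, ["X", "Y", "Z"], "class"))  # Y
```

Splitting on Y yields two pure subsets, so its gain equals the full entropy of the root (1 bit), and it would label the root node.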
Now we illustrate building a decision tree using information gain on a small dataset (3 features, 2 classes).
We compute the information gain for each feature.
Comparing these values, feature Y has the highest information gain, so Y is chosen as the first splitting criterion.
The final tree for this dataset looks like:
3 Gini Index
The Gini index measures how often a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the subset; attributes with lower Gini values are therefore preferred.
Formula for the Gini index, where p_i is the proportion of instances in class i:
Gini(S) = 1 − Σ_{i=1}^{c} p_i²
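As a minimal sketch (plain Python, made-up labels), the Gini impurity of a node follows directly from its class proportions:

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["A", "A", "B", "B"]))  # 0.5: an even two-class split
print(gini(["A", "A", "A", "A"]))  # 0.0: a pure node
```

For two classes the Gini index ranges from 0 (pure) to 0.5 (evenly mixed), whereas entropy ranges from 0 to 1 bit.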
Consider the following dataset and use the Gini index to build a decision tree:
In this dataset there are five attributes; attribute E is the target feature with two equally sized classes. When using the Gini index we must choose a threshold for each attribute and, in contrast to information gain (where higher is better), prefer attributes with lower Gini values for splitting.
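Since the dataset itself is not reproduced here, the sketch below uses a made-up numeric attribute to show the threshold step: scan the candidate thresholds and keep the one whose binary split has the lowest weighted Gini.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Threshold minimizing the weighted Gini of the split x <= t vs x > t."""
    best_t, best_g = None, float("inf")
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:  # skip degenerate splits
            continue
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Made-up attribute: small values belong to "neg", large values to "pos".
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(best_threshold(xs, ys))  # (3.0, 0.0): this threshold splits perfectly
```

In practice each attribute gets its own best threshold this way, and the attribute whose best split has the lowest weighted Gini labels the node.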
4 Decision Tree Algorithms
Well‑known decision tree algorithms include:
ID3: uses information gain to decide which attribute to split on at each node, recursively computing information gain for remaining data.
C4.5: successor of ID3, uses the gain ratio (information gain normalized by the split's intrinsic information) and can handle continuous and missing attribute values.
CART (Classification and Regression Tree): a versatile algorithm that can generate both regression and classification trees based on the dependent variable.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".