
Understanding Decision Trees: From Basic Process to Watermelon Example

This article explains the fundamentals of decision tree learning: it describes the recursive construction procedure, the criterion for splitting nodes using information gain based on entropy, and walks through the classic watermelon dataset example to illustrate how attributes are selected and how the final tree is built.


Decision Tree Algorithm

This article draws on Professor Zhou Zhihua's "Watermelon Book" (Machine Learning) to introduce an important machine learning algorithm: the decision tree.

1.1 Basic Process

A decision tree is a common machine learning method. For a binary classification task, we aim to learn a model from training data that can classify new examples, which can be viewed as a decision process answering "Is the current sample positive?"

As the name suggests, a decision tree makes decisions based on a tree structure, mirroring human decision making. For example, to decide "Is this a good melon?" we may first examine its color, then the shape of its stem, then the sound when tapped, finally reaching a conclusion.

Each decision question corresponds to a test of an attribute (e.g., color, stem). The result of a test either yields a final decision or leads to a further question, considered only within the scope delimited by the previous decisions. A typical tree consists of a root node, internal nodes, and leaf nodes.

The recursive construction of a decision tree stops in three cases:

(1) All samples at the current node belong to the same class.

(2) The attribute set is empty or all samples have identical values for all attributes.

(3) The sample set at the current node is empty.

In case (2) the node is marked as a leaf and assigned the majority class of its samples; in case (3) it is also marked as a leaf but assigned the majority class of its parent node.
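The three stopping cases above can be sketched as a short recursive procedure. This is a minimal illustration in Python, not code from the book: the names (`tree_generate`, `parent_majority`) are hypothetical, and attribute selection is simplified to taking the first available attribute rather than the information-gain criterion introduced in Section 1.3.

```python
from collections import Counter

def tree_generate(samples, labels, attributes, parent_majority=None):
    """Recursively build a decision tree, following the three stopping cases.

    samples:    list of dicts mapping attribute name -> value
    labels:     list of class labels aligned with samples
    attributes: attribute names still available for splitting
    """
    # Case (3): the sample set is empty -> leaf with the parent's majority class.
    if not samples:
        return parent_majority

    majority = Counter(labels).most_common(1)[0][0]

    # Case (1): all samples belong to the same class -> leaf with that class.
    if len(set(labels)) == 1:
        return labels[0]

    # Case (2): no attributes left, or all samples identical on every
    # remaining attribute -> leaf with the majority class of this node.
    if not attributes or all(
        all(s[a] == samples[0][a] for s in samples) for a in attributes
    ):
        return majority

    # Split on an attribute. Here we simply take the first one; a real
    # learner would pick the attribute with the highest information gain.
    best = attributes[0]
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {s[best] for s in samples}:
        idx = [i for i, s in enumerate(samples) if s[best] == value]
        tree[best][value] = tree_generate(
            [samples[i] for i in idx], [labels[i] for i in idx], rest, majority
        )
    return tree
```

Note how case (3) falls back on the parent's majority class: an empty branch carries no samples of its own, so the parent's distribution is the best available estimate.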

1.2 Splitting Criterion

The key step in decision-tree learning is selecting the optimal splitting attribute at each internal node (line 8 of the algorithm's pseudocode in the book). As splitting proceeds, we want the samples in each resulting node to be increasingly "pure", i.e., to belong as far as possible to a single class.

1.3 Information Gain

Information entropy is the most common measure of sample set purity.

For a sample set D in which a proportion p of the examples are positive, its information entropy is defined as Ent(D) = −p log₂ p − (1 − p) log₂ (1 − p), with the convention 0 log₂ 0 = 0. Smaller entropy indicates higher purity.
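As a quick check of the definition, a binary entropy helper (a hypothetical sketch; the name `entropy` is illustrative):

```python
import math

def entropy(p):
    """Binary entropy -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) := 0."""
    if p in (0.0, 1.0):
        return 0.0  # a pure set (all positive or all negative) has zero entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

Entropy peaks at 1.0 for p = 0.5 (maximum impurity) and falls to 0 as the set becomes pure in either direction.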

For a discrete attribute a with V possible values, using it to split the sample set D creates V branches, where branch v receives the subset D^v of samples taking the v-th value. The information gain of the attribute is the reduction in weighted entropy after the split: Gain(D, a) = Ent(D) − Σ_v (|D^v| / |D|) · Ent(D^v).

Generally, a larger information gain means a greater increase in purity.

Therefore, information gain is used to choose the splitting attribute; the ID3 algorithm uses this criterion.
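A minimal sketch of this gain computation (the helper names `ent` and `information_gain` are illustrative, not from the book):

```python
import math
from collections import Counter

def ent(labels):
    """Entropy of a label list: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Gain(D, a) = Ent(D) - sum over values v of |D^v|/|D| * Ent(D^v)."""
    n = len(labels)
    remainder = 0.0
    for value in {s[attribute] for s in samples}:
        # D^v: labels of the samples that take this value of the attribute
        sub = [labels[i] for i, s in enumerate(samples) if s[attribute] == value]
        remainder += len(sub) / n * ent(sub)
    return ent(labels) - remainder
```

An attribute that separates the classes perfectly recovers the full entropy of the set as gain, while an attribute that is constant across all samples yields zero gain.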

Consider the classic watermelon dataset of 17 training examples, 8 positive (good melons) and 9 negative. The root node contains all 17 samples, so its entropy is Ent(D) = −(8/17) log₂(8/17) − (9/17) log₂(9/17) ≈ 0.998.

We then compute the information gain of each attribute. The attribute "color" has three possible values (green, black, white), dividing D into three subsets of 6, 6, and 5 samples containing 3, 4, and 1 positive examples respectively, with entropies 1.000, 0.918, and 0.722. The information gain of "color" is therefore Gain(D, color) = 0.998 − (6/17 × 1.000 + 6/17 × 0.918 + 5/17 × 0.722) ≈ 0.109.
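Assuming the per-value counts tabulated in the watermelon book (green: 6 samples with 3 positive; black: 6 with 4; white: 5 with 1), the root entropy and the gain for "color" can be verified numerically:

```python
import math

def ent2(pos, total):
    """Binary entropy of a subset with `pos` positive samples out of `total`."""
    p = pos / total
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Root node: 8 positive (good melons) out of 17 samples.
root_ent = ent2(8, 17)  # ~0.998

# "color" splits the 17 samples into three subsets
# (per-value counts as tabulated in the watermelon book):
subsets = [(3, 6),   # green: 6 samples, 3 positive
           (4, 6),   # black: 6 samples, 4 positive
           (1, 5)]   # white: 5 samples, 1 positive
remainder = sum(t / 17 * ent2(p, t) for p, t in subsets)

# ~0.1081 exactly; the book reports 0.109 because it rounds the
# intermediate subset entropies to three decimal places.
gain_color = root_ent - remainder
```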

Similarly, we compute gains for other attributes; "texture" has the highest gain and is selected for the first split.

Subsequent splits are performed recursively on each branch. For the branch where "texture = clear", we again compute information gains for the remaining attributes over the samples in that branch; "stem", "navel", and "touch" tie for the maximum gain, and any of them can be chosen. Repeating this process on every branch yields the final decision tree.

References

Zhou Zhihua. Machine Learning. Beijing: Tsinghua University Press, 2016.

Tags: machine learning, classification, decision tree, entropy, information gain, ID3 algorithm
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
