Artificial Intelligence 10 min read

Master Decision Trees: Concepts, Algorithms, and Python Implementation

This article introduces decision trees in data analysis and machine learning, explains core concepts, compares popular algorithms such as ID3, C4.5, and CART, details their mathematical foundations, and provides a complete Python implementation with code examples for building and visualizing trees.

360 Zhihui Cloud Developer

Introduction

Decision trees are a type of machine learning model that classify data by asking a series of feature‑based questions, forming a tree structure where each leaf represents a class.

What is a decision tree?

Simply put, a decision tree uses features to ask questions at each node, splitting the data into subsets until reaching leaf nodes that correspond to class labels.

Common algorithms

Popular decision‑tree algorithms include ID3, C4.5, and CART.

ID3: at each node, selects the feature with the highest information gain (the largest reduction in entropy).

C4.5: an improvement over ID3 that replaces raw information gain with the gain ratio, adds support for continuous attributes and missing values, and typically achieves higher accuracy.

CART: uses Gini impurity as the split criterion, builds binary trees, and can handle outliers and missing values.
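To make the CART criterion concrete, here is a minimal sketch of Gini impurity. The labels below are illustrative, not taken from the article's data set:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of p_k^2 over the class proportions p_k."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5.
print(gini_impurity(['Y', 'Y', 'Y']))       # prints 0.0
print(gini_impurity(['Y', 'N', 'Y', 'N']))  # prints 0.5
```

CART chooses the split that yields the largest weighted decrease in this impurity, analogous to how ID3 maximizes the decrease in entropy.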

Mathematical principles

ID3

ID3 uses entropy E(S) to measure disorder; information gain is the reduction in entropy after a split. The goal is to choose the split that maximizes information gain, i.e., the one that leaves the resulting subsets with the least entropy.

Gain quantifies how much entropy is reduced when splitting on a particular feature.

Example: calculating entropy for each feature and selecting the one with the highest gain (e.g., Outlook).
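The entropy and gain calculation can be sketched in a few lines. The five rows below are a made-up toy sample, not the full weather data set the article alludes to:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, feature_idx):
    """Information gain = entropy of the whole set minus the
    size-weighted entropy of each subset produced by the split."""
    labels = [r[-1] for r in rows]
    base = entropy(labels)
    n = len(rows)
    remainder = 0.0
    for v in set(r[feature_idx] for r in rows):
        subset = [r[-1] for r in rows if r[feature_idx] == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Toy (outlook, play) rows: the outlook feature separates the classes
# perfectly here, so its gain equals the full entropy of the set.
rows = [('sunny', 'N'), ('sunny', 'N'), ('overcast', 'Y'),
        ('rain', 'Y'), ('rain', 'Y')]
print(round(info_gain(rows, 0), 3))  # prints 0.971
```

ID3 evaluates this gain for every candidate feature and splits on the winner.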

C4.5

C4.5 addresses ID3’s sensitivity to many‑value features by using the gain ratio (information gain divided by split information). It also handles continuous attributes and missing values.

For continuous attributes (e.g., humidity), values are sorted, duplicate values removed, and split points evaluated; the split with the highest gain is chosen.
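That threshold search can be sketched as follows. The humidity readings and labels are hypothetical, chosen only to show the mechanics:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values
    as a binary split threshold; keep the one with the highest gain."""
    base = entropy(labels)
    n = len(values)
    best_gain, best_t = -1.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical humidity readings with play/no-play labels.
humidity = [65, 70, 80, 90, 95]
play     = ['Y', 'Y', 'Y', 'N', 'N']
t, g = best_threshold(humidity, play)
print(t)  # prints 85.0
```

Here the split at 85 separates the two classes perfectly, so its gain equals the entropy of the whole set.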

For missing values, a specific formula adjusts the information gain to account for unknown entries.
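The article does not spell that formula out. In Quinlan's C4.5, the gain computed on the rows whose value is known is scaled by the fraction of rows that are known; a minimal sketch of that correction:

```python
def adjusted_gain(gain_on_known, n_known, n_total):
    """C4.5-style missing-value correction: discount the gain computed
    on rows with known values by the fraction of rows that are known."""
    return (n_known / n_total) * gain_on_known

# A gain of 0.971 measured on 12 of 14 rows is discounted accordingly.
print(adjusted_gain(0.971, 12, 14))
```

Intuitively, a feature that is often missing gets less credit, since its split can only be applied to part of the data.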

Comparison of ID3 and C4.5 shows differences in accuracy and execution time.

Python implementation (C4.5)

Below is a concise Python implementation of the C4.5 algorithm.

from math import log

def createDataSet():
    dataSet = [[0,0,0,0,'N'],
               [0,0,0,1,'N'],
               [1,0,0,0,'Y'],
               [2,1,0,0,'Y'],
               [2,2,1,0,'Y'],
               [2,2,1,1,'N'],
               [1,2,1,1,'Y']]
    labels = ['outlook','temperature','humidity','windy']
    return dataSet, labels

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGainRatio = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        splitInfo = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
            splitInfo += -prob * log(prob, 2)
        infoGain = baseEntropy - newEntropy
        if splitInfo == 0:
            continue
        infoGainRatio = infoGain / splitInfo
        if infoGainRatio > bestInfoGainRatio:
            bestInfoGainRatio = infoGainRatio
            bestFeature = i
    return bestFeature

def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis] + featVec[axis+1:]
            retDataSet.append(reducedFeatVec)
    return retDataSet

def majorityCnt(classList):
    # Return the most common class label; used when no features remain.
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    return max(classCount, key=classCount.get)

def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
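Once createTree has returned its nested-dict representation, classifying a new sample is a short recursive walk. A sketch, using a hand-built tree in the same {feature: {value: subtree}} format the code above produces (this particular tree is consistent with the seven-row sample data):

```python
def classify(tree, featLabels, testVec):
    """Walk the nested-dict tree: at each internal node, look up the
    sample's value for that node's feature and descend until a leaf
    (a plain class label) is reached."""
    if not isinstance(tree, dict):
        return tree                      # leaf node: the class label
    feature = next(iter(tree))           # feature tested at this node
    featIndex = featLabels.index(feature)
    subtree = tree[feature][testVec[featIndex]]
    return classify(subtree, featLabels, testVec)

tree = {'outlook': {0: 'N', 1: 'Y', 2: {'windy': {0: 'Y', 1: 'N'}}}}
labels = ['outlook', 'temperature', 'humidity', 'windy']
print(classify(tree, labels, [2, 2, 1, 1]))  # prints N
```

Note that featLabels must be the full, unmodified label list, since createTree deletes entries from the copy it is given.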

After constructing the tree, it can be visualized with a plotting utility.

Conclusion

Decision trees are a widely used machine‑learning technique that map object attributes to class labels, providing interpretable models for classification and prediction; they can be extended to handle multiple outputs by building separate trees for each target.

Tags: algorithm, machine learning, Python, decision tree, ID3, C4.5
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
