Artificial Intelligence 10 min read

Master Decision Trees: Concepts, Algorithms, and Python Implementation

This article introduces decision trees in data analysis and machine learning, explains core concepts, compares popular algorithms such as ID3, C4.5, and CART, details their mathematical foundations, and provides a complete Python implementation with code examples for building and visualizing trees.

360 Zhihui Cloud Developer

Introduction

Decision trees are a type of machine learning model that classify data by asking a series of feature‑based questions, forming a tree structure where each leaf represents a class.

What is a decision tree?

Simply put, a decision tree uses features to ask questions at each node, splitting the data into subsets until reaching leaf nodes that correspond to class labels.

Common algorithms

Popular decision‑tree algorithms include ID3, C4.5, and CART.

ID3: at each node, selects the feature with the highest information gain (the largest reduction in entropy).

C4.5: an improvement over ID3 that replaces raw information gain with the gain ratio, adds support for continuous attributes and missing values, and typically achieves higher accuracy.

CART: uses Gini impurity as the split criterion, builds binary trees, and can handle outliers and missing values.
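To make the CART criterion concrete, here is a minimal sketch of Gini impurity. The labels below are illustrative, not taken from the article's data set:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of p_k^2 over the class proportions p_k."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5.
print(gini_impurity(['Y', 'Y', 'Y']))       # prints 0.0
print(gini_impurity(['Y', 'N', 'Y', 'N']))  # prints 0.5
```

CART chooses the split that yields the largest weighted decrease in this impurity, analogous to how ID3 maximizes the decrease in entropy.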

Mathematical principles

ID3

ID3 uses entropy E(S) to measure disorder; information gain is the reduction in entropy after a split. The goal is to choose the split that maximizes information gain, i.e., the one that leaves the resulting subsets with the least entropy.

Gain quantifies how much entropy is reduced when splitting on a particular feature.

Example: calculating entropy for each feature and selecting the one with the highest gain (e.g., Outlook).
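The entropy and gain calculation can be sketched in a few lines. The five rows below are a made-up toy sample, not the full weather data set the article alludes to:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, feature_idx):
    """Information gain = entropy of the whole set minus the
    size-weighted entropy of each subset produced by the split."""
    labels = [r[-1] for r in rows]
    base = entropy(labels)
    n = len(rows)
    remainder = 0.0
    for v in set(r[feature_idx] for r in rows):
        subset = [r[-1] for r in rows if r[feature_idx] == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Toy (outlook, play) rows: the outlook feature separates the classes
# perfectly here, so its gain equals the full entropy of the set.
rows = [('sunny', 'N'), ('sunny', 'N'), ('overcast', 'Y'),
        ('rain', 'Y'), ('rain', 'Y')]
print(round(info_gain(rows, 0), 3))  # prints 0.971
```

ID3 evaluates this gain for every candidate feature and splits on the winner.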

C4.5

C4.5 addresses ID3’s sensitivity to many‑value features by using the gain ratio (information gain divided by split information). It also handles continuous attributes and missing values.

For continuous attributes (e.g., humidity), values are sorted, duplicate values removed, and split points evaluated; the split with the highest gain is chosen.
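That threshold search can be sketched as follows. The humidity readings and labels are hypothetical, chosen only to show the mechanics:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values
    as a binary split threshold; keep the one with the highest gain."""
    base = entropy(labels)
    n = len(values)
    best_gain, best_t = -1.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical humidity readings with play/no-play labels.
humidity = [65, 70, 80, 90, 95]
play     = ['Y', 'Y', 'Y', 'N', 'N']
t, g = best_threshold(humidity, play)
print(t)  # prints 85.0
```

Here the split at 85 separates the two classes perfectly, so its gain equals the entropy of the whole set.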

For missing values, a specific formula adjusts the information gain to account for unknown entries.
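The article does not spell that formula out. In Quinlan's C4.5, the gain computed on the rows whose value is known is scaled by the fraction of rows that are known; a minimal sketch of that correction:

```python
def adjusted_gain(gain_on_known, n_known, n_total):
    """C4.5-style missing-value correction: discount the gain computed
    on rows with known values by the fraction of rows that are known."""
    return (n_known / n_total) * gain_on_known

# A gain of 0.971 measured on 12 of 14 rows is discounted accordingly.
print(adjusted_gain(0.971, 12, 14))
```

Intuitively, a feature that is often missing gets less credit, since its split can only be applied to part of the data.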

Comparison of ID3 and C4.5 shows differences in accuracy and execution time.

Python implementation (C4.5)

Below is a concise Python implementation of the C4.5 algorithm.

from math import log

def createDataSet():
    dataSet = [[0,0,0,0,'N'],
               [0,0,0,1,'N'],
               [1,0,0,0,'Y'],
               [2,1,0,0,'Y'],
               [2,2,1,0,'Y'],
               [2,2,1,1,'N'],
               [1,2,1,1,'Y']]
    labels = ['outlook','temperature','humidity','windy']
    return dataSet, labels

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGainRatio = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        splitInfo = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
            splitInfo += -prob * log(prob, 2)
        infoGain = baseEntropy - newEntropy
        if splitInfo == 0:
            continue
        infoGainRatio = infoGain / splitInfo
        if infoGainRatio > bestInfoGainRatio:
            bestInfoGainRatio = infoGainRatio
            bestFeature = i
    return bestFeature

def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis] + featVec[axis+1:]
            retDataSet.append(reducedFeatVec)
    return retDataSet

def majorityCnt(classList):
    # Return the most common class label; used when no features remain.
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    return max(classCount, key=classCount.get)

def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
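Once createTree has returned its nested-dict representation, classifying a new sample is a short recursive walk. A sketch, using a hand-built tree in the same {feature: {value: subtree}} format the code above produces (this particular tree is consistent with the seven-row sample data):

```python
def classify(tree, featLabels, testVec):
    """Walk the nested-dict tree: at each internal node, look up the
    sample's value for that node's feature and descend until a leaf
    (a plain class label) is reached."""
    if not isinstance(tree, dict):
        return tree                      # leaf node: the class label
    feature = next(iter(tree))           # feature tested at this node
    featIndex = featLabels.index(feature)
    subtree = tree[feature][testVec[featIndex]]
    return classify(subtree, featLabels, testVec)

tree = {'outlook': {0: 'N', 1: 'Y', 2: {'windy': {0: 'Y', 1: 'N'}}}}
labels = ['outlook', 'temperature', 'humidity', 'windy']
print(classify(tree, labels, [2, 2, 1, 1]))  # prints N
```

Note that featLabels must be the full, unmodified label list, since createTree deletes entries from the copy it is given.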

After constructing the tree, it can be visualized with a plotting utility.

Conclusion

Decision trees are a widely used machine‑learning technique that map object attributes to class labels, providing interpretable models for classification and prediction; they can be extended to handle multiple outputs by building separate trees for each target.

Tags: algorithm, machine learning, Python, decision tree, ID3, C4.5
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
