Visualizing Random Forest Decision Boundaries on the Wine Dataset with dtreeviz
This tutorial demonstrates how to load the wine dataset, train a Random Forest classifier, evaluate its accuracy and confusion matrix, and visualize decision boundaries and misclassifications using scikit‑learn and the dtreeviz library.
Various classification models, such as logistic regression, K‑nearest neighbors, and decision trees, can predict the class of unknown data based on features. This article uses the wine dataset to demonstrate how to evaluate predictions with accuracy and a confusion matrix, and how to visualize decision boundaries and misclassifications using a Random Forest and the dtreeviz library.
Data
We load the wine dataset with load_wine(). The dataset contains 13 numeric features for 178 samples. For this example we focus on two features, proline and flavanoids; each sample belongs to one of three classes (0, 1, 2).
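Before modeling, it can help to confirm the shapes, the names of the two chosen columns, and the class balance. A quick sketch (column indices 12 and 6 correspond to proline and flavanoids):

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
print(wine.data.shape)                                # (178, 13): 178 samples, 13 features
print(wine.feature_names[12], wine.feature_names[6])  # proline flavanoids
print(np.bincount(wine.target))                       # samples per class: [59 71 48]
```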
Model
We train a Random Forest classifier.
<code>import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import dtreeviz
from dtreeviz import decision_boundaries
wine = load_wine()
X = wine.data[:, [12, 6]] # proline, flavanoids
y = wine.target
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=20, n_jobs=-1)
rf.fit(X, y)
</code>Evaluation Results
Accuracy
Using accuracy_score on the training data, we obtain an accuracy of about 0.90.
<code>y_pred = rf.predict(X)
accuracy_score(y, y_pred)
</code>Confusion Matrix
The confusion matrix visualizes correct and incorrect predictions.
<code>import seaborn as sn
sn.heatmap(confusion_matrix(y, y_pred), annot=True)
</code>Original vs Predicted Data
Scatter plots of the two features colored by true class and by predicted class.
<code>fig, axes = plt.subplots(1, 2, figsize=(8, 3.8), dpi=300)
features = ['proline','flavanoids']
df1 = pd.DataFrame(X, columns=features)
df1['target'] = wine.target
df1['prediction'] = rf.predict(X)
sn.scatterplot(x='proline', y='flavanoids', hue='target', data=df1, ax=axes[0])
sn.scatterplot(x='proline', y='flavanoids', hue='prediction', data=df1, ax=axes[1])
</code>Decision Boundaries
Using dtreeviz.decision_boundaries we plot the classification regions and highlight misclassified points.
<code>fig, axes = plt.subplots(1, 2, figsize=(8, 3.8), dpi=300)
decision_boundaries(rf, X, y, ax=axes[0], feature_names=['proline', 'flavanoids'])
decision_boundaries(rf, X, y, ax=axes[1],
                    show=['instances', 'boundaries', 'misclassified'],
                    feature_names=['proline', 'flavanoids'])
plt.show()
</code>One‑Dimensional Boundary
We can also visualize the boundary using a single feature (proline).
<code>x = df1[['proline']].values
y = df1['target'].astype('int').values
rf = RandomForestClassifier(n_estimators=10, min_samples_leaf=10, n_jobs=-1)
rf.fit(x, y)
decision_boundaries(rf, x, y,
feature_names=['proline'],
target_name='wine_type',
colors={'scatter_marker_alpha': .2},
figsize=(5,1.5))
</code>This demonstrates how to plot classification results, decision boundaries, and misclassifications for a Random Forest model.
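Note that the accuracy reported above is measured on the same data the model was fit on, so it tends to be optimistic. As a sketch of a more honest evaluation (the 70/30 split ratio and random_state are illustrative choices, not from the original), the model can be scored on a held-out set:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

wine = load_wine()
X = wine.data[:, [12, 6]]  # proline, flavanoids

# Stratified split keeps the class proportions similar in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, wine.target, test_size=0.3, random_state=0, stratify=wine.target)

rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=20,
                            n_jobs=-1, random_state=0)
rf.fit(X_tr, y_tr)
print(accuracy_score(y_te, rf.predict(X_te)))  # held-out accuracy
```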
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".