Visualizing Random Forest Decision Boundaries on the Wine Dataset with dtreeviz
This tutorial demonstrates how to load the wine dataset, train a Random Forest classifier, evaluate its accuracy and confusion matrix, and visualize decision boundaries and misclassifications using scikit‑learn and the dtreeviz library.
Various classification models, such as logistic regression, K‑nearest neighbors, and decision trees, can predict the class of unknown data based on features. This article uses the wine dataset to demonstrate how to evaluate predictions with accuracy and a confusion matrix, and how to visualize decision boundaries and misclassifications using a Random Forest and the dtreeviz library.
Data
We load the wine dataset with load_wine(). The dataset contains 13 numeric features for 178 samples. For this example we focus on two features, proline and flavanoids; each sample belongs to one of three classes (0, 1, 2).
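Before modeling, it can help to confirm the shapes, the names of the two chosen columns, and the class balance. A quick sketch (column indices 12 and 6 correspond to proline and flavanoids):

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
print(wine.data.shape)                                # (178, 13): 178 samples, 13 features
print(wine.feature_names[12], wine.feature_names[6])  # proline flavanoids
print(np.bincount(wine.target))                       # samples per class: [59 71 48]
```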
Model
We train a Random Forest classifier.
<code>import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import dtreeviz
from dtreeviz import decision_boundaries
wine = load_wine()
X = wine.data[:, [12, 6]] # proline, flavanoids
y = wine.target
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=20, n_jobs=-1)
rf.fit(X, y)
</code>Evaluation Results
Accuracy
Using accuracy_score on the training data, we obtain an accuracy of about 0.90.
<code>y_pred = rf.predict(X)
accuracy_score(y, y_pred)
</code>Confusion Matrix
The confusion matrix visualizes correct and incorrect predictions.
<code>import seaborn as sn
sn.heatmap(confusion_matrix(y, y_pred), annot=True)
</code>Original vs Predicted Data
Scatter plots of the two features colored by true class and by predicted class.
<code>fig, axes = plt.subplots(1, 2, figsize=(8, 3.8), dpi=300)
features = ['proline','flavanoids']
df1 = pd.DataFrame(X, columns=features)
df1['target'] = wine.target
df1['prediction'] = rf.predict(X)
sn.scatterplot(x='proline', y='flavanoids', hue='target', data=df1, ax=axes[0])
sn.scatterplot(x='proline', y='flavanoids', hue='prediction', data=df1, ax=axes[1])
</code>Decision Boundaries
Using dtreeviz.decision_boundaries we plot the classification regions and highlight misclassified points.
<code>fig, axes = plt.subplots(1, 2, figsize=(8, 3.8), dpi=300)
decision_boundaries(rf, X, y, ax=axes[0], feature_names=['proline', 'flavanoids'])
decision_boundaries(rf, X, y, ax=axes[1],
                    show=['instances', 'boundaries', 'misclassified'],
                    feature_names=['proline', 'flavanoids'])
plt.show()
</code>One‑Dimensional Boundary
We can also visualize the boundary using a single feature (proline).
<code>x = df1[['proline']].values
y = df1['target'].astype('int').values
rf = RandomForestClassifier(n_estimators=10, min_samples_leaf=10, n_jobs=-1)
rf.fit(x, y)
decision_boundaries(rf, x, y,
feature_names=['proline'],
target_name='wine_type',
colors={'scatter_marker_alpha': .2},
figsize=(5,1.5))
</code>This demonstrates how to plot classification results, decision boundaries, and misclassifications for a Random Forest model.
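Note that the accuracy reported above is measured on the same data the model was fit on, so it tends to be optimistic. As a sketch of a more honest evaluation (the 70/30 split ratio and random_state are illustrative choices, not from the original), the model can be scored on a held-out set:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

wine = load_wine()
X = wine.data[:, [12, 6]]  # proline, flavanoids

# Stratified split keeps the class proportions similar in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, wine.target, test_size=0.3, random_state=0, stratify=wine.target)

rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=20,
                            n_jobs=-1, random_state=0)
rf.fit(X_tr, y_tr)
print(accuracy_score(y_te, rf.predict(X_te)))  # held-out accuracy
```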
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".