Iris Classification with Machine Learning: Data Exploration and Classic Algorithms
This beginner-friendly guide walks through loading the classic Iris dataset, performing exploratory data analysis, and training four fundamental classifiers (Decision Tree, Logistic Regression, Support Vector Machine, and K‑Nearest Neighbors), covering training, visualization, and accuracy evaluation as a complete machine-learning workflow.
In recent years, artificial intelligence (AI) has surged, with OpenAI releasing products such as ChatGPT and Sora. This article introduces beginners to machine learning through the classic Iris classification problem.
Dataset Introduction
The Iris dataset contains 150 samples of three species—Setosa, Versicolor, and Virginica—each described by four features: sepal length, sepal width, petal length, and petal width.
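These properties can be verified with scikit-learn's bundled copy of the dataset, which avoids depending on a local CSV file (a convenience sketch; the article itself reads `./iris.csv` below):

```python
from sklearn.datasets import load_iris

# scikit-learn ships the Iris dataset, so no local file is required
data = load_iris()
X, y = data.data, data.target

print(X.shape)                  # (150, 4): 150 samples, 4 features
print(data.feature_names)       # the four measured features
print(list(data.target_names))  # ['setosa', 'versicolor', 'virginica']
```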
Data can be loaded with pandas:
```python
import pandas as pd

# The CSV has no header row, so column names are supplied explicitly
iris = pd.read_csv('./iris.csv',
                   names=['sepal_length', 'sepal_width',
                          'petal_length', 'petal_width', 'class'])
print(iris.head(10))
```

Exploratory Data Analysis
Descriptive statistics, histograms, KDE plots, and correlation heatmaps are used to understand feature distributions and relationships.
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Summary statistics for each feature
print(iris.describe())

# Per-feature histograms and kernel density estimates
iris.plot(kind='hist', subplots=True, layout=(2, 2), figsize=(10, 10))
iris.plot(kind='kde')

# Pairwise correlations between the four numeric features
sns.heatmap(iris.iloc[:, :4].corr(), annot=True, cmap='YlGnBu')
plt.show()
```

Classification Algorithms
Four classic classifiers are demonstrated: Decision Tree (CART), Logistic Regression, Support Vector Machine, and K‑Nearest Neighbors.
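Before walking through each model, a quick cross-validated comparison gives a baseline for all four. This is a sketch, assuming scikit-learn's bundled dataset and default-ish hyperparameters rather than the exact settings used later in the article:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    'Decision Tree': DecisionTreeClassifier(max_depth=4, random_state=0),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'SVM (RBF)': SVC(kernel='rbf'),
    'KNN': KNeighborsClassifier(n_neighbors=5),
}

# 5-fold stratified cross-validation accuracy for each classifier
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f'{name}: {results[name]:.3f}')
```

All four typically score well above 90 % on this dataset, which is why the per-model sections below focus on mechanics rather than model selection.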
Decision Tree
Model training:
```python
from sklearn import preprocessing, model_selection, tree

# Encode the string class labels as integers 0..2
label_encoder = preprocessing.LabelEncoder()
target = label_encoder.fit_transform(iris['class'])

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.iloc[:, :4].values, target, test_size=0.2, random_state=42)

clf = tree.DecisionTreeClassifier(max_depth=4)
clf.fit(X_train, y_train)
print(clf.feature_importances_)
```

Visualization with Graphviz:
```python
import pydotplus

# label_encoder.classes_ is sorted to match the encoded labels,
# unlike iris['class'].unique(), whose order depends on row order
dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    class_names=label_encoder.classes_,
    filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('decision_tree.png')
```

Logistic Regression
Training and evaluation:
```python
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(accuracy)
```

Support Vector Machine
Using an RBF kernel and visualizing decision regions:
```python
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Standardize features first; the RBF kernel is sensitive to feature scale
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

model = svm.SVC(kernel='rbf', gamma=10, C=10.0, random_state=0)
model.fit(X_train_std, y_train)
# plot_decision_regions function omitted for brevity
```

K‑Nearest Neighbors
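Since the article omits its `plot_decision_regions` helper, here is one minimal way to draw decision regions with a prediction grid. This sketch assumes only two features (petal length and width) so the regions fit in a 2-D plot, and it is not the article's original helper:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend, suitable for scripts
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Use only petal length and petal width so regions can be drawn in 2-D
X, y = load_iris(return_X_y=True)
X2 = StandardScaler().fit_transform(X[:, 2:4])

model = SVC(kernel='rbf', gamma=10, C=10.0, random_state=0).fit(X2, y)

# Classify every point on a dense grid covering the feature range
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 300),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 300))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Shade the predicted regions, then overlay the training points
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X2[:, 0], X2[:, 1], c=y, edgecolor='k')
plt.xlabel('petal length (standardized)')
plt.ylabel('petal width (standardized)')
plt.savefig('svm_decision_regions.png')
```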
Training and plotting decision boundaries:
```python
from sklearn.neighbors import KNeighborsClassifier

# p=2 with the Minkowski metric is ordinary Euclidean distance
knn = KNeighborsClassifier(n_neighbors=2, p=2, metric='minkowski')
knn.fit(X_train_std, y_train)
```

Evaluation metrics such as accuracy, precision, recall, and F1 score are reported; because the dataset is small and well separated, they often reach 100 % on the test split.
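Those metrics can all be produced at once with `classification_report`. A self-contained sketch, assuming scikit-learn's bundled dataset, a stratified 80/20 split, and `n_neighbors=5` rather than the article's exact setup:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on training data only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
y_pred = knn.predict(scaler.transform(X_test))

acc = accuracy_score(y_test, y_pred)
print('accuracy:', acc)
# Per-class precision, recall, and F1 in one table
print(classification_report(y_test, y_pred))
```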
Conclusion
The article demonstrates a complete workflow—from data loading and exploratory analysis to model training and visualization—for the Iris classification problem, providing a practical entry point for beginners in AI and machine learning.
DaTaobao Tech
Official account of DaTaobao Technology