
Master XGBoost: Boosting Trees Explained with Python Code

This article explains the core concepts of XGBoost as a boosting tree algorithm, describes how it builds ensembles of decision trees to predict outcomes, and provides complete Python implementations for classification and regression using the Scikit-learn interface, along with visualizations of trees and feature importance.

Model Perspective

XGBoost is a boosting algorithm. Boosting combines many weak learners into a single strong learner; as a gradient boosting tree model, XGBoost aggregates many decision trees into a powerful classifier.
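The residual-fitting idea behind gradient boosting can be illustrated with a minimal sketch using scikit-learn only (this is an assumed toy setup, not XGBoost itself): each new shallow tree is fit to what the current ensemble still gets wrong.

```python
# Minimal gradient-boosting sketch: fit each new tree to the residuals.
# Toy data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)   # start from a constant (zero) prediction
trees = []

for _ in range(50):
    residual = y - prediction            # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                # each new tree fits the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("train MSE:", np.mean((y - prediction) ** 2))
```

Each round shrinks the residuals a little; the `learning_rate` damps each tree's contribution, which is the same shrinkage idea XGBoost uses.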

XGBoost Algorithm Idea

The algorithm grows trees one at a time, splitting on features to build each tree; every new tree learns a function that fits the residuals of the current ensemble's predictions. After training k trees, a sample's score is predicted by dropping the sample down each tree to a leaf, reading off that leaf's score, and summing the k leaf scores.

In the following example, two decision trees are trained; the predicted score for a sample (a child, say, or a grandparent) is the sum of the leaf scores of the leaves that sample falls into in each tree.
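The sum-of-leaf-scores prediction can be sketched with two hypothetical hand-coded trees (the splits and scores below are made up for illustration):

```python
# Hypothetical two-tree ensemble: each tree maps a sample to a leaf score,
# and the final prediction is the sum of the leaf scores.
def tree1(sample):
    # first tree splits on age
    return 2.0 if sample["age"] < 15 else -1.0

def tree2(sample):
    # second tree splits on daily computer use
    return 0.9 if sample["uses_computer_daily"] else -0.9

def predict(sample):
    # traverse every tree and sum the leaf scores
    return tree1(sample) + tree2(sample)

child = {"age": 8, "uses_computer_daily": True}
grandpa = {"age": 70, "uses_computer_daily": False}
print(predict(child))    # 2.0 + 0.9
print(predict(grandpa))  # -1.0 + (-0.9)
```

A real XGBoost model works the same way, only with many learned trees instead of two hand-written ones.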

Python Implementation

Classification program based on Scikit-learn interface

<code>from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# read in the iris data
iris = load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# train the model (verbosity=0 replaces the removed `silent` parameter)
model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1, n_estimators=160, verbosity=0, objective='multi:softmax')
model.fit(X_train, y_train)

# predict on the test set
ans = model.predict(X_test)

# compute the accuracy
cnt1 = 0
cnt2 = 0
for i in range(len(y_test)):
    if ans[i] == y_test[i]:
        cnt1 += 1
    else:
        cnt2 += 1

print("Accuracy: %.2f%%" % (100 * cnt1 / (cnt1 + cnt2)))

# plot one of the trained trees
xgb.plot_tree(model)

# show feature importance
plot_importance(model)
plt.show()
</code>
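The manual counting loop above can be replaced with scikit-learn's `accuracy_score`. A self-contained sketch (the arrays below are stand-ins, not the iris results from the script above):

```python
# Equivalent accuracy computation with sklearn.metrics.accuracy_score.
# y_test and ans are illustrative stand-in arrays.
import numpy as np
from sklearn.metrics import accuracy_score

y_test = np.array([0, 1, 2, 2, 1, 0])
ans    = np.array([0, 1, 2, 1, 1, 0])  # one mistake out of six

acc = accuracy_score(y_test, ans)
print("Accuracy: %.2f%%" % (100 * acc))
```

This avoids the explicit counter variables and makes the intent obvious at a glance.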

Regression program based on Scikit-learn interface

<code>from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# read in the iris data
iris = load_iris()

X = iris.data[:, :3]  # first three features as inputs
y = iris.data[:, 3]   # predict the fourth feature (petal width)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# train the model (verbosity=0 replaces the removed `silent` parameter)
model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160, verbosity=0)
model.fit(X_train, y_train)

# predict on the test set
ans = model.predict(X_test)

# plot one of the trained trees
xgb.plot_tree(model)

# show feature importance
plot_importance(model)
plt.show()
</code>
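The regression script prints no quality metric; a common check is the mean squared error. A self-contained sketch (the arrays below are stand-ins for `y_test` and `ans` from the script above):

```python
# Evaluating regression predictions with sklearn.metrics.mean_squared_error.
# y_test and ans are illustrative stand-in arrays.
import numpy as np
from sklearn.metrics import mean_squared_error

y_test = np.array([0.2, 1.3, 2.1, 1.8])
ans    = np.array([0.25, 1.20, 2.00, 1.90])

mse = mean_squared_error(y_test, ans)
rmse = np.sqrt(mse)
print("MSE: %.4f, RMSE: %.4f" % (mse, rmse))
```

RMSE is often preferred for reporting because it is in the same units as the target variable.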

Reference

https://zhuanlan.zhihu.com/p/31182879

Tags: Machine Learning, Python, regression, classification, XGBoost, Boosting
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
