Master XGBoost: Boosting Trees Explained with Python Code
This article explains the core concepts of XGBoost as a boosting tree algorithm, describes how it builds ensembles of decision trees to predict outcomes, and provides complete Python implementations for classification and regression using the Scikit-learn interface, along with visualizations of trees and feature importance.
XGBoost is a gradient boosting algorithm. Boosting combines many weak learners into a single strong learner; XGBoost does this by building an ensemble of decision trees that together form a powerful predictor.
XGBoost Algorithm Idea
The algorithm grows trees one at a time, splitting on features to build each tree; every new tree learns a function that fits the residuals of the ensemble's current predictions. After training k trees, predicting a sample means routing it through each tree to a leaf node, reading that leaf's score, and summing the scores across all trees.
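This residual-fitting loop can be sketched with plain scikit-learn regression trees. This is a simplified illustration of the idea, not XGBoost's actual regularized objective; the learning rate, tree depth, and toy data are arbitrary choices for demonstration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data
rng = np.random.RandomState(0)
X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Boosting: each new tree fits the residuals of the current ensemble
learning_rate = 0.1
pred = np.zeros_like(y)
trees = []
for _ in range(50):
    residuals = y - pred               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)             # the new tree learns the residuals
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

# The final prediction is the sum of all trees' (scaled) leaf scores
final = sum(learning_rate * t.predict(X) for t in trees)
print(np.allclose(final, pred))
```

Each iteration shrinks the remaining error, which is exactly the "each added tree learns a new function to fit the residuals" step described above.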
In the following example, two decision trees are trained; the predicted score for a child is the sum of the leaf scores of the leaves the child falls into in each tree, and likewise for a grandparent.
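The two-tree prediction can be mimicked with a minimal sketch. The split rules and leaf scores below are made up for illustration (they are not taken from a trained model); the point is only that the final score is the sum of the leaf scores from each tree:

```python
# Each "tree" routes a sample to a leaf and returns that leaf's score.
# The splits and scores are hypothetical, chosen for illustration.
def tree1(sample):
    # Splits on age
    return 2.0 if sample["age"] < 15 else -1.0

def tree2(sample):
    # Splits on daily computer use
    return 0.5 if sample["uses_computer_daily"] else -0.5

def predict(sample):
    # XGBoost-style prediction: sum the leaf scores across all trees
    return tree1(sample) + tree2(sample)

child = {"age": 10, "uses_computer_daily": True}
grandparent = {"age": 70, "uses_computer_daily": False}
print(predict(child))        # 2.5
print(predict(grandparent))  # -1.5
```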
Python Implementation
Classification program based on Scikit-learn interface
<code>from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
# read in the iris data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Train the model (the deprecated silent=True flag is replaced by verbosity=0)
model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1, n_estimators=160, verbosity=0, objective='multi:softmax')
model.fit(X_train, y_train)
# Predict on the test set
ans = model.predict(X_test)
# Compute accuracy
cnt1 = 0
cnt2 = 0
for i in range(len(y_test)):
    if ans[i] == y_test[i]:
        cnt1 += 1
    else:
        cnt2 += 1
print("Accuracy: %.2f %%" % (100 * cnt1 / (cnt1 + cnt2)))
xgb.plot_tree(model)
# Plot feature importance
plot_importance(model)
plt.show()
</code>Regression program based on Scikit-learn interface
<code>from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
# read in the iris data
iris = load_iris()
X = iris.data[:, :3]
y = iris.data[:, 3]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Train the model (the deprecated silent=True flag is replaced by verbosity=0)
model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160, verbosity=0)
model.fit(X_train, y_train)
# Predict on the test set
ans = model.predict(X_test)
xgb.plot_tree(model)
# Plot feature importance
plot_importance(model)
plt.show()
</code>Reference
https://zhuanlan.zhihu.com/p/31182879
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".