Boost Your Models with LightGBM: Fast, Accurate Gradient Boosting in Python
This article introduces LightGBM, a high‑performance gradient boosting framework, explains its advantages over XGBoost, and provides step‑by‑step Python code for building classification and regression models on the Iris dataset, including model training, evaluation, and visualizing feature importance and tree structures.
Introduction
At the end of 2016, Microsoft's DMTK team open‑sourced LightGBM on GitHub, where it quickly gained over 1,000 stars and 200 forks, a clear sign of its popularity.
Gradient Boosting Decision Tree (GBDT) is a long‑standing workhorse of machine learning: it iteratively combines weak learners (decision trees) into a strong ensemble, training efficiently while resisting over‑fitting. GBDT is widely used in industry for click‑through‑rate prediction and search ranking, and it dominates many Kaggle competition solutions.
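The boosting loop behind GBDT can be sketched in a few lines with scikit‑learn trees. This is a toy illustration of the idea only, not LightGBM's actual implementation; the synthetic data and hyper‑parameters here are made up for the sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy gradient boosting for squared error: each new shallow tree fits the
# residuals (the negative gradient of the L2 loss) of the ensemble so far.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, trees = 0.1, []
pred = np.zeros_like(y)
for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)  # fit residuals
    pred += learning_rate * tree.predict(X)                     # shrink and add
    trees.append(tree)

print("training MSE:", np.mean((y - pred) ** 2))
```

The learning rate shrinks each tree's contribution, which is what gives boosting its resistance to over‑fitting; LightGBM applies the same principle with far more efficient tree construction.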
LightGBM (Light Gradient Boosting Machine) is a lightweight framework implementing the GBDT algorithm, offering high‑efficiency parallel training and several advantages:
Faster training speed
Lower memory consumption
Higher accuracy
Support for parallel learning
Ability to handle large‑scale data
Python Implementation
Classification Model
Training the model
<code>import lightgbm as gbm
from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
# read in the iris data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# train the model
# LightGBM's multiclass objective is 'multiclass' ('multi:softmax' is XGBoost syntax);
# the deprecated silent flag is omitted
model = gbm.LGBMClassifier(max_depth=5, learning_rate=0.1, n_estimators=160, objective='multiclass')
model.fit(X_train, y_train)
# predict on the test set
ans = model.predict(X_test)
# score returns mean accuracy on the held-out set
model.score(X_test, y_test)</code>
Plotting feature importance
<code>plt.figure()
gbm.plot_importance(model)
plt.show()</code>
Plotting the tree
Note that plot_tree depends on the graphviz package; install it first if tree plots fail to render.
<code>plt.figure()
gbm.plot_tree(model)
plt.show()</code>
Regression Model
Training the model
<code>import lightgbm as gbm
from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
# read in the iris data
iris = load_iris()
X = iris.data[:, :3]  # first three features as inputs
y = iris.data[:, 3]   # petal width as the regression target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# train the model
model = gbm.LGBMRegressor(max_depth=5, learning_rate=0.1, n_estimators=160)  # deprecated silent flag omitted
model.fit(X_train, y_train)
# predict on the test set
ans = model.predict(X_test)
# score returns R^2 for regressors
model.score(X_test, y_test)</code>
Plotting feature importance
<code>plt.figure()
gbm.plot_importance(model)
plt.show()</code>
Plotting the tree
<code>plt.figure()
gbm.plot_tree(model)
plt.show()</code>
Reference
https://blog.csdn.net/anshuai_aw1/article/details/83659932
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".