Master Ensemble Learning: Boosting, Bagging, and Real-World Examples
This article introduces ensemble learning as a meta-algorithm that combines multiple base classifiers. It explains the two main strategies, Boosting and Bagging, examines their bias-variance trade-offs, outlines the essential steps of building an ensemble, and walks through concrete examples such as AdaBoost, Random Forest, and GBDT applied to user age prediction.
Ensemble Learning Overview
Ensemble learning (also called meta‑algorithm) combines multiple machine learning models to form a unified decision, aiming to improve overall performance compared to any single base classifier.
Scenario Description
When tackling a machine-learning problem, one approach is to try many models and fine-tune the best one (like selecting the strongest athlete). Another, often more powerful approach is to aggregate the strengths of several models, much as a ruler consults many advisors before making a decision. This aggregation strategy is called ensemble learning, and each individual model is called a base classifier.
Two Main Types of Ensemble Learning
Boosting : Base classifiers are trained sequentially. Each subsequent classifier focuses more on the samples mis‑classified by the previous ones, assigning them higher weights. The final prediction is a weighted combination of all classifiers.
Bagging : Base classifiers are trained independently (can be parallel). A classic example is Random Forest, which builds many decision‑tree classifiers on different subsets of the training data.
Bias‑Variance Perspective
Base classifiers are often called weak classifiers because, individually, they perform only slightly better than random guessing; their error rate is higher than that of the ensemble. Their error can be decomposed into bias (systematic error due to limited model capacity) and variance (sensitivity to fluctuations in the training data). Boosting reduces bias by concentrating each new classifier on previously misclassified samples, while Bagging reduces variance by averaging over many independently trained models.
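The variance-reduction effect of averaging can be seen in a toy numerical sketch (pure NumPy; each "model" is simulated as an unbiased estimate plus independent noise, which is an idealization of the approximately-independent-errors assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 10,000 trials. Each base model's output is the true value (0.0)
# plus independent unit-variance noise.
single = rng.normal(0.0, 1.0, size=10_000)

# An "ensemble" averages 25 such independent models per trial.
ensemble = rng.normal(0.0, 1.0, size=(10_000, 25)).mean(axis=1)

print(round(single.var(), 3))    # close to 1.0
print(round(ensemble.var(), 3))  # close to 1/25 = 0.04
```

Averaging 25 independent, unbiased models cuts the variance roughly 25-fold; in practice the reduction is smaller because real base classifiers' errors are correlated.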
Bagging Example
A simple bagging setup trains each base classifier on a bootstrap sample of the training data (drawn with replacement) and merges their predictions by majority vote (for classification) or averaging (for regression).
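As a runnable sketch (using scikit-learn on synthetic data, not part of the original article), bagging many deep decision trees can be compared against a single tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single unpruned tree: low bias but high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Bagging: 50 trees, each fit on a bootstrap sample of the training set;
# their predictions are merged by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), bag.score(X_te, y_te))
```

On most random seeds the bagged ensemble matches or beats the single tree on held-out data, reflecting the variance reduction described above.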
Key Steps of Ensemble Learning
1. Find base classifiers whose errors are (approximately) independent.
2. Train each base classifier.
3. Merge the results of the base classifiers.
The merging can be done via voting (majority vote) or stacking (feeding the base classifiers' outputs as features into a higher-level meta-classifier).
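The voting step can be sketched in a few lines of NumPy; the three base classifiers and their predictions here are hypothetical:

```python
import numpy as np

# Predictions from three hypothetical base classifiers on five samples
# (one row per classifier, one column per sample).
preds = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
])

# Majority vote: each sample takes the label predicted by at least
# two of the three classifiers.
majority = (preds.sum(axis=0) >= 2).astype(int)
print(majority)  # [1 0 1 1 0]
```

Stacking replaces this fixed vote with a learned one: the columns of `preds` become the input features of a meta-classifier trained on held-out data.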
AdaBoost Example
AdaBoost trains its base classifiers sequentially. Each round fits a classifier on the weighted training set, then increases the weights of the samples that classifier got wrong, so the next round focuses on the hard cases.
The base classifiers are then merged by weighted voting: classifiers with lower weighted training error receive larger voting weights in the final prediction.
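A minimal runnable sketch using scikit-learn's implementation (synthetic data; depth-1 trees, or "stumps", are the classic weak learner for AdaBoost):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=400, random_state=1)

# 100 sequentially trained stumps; each round reweights the samples the
# previous stumps misclassified, and the final prediction is a weighted vote.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=1)
ada.fit(X, y)

print(ada.score(X, y))
```

Each individual stump is barely better than chance on this data, yet the weighted combination achieves a much lower error, which is the bias reduction described earlier.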
GBDT Example
Gradient Boosted Decision Tree (GBDT) builds each new tree to predict the residuals of the combined previous trees, gradually correcting errors.
For instance, in a video‑streaming platform, we may predict a user's age based on watch time, time of day, and video genre. If the first tree predicts age 22 for a user whose true age is 25 (residual = 3), the next tree learns to predict this residual, and the sum of predictions converges toward the true age.
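The residual-fitting loop can be sketched directly; the feature values below (standing in for watch time and hour of day) and ages are made-up toy numbers, and real GBDT implementations add a learning rate and gradients of an arbitrary loss:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: predict age from two hypothetical features
# (watch time in hours, hour of day).
X = np.array([[1.0, 20], [3.5, 22], [0.5, 9], [4.0, 23], [2.0, 12]])
y = np.array([25.0, 31.0, 14.0, 35.0, 22.0])

pred = np.zeros_like(y)
for _ in range(3):                   # three boosting rounds
    residual = y - pred              # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += tree.predict(X)          # each new tree corrects the residual

print(np.abs(y - pred).max())        # residuals shrink toward zero
```

After each round the new tree is trained on the remaining residuals, so the summed prediction converges toward the true ages, exactly as in the age-25 example above.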
Common Base Classifiers
The most frequently used base classifier is the decision tree because it can easily incorporate sample weights during training and its capacity can be controlled by limiting tree depth.
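scikit-learn's decision trees expose this directly through the `sample_weight` argument of `fit`, which is exactly the hook a boosting round needs; a minimal sketch with contrived data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Two conflicting samples share the same feature value; a third is separate.
X = np.array([[0.0], [0.0], [1.0]])
y = np.array([0, 1, 1])

# Up-weight the label-1 copy at x=0: the tree predicts class 1 there ...
hi = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=[1.0, 3.0, 1.0])
# ... up-weight the label-0 copy instead, and the prediction flips to 0.
lo = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=[3.0, 1.0, 1.0])

print(hi.predict([[0.0]])[0], lo.predict([[0.0]])[0])
```

Changing the sample weights changes what the tree learns, which is how boosting steers each new tree toward the previously misclassified samples.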
Bagging methods such as Random Forest work best with "unstable" base classifiers that are sensitive to variations in the training data. Linear classifiers and k-Nearest Neighbors are relatively stable (low variance); using them as base classifiers in Bagging would not significantly reduce variance and may even increase bias due to the bootstrap sampling, making them unsuitable for this purpose.
Hulu Beijing