Master Ensemble Learning: Boosting, Bagging, and Real-World Examples
This article introduces ensemble learning as a meta-algorithm that combines multiple base classifiers. It explains the two main strategies, Boosting and Bagging, examines their bias-variance trade-offs, outlines the essential steps of building an ensemble, and walks through concrete examples such as AdaBoost, Random Forest, and GBDT applied to user age prediction.
Ensemble Learning Overview
Ensemble learning (also called meta‑algorithm) combines multiple machine learning models to form a unified decision, aiming to improve overall performance compared to any single base classifier.
Scenario Description
When tackling a machine-learning problem, one approach is to try many models and fine-tune the best one (like selecting the strongest athlete). Another, often more powerful approach is to aggregate the strengths of several models, much as a ruler consults many advisors before making a decision. This aggregation strategy is called ensemble learning, and each individual model is called a base classifier.
Two Main Types of Ensemble Learning
Boosting : Base classifiers are trained sequentially. Each subsequent classifier focuses more on the samples mis‑classified by the previous ones, assigning them higher weights. The final prediction is a weighted combination of all classifiers.
Bagging : Base classifiers are trained independently (can be parallel). A classic example is Random Forest, which builds many decision‑tree classifiers on different subsets of the training data.
Bias‑Variance Perspective
Base classifiers are often called weak classifiers because, individually, they perform only slightly better than random guessing; their error rate is higher than that of the ensemble. Their error can be decomposed into bias (systematic error due to limited model capacity) and variance (sensitivity to fluctuations in the training data). Boosting reduces bias by concentrating each new classifier on previously misclassified samples, while Bagging reduces variance by averaging over many independently trained models.
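The variance-reduction effect of averaging can be seen in a toy numerical sketch (pure NumPy; each "model" is simulated as an unbiased estimate plus independent noise, which is an idealization of the approximately-independent-errors assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 10,000 trials. Each base model's output is the true value (0.0)
# plus independent unit-variance noise.
single = rng.normal(0.0, 1.0, size=10_000)

# An "ensemble" averages 25 such independent models per trial.
ensemble = rng.normal(0.0, 1.0, size=(10_000, 25)).mean(axis=1)

print(round(single.var(), 3))    # close to 1.0
print(round(ensemble.var(), 3))  # close to 1/25 = 0.04
```

Averaging 25 independent, unbiased models cuts the variance roughly 25-fold; in practice the reduction is smaller because real base classifiers' errors are correlated.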
Bagging Example
A simple bagging setup trains each base classifier on a bootstrap sample of the training data (drawn with replacement) and merges their predictions by majority vote (for classification) or averaging (for regression).
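As a runnable sketch (using scikit-learn on synthetic data, not part of the original article), bagging many deep decision trees can be compared against a single tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single unpruned tree: low bias but high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Bagging: 50 trees, each fit on a bootstrap sample of the training set;
# their predictions are merged by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), bag.score(X_te, y_te))
```

On most random seeds the bagged ensemble matches or beats the single tree on held-out data, reflecting the variance reduction described above.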
Key Steps of Ensemble Learning
1. Find base classifiers whose errors are (approximately) independent.
2. Train each base classifier.
3. Merge the results of the base classifiers.
The merging can be done via voting (majority vote) or stacking (feeding the base classifiers' outputs as features into a higher-level meta-classifier).
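The voting step can be sketched in a few lines of NumPy; the three base classifiers and their predictions here are hypothetical:

```python
import numpy as np

# Predictions from three hypothetical base classifiers on five samples
# (one row per classifier, one column per sample).
preds = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
])

# Majority vote: each sample takes the label predicted by at least
# two of the three classifiers.
majority = (preds.sum(axis=0) >= 2).astype(int)
print(majority)  # [1 0 1 1 0]
```

Stacking replaces this fixed vote with a learned one: the columns of `preds` become the input features of a meta-classifier trained on held-out data.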
AdaBoost Example
AdaBoost trains its base classifiers sequentially. Each round fits a classifier on the weighted training set, then increases the weights of the samples that classifier got wrong, so the next round focuses on the hard cases.
The base classifiers are then merged by weighted voting: classifiers with lower weighted training error receive larger voting weights in the final prediction.
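A minimal runnable sketch using scikit-learn's implementation (synthetic data; depth-1 trees, or "stumps", are the classic weak learner for AdaBoost):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=400, random_state=1)

# 100 sequentially trained stumps; each round reweights the samples the
# previous stumps misclassified, and the final prediction is a weighted vote.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=1)
ada.fit(X, y)

print(ada.score(X, y))
```

Each individual stump is barely better than chance on this data, yet the weighted combination achieves a much lower error, which is the bias reduction described earlier.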
GBDT Example
Gradient Boosted Decision Tree (GBDT) builds each new tree to predict the residuals of the combined previous trees, gradually correcting errors.
For instance, in a video‑streaming platform, we may predict a user's age based on watch time, time of day, and video genre. If the first tree predicts age 22 for a user whose true age is 25 (residual = 3), the next tree learns to predict this residual, and the sum of predictions converges toward the true age.
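The residual-fitting loop can be sketched directly; the feature values below (standing in for watch time and hour of day) and ages are made-up toy numbers, and real GBDT implementations add a learning rate and gradients of an arbitrary loss:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: predict age from two hypothetical features
# (watch time in hours, hour of day).
X = np.array([[1.0, 20], [3.5, 22], [0.5, 9], [4.0, 23], [2.0, 12]])
y = np.array([25.0, 31.0, 14.0, 35.0, 22.0])

pred = np.zeros_like(y)
for _ in range(3):                   # three boosting rounds
    residual = y - pred              # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += tree.predict(X)          # each new tree corrects the residual

print(np.abs(y - pred).max())        # residuals shrink toward zero
```

After each round the new tree is trained on the remaining residuals, so the summed prediction converges toward the true ages, exactly as in the age-25 example above.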
Common Base Classifiers
The most frequently used base classifier is the decision tree because it can easily incorporate sample weights during training and its capacity can be controlled by limiting tree depth.
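scikit-learn's decision trees expose this directly through the `sample_weight` argument of `fit`, which is exactly the hook a boosting round needs; a minimal sketch with contrived data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Two conflicting samples share the same feature value; a third is separate.
X = np.array([[0.0], [0.0], [1.0]])
y = np.array([0, 1, 1])

# Up-weight the label-1 copy at x=0: the tree predicts class 1 there ...
hi = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=[1.0, 3.0, 1.0])
# ... up-weight the label-0 copy instead, and the prediction flips to 0.
lo = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=[3.0, 1.0, 1.0])

print(hi.predict([[0.0]])[0], lo.predict([[0.0]])[0])
```

Changing the sample weights changes what the tree learns, which is how boosting steers each new tree toward the previously misclassified samples.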
Bagging methods such as Random Forest work best with "unstable" base classifiers that are sensitive to variations in the training data. Linear classifiers and k-Nearest Neighbors are relatively stable (low variance); using them as base classifiers in Bagging would not significantly reduce variance and may even increase bias due to the bootstrap sampling, making them unsuitable for this purpose.
Hulu Beijing