Comprehensive Overview of Ten Regression Algorithms with Core Concepts and Code Examples
This article provides a comprehensive summary of ten regression algorithms—including linear, ridge, Lasso, decision tree, random forest, gradient boosting, SVR, XGBoost, LightGBM, and neural network regression—detailing their principles, advantages, disadvantages, suitable scenarios, and offering core Python code examples for each.
Regression algorithms establish relationships between features and target variables, enabling prediction, trend analysis, and feature importance assessment across many data‑science tasks.
Linear Regression
Linear regression models the linear relationship between a dependent variable and one or more independent variables.
Core Principle
Simple linear regression fits a line y = β₀ + β₁x by minimizing the residual sum of squares (RSS); multiple linear regression extends this to multiple predictors.
Advantages & Disadvantages
Advantages: easy to understand and implement; performs well when the relationship is strongly linear. Disadvantages: sensitive to outliers and noise; cannot capture non‑linear patterns.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Generate example data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)
# Manual least‑squares calculation
X_mean = np.mean(X)
Y_mean = np.mean(Y)
numerator = np.sum((X - X_mean) * (Y - Y_mean))
denominator = np.sum((X - X_mean)**2)
beta_1 = numerator / denominator
beta_0 = Y_mean - beta_1 * X_mean
# sklearn implementation
model = LinearRegression()
model.fit(X, Y)
# Plot data and regression lines
plt.scatter(X, Y, label='Data Points')
plt.plot(X, beta_0 + beta_1 * X, color='red', label='Regression Line (Manual)')
plt.plot(X, model.predict(X), color='green', linestyle='dashed', label='Regression Line (Sklearn)')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Ridge Regression
Ridge regression adds an L2 regularization term to the ordinary least‑squares loss to mitigate multicollinearity and improve model stability.
Core Principle
The objective becomes minimizing ||y - Xw||² + α||w||², where α controls the strength of regularization.
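As a minimal sketch of this objective, the ridge solution has the closed form w = (XᵀX + αI)⁻¹Xᵀy, which can be checked against scikit-learn directly (with `fit_intercept=False` so both solve the same problem; the data here is synthetic for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

alpha = 1.0
# Closed-form ridge solution: w = (X^T X + alpha * I)^(-1) X^T y
w_manual = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# sklearn minimizes the same objective when no intercept is fitted
ridge = Ridge(alpha=alpha, fit_intercept=False)
ridge.fit(X, y)
print(w_manual, ridge.coef_)  # the two coefficient vectors agree
```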
Advantages & Disadvantages
Advantages: stabilizes coefficient estimates in the presence of highly correlated features; works well with high‑dimensional data. Disadvantages: requires tuning the regularization parameter α; coefficients are shrunk but never driven exactly to zero, so ridge keeps all features and performs no feature selection.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
# Generate example data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)
# Ridge regression
alpha = 1.0
ridge_model = Ridge(alpha=alpha)
ridge_model.fit(X, Y)
# Plot data and ridge regression line
plt.scatter(X, Y, label='Data Points')
plt.plot(X, ridge_model.predict(X), color='red', label=f'Ridge Regression (alpha={alpha})')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Lasso Regression
Lasso regression introduces an L1 regularization term, which can shrink some coefficients to zero, thereby performing feature selection.
Core Principle
The objective is minimizing ||y - Xw||² + α||w||₁, where the L1 norm encourages sparsity.
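The sparsity effect can be seen on a small synthetic example: when only two of ten features drive the target, the L1 penalty zeroes out the irrelevant coefficients (the data and the choice of alpha here are illustrative, not from the article):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
# Only the first two features actually influence the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)

lasso = Lasso(alpha=0.5)
lasso.fit(X, y)
# The L1 penalty drives the eight irrelevant coefficients to (near) zero
print(lasso.coef_)
```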
Advantages & Disadvantages
Advantages: performs automatic feature selection; works well on high‑dimensional data. Disadvantages: when the number of features greatly exceeds the number of samples, Lasso selects at most as many features as there are samples; among highly correlated variables it may arbitrarily keep one and drop the rest.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
# Generate example data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)
# Lasso regression
alpha = 0.1
lasso_model = Lasso(alpha=alpha)
lasso_model.fit(X, Y)
# Plot data and Lasso regression line
plt.scatter(X, Y, label='Data Points')
plt.plot(X, lasso_model.predict(X), color='red', label=f'Lasso Regression (alpha={alpha})')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Decision Tree Regression
Decision‑tree regression partitions the input space recursively and predicts the average target value of the samples in each leaf.
Core Principle
The tree is built by selecting splits that minimize the mean‑squared error within child nodes, continuing recursively until stopping criteria are met.
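The split criterion can be sketched with an exhaustive search over thresholds that minimizes the summed within-child variance (equivalent to minimizing child MSE); the toy data below is illustrative:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold minimizing the weighted variance (MSE)
    of the two child nodes, by trying every candidate split."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        score = len(left) * left.var() + len(right) * right.var()
        if score < best_score:
            best_score, best_t = score, (x[i - 1] + x[i]) / 2
    return best_t

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
print(best_split(x, y))  # 6.5, the midpoint between the two clusters
```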
Advantages & Disadvantages
Advantages: easy to interpret and visualize; handles non‑linear relationships and is robust to outliers. Disadvantages: prone to over‑fitting; predictions can be unstable with small data changes.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
# Generate example data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1
# Decision‑tree regression
tree_model = DecisionTreeRegressor(max_depth=4)
tree_model.fit(X, y)
# Predict new points
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = tree_model.predict(X_test)
# Plot
plt.scatter(X, y, s=20, edgecolor='black', c='darkorange', label='data')
plt.plot(X_test, y_pred, color='cornflowerblue', label='prediction')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Decision Tree Regression')
plt.legend()
plt.show()
Random Forest Regression
Random forest regression builds an ensemble of decision trees on bootstrapped samples and averages their predictions to improve generalization.
Core Principle
Each tree is trained on a random subset of data and a random subset of features; the final prediction is the mean of all tree outputs.
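The bootstrap-and-average mechanism can be sketched by hand with plain decision trees (a simplified version of what `RandomForestRegressor` does internally; it omits per-split feature subsampling):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# Manual bagging: train each tree on a bootstrap sample of the data
trees = []
for seed in range(20):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    t = DecisionTreeRegressor(max_depth=4, random_state=seed)
    t.fit(X[idx], y[idx])
    trees.append(t)

# The ensemble prediction is the mean of all tree outputs
X_test = np.linspace(0, 5, 50)[:, None]
y_pred = np.mean([t.predict(X_test) for t in trees], axis=0)
print(y_pred.shape)
```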
Advantages & Disadvantages
Advantages: reduces over‑fitting, handles high‑dimensional data, captures complex feature interactions. Disadvantages: lower interpretability, longer training time, can still over‑fit on noisy data.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
# Generate example data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1
# Random forest regression
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X, y)
# Predict new points
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = rf_model.predict(X_test)
# Plot
plt.scatter(X, y, s=20, edgecolor='black', c='darkorange', label='data')
plt.plot(X_test, y_pred, color='cornflowerblue', label='prediction')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Random Forest Regression')
plt.legend()
plt.show()
Gradient Boosting Regression
Gradient boosting builds trees sequentially, each new tree correcting the residual errors of the combined previous trees.
Core Principle
At each iteration, the negative gradient of the loss (usually MSE) is computed, a new weak learner is fitted to this gradient, and the model is updated with a learning‑rate weighted addition of the new learner.
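For squared-error loss the negative gradient is simply the residual, so the boosting loop can be sketched directly (a bare-bones version of what `GradientBoostingRegressor` does, on illustrative data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

lr = 0.1
pred = np.full_like(y, y.mean())   # initial constant model
for _ in range(100):
    residual = y - pred            # negative gradient of 0.5*(y - pred)^2
    stump = DecisionTreeRegressor(max_depth=2)
    stump.fit(X, residual)         # fit a weak learner to the gradient
    pred += lr * stump.predict(X)  # learning-rate weighted update

print(np.mean((y - pred) ** 2))    # training MSE shrinks toward the noise level
```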
Advantages & Disadvantages
Advantages: captures complex non‑linear relationships; improves performance iteratively. Disadvantages: longer training time; sensitive to outliers.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
# Generate example data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1
# Gradient boosting regression
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X, y)
# Predict new points
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = gb_model.predict(X_test)
# Plot
plt.scatter(X, y, s=20, edgecolor='black', c='darkorange', label='data')
plt.plot(X_test, y_pred, color='cornflowerblue', label='prediction')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Gradient Boosting Regression')
plt.legend()
plt.show()
Support Vector Regression (SVR)
SVR applies the support‑vector‑machine principle to regression, using kernel functions to map inputs into high‑dimensional feature spaces.
Core Principle
The algorithm finds a function that deviates from the actual targets by at most ε and is as flat as possible, solving a convex optimization problem with a regularization parameter C.
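The role of ε can be seen empirically: points inside the ε-tube incur no loss and do not become support vectors, so widening the tube shrinks the support set (the data and ε values here are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = 5 * rng.random((100, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# Count support vectors as the epsilon-tube widens
n_sv = []
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel='rbf', C=100, epsilon=eps)
    svr.fit(X, y)
    n_sv.append(len(svr.support_))
print(n_sv)  # fewer support vectors for larger epsilon
```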
Advantages & Disadvantages
Advantages: effective in high‑dimensional spaces; flexible via kernel choice. Disadvantages: training time grows quickly with dataset size; sensitive to feature scaling and parameter selection.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
# Generate example data
np.random.seed(0)
X = 5 * np.random.rand(100, 1)
y = np.sin(X).ravel() + np.random.randn(100) * 0.1
# SVR with RBF kernel
svr_model = SVR(kernel='rbf', C=100, epsilon=0.1, gamma='auto')
svr_model.fit(X, y)
# Predict new points
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = svr_model.predict(X_test)
# Plot
plt.scatter(X, y, s=20, edgecolor='black', c='darkorange', label='data')
plt.plot(X_test, y_pred, color='cornflowerblue', label='prediction')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression (RBF Kernel)')
plt.legend()
plt.show()
XGBoost Regression
XGBoost is an optimized gradient‑boosting framework that builds additive tree models using second‑order Taylor approximation of the loss.
Core Principle
At each iteration, the algorithm fits a tree to the negative gradient (first derivative) and uses the second derivative to compute optimal leaf weights, with regularization to control model complexity.
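The second-order leaf-weight formula can be worked through by hand for squared-error loss, where the gradient is g = pred − y and the Hessian is h = 1 (a toy numeric example, not XGBoost's actual code path):

```python
import numpy as np

# Squared-error loss l = 0.5 * (y - pred)^2:
# per-sample gradient g = pred - y, Hessian h = 1
y = np.array([3.0, 3.5, 4.0, 10.0])
pred = np.zeros_like(y)  # current ensemble prediction

g = pred - y
h = np.ones_like(y)
lam = 1.0  # L2 regularization on leaf weights

# Optimal weight for a leaf containing all four samples:
# w* = -sum(g) / (sum(h) + lambda)
w = -g.sum() / (h.sum() + lam)
print(w)  # 4.1: shrunk toward zero relative to the plain mean 5.125
```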
Advantages & Disadvantages
Advantages: high efficiency, built‑in regularization, robust handling of missing values, and flexibility with custom loss functions. Disadvantages: can be memory‑intensive for extremely large datasets.
Core Code Example
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Load data
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# XGBoost regressor
params = {'objective': 'reg:squarederror', 'max_depth': 3, 'learning_rate': 0.1, 'n_estimators': 100}
model = xgb.XGBRegressor(**params)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
# Feature importance plot
xgb.plot_importance(model)
plt.show()
LightGBM Regression
LightGBM is a gradient‑boosting framework that uses histogram‑based algorithms and gradient‑based one‑side sampling to accelerate training.
Core Principle
Features are bucketed into discrete bins, histograms are built for each bin, and the best split is found efficiently; GOSS and EFB further speed up training on large data.
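The histogram idea can be sketched in NumPy: bucket a continuous feature into quantile bins and accumulate per-bin gradient statistics, after which split candidates need only be evaluated between bins (a simplified illustration, not LightGBM's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)           # one continuous feature
g = rng.standard_normal(1000)  # per-sample gradients

# Bucket the feature into 16 quantile bins and build histograms
n_bins = 16
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
bins = np.digitize(x, edges)
hist_g = np.bincount(bins, weights=g, minlength=n_bins)  # gradient sum per bin
hist_n = np.bincount(bins, minlength=n_bins)             # sample count per bin
print(hist_n.sum())  # 1000: every sample lands in exactly one bin
```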
Advantages & Disadvantages
Advantages: fast training, low memory usage, high accuracy, and good scalability. Disadvantages: may require careful tuning of leaf number and learning rate.
Core Code Example
import lightgbm as lgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Load data
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create LightGBM datasets
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test)
# Parameters
params = {
'objective': 'regression',
'metric': 'l2',
'num_leaves': 31,
'learning_rate': 0.1,
'feature_fraction': 0.8,
'bagging_fraction': 0.8,
'bagging_freq': 5
}
# Train model
model = lgb.train(params, train_data, num_boost_round=100, valid_sets=[test_data], callbacks=[lgb.early_stopping(stopping_rounds=10)])
# Predict and evaluate
y_pred = model.predict(X_test, num_iteration=model.best_iteration)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
# Feature importance plot
lgb.plot_importance(model)
plt.show()
Neural Network Regression
Neural‑network regression uses multilayer perceptrons to model complex non‑linear relationships between inputs and a continuous target.
Core Principle
Input features are passed through hidden layers with activation functions (e.g., ReLU) to produce a scalar output; the network is trained by minimizing mean‑squared error via back‑propagation.
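The forward pass described above can be sketched in a few lines of NumPy for a single hidden layer (random, untrained weights, purely to show the shapes and the ReLU step):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 3))  # 5 samples, 3 input features

# One hidden layer of 4 units with ReLU, then a linear output neuron
W1 = rng.standard_normal((3, 4)); b1 = np.zeros(4)
W2 = rng.standard_normal((4, 1)); b2 = np.zeros(1)

hidden = np.maximum(0, X @ W1 + b1)  # ReLU activation
output = hidden @ W2 + b2            # one scalar regression output per sample
print(output.shape)  # (5, 1)
```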
Advantages & Disadvantages
Advantages: capable of learning highly non‑linear mappings; scales well with large datasets. Disadvantages: longer training time, requires substantial data, and can over‑fit without proper regularization.
Core Code Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
# Generate example data
np.random.seed(0)
X = 5 * np.random.rand(100, 1)
y = np.sin(X).ravel() + np.random.randn(100) * 0.1
# Neural network regression
nn_model = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', max_iter=1000, random_state=42)
nn_model.fit(X, y)
# Predict new points
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = nn_model.predict(X_test)
# Plot
plt.scatter(X, y, s=20, edgecolor='black', c='darkorange', label='data')
plt.plot(X_test, y_pred, color='cornflowerblue', label='prediction')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Neural Network Regression')
plt.legend()
plt.show()