Mastering Regression: A Comprehensive Guide to Linear and Non‑Linear Models

This article provides an in‑depth overview of regression prediction, covering linear models like OLS, Lasso, Ridge, and Bayesian approaches, as well as non‑linear techniques such as tree ensembles, SVR, KNN, neural networks, and advanced deep learning frameworks for tabular data.

Sohu Tech Products

Mar 6, 2024

Introduction

Regression modeling learns a mapping from input features to a continuous target, typically by estimating the conditional expectation of the target given the features. This article compiles a list of regression models covering both linear and non‑linear approaches.

Linear Models

Linear regression fits a linear combination of features and minimizes the residual sum of squares. Scikit‑learn provides many linear estimators:

Ordinary Least Squares (OLS) regression.

Lasso regression – L1 regularization.

Ridge regression – L2 regularization to handle collinearity.

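Stochastic Gradient Descent (SGD) regression – supports L1, L2, or Elastic Net penalties and online (incremental) learning; the L1 and Elastic Net penalties can produce sparse coefficients, which acts as a form of feature selection.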

ElasticNet – combines L1 and L2 regularization.

Least Angle Regression (LAR) – efficient for high‑dimensional data, provides a full piecewise‑linear solution path.

Orthogonal Matching Pursuit (OMP) – greedy algorithm for sparse linear models.

Bayesian ARD regression – uses an Automatic Relevance Determination prior to infer weight relevance.

Bayesian Ridge regression – Bayesian version of ridge with evidence maximization.

Robust regressors – Huber, Quantile, RANSAC, and Theil‑Sen regressors.

Generalized Linear Models (GLM) – support Poisson, Tweedie, Gamma distributions via link functions.
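
As a rough illustration of how three of the estimators above differ, the sketch below fits OLS, Ridge, and Lasso on synthetic data and reproduces the OLS solution with a plain least-squares solve. The data, the coefficient vector, and the alpha values are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (assumed for illustration): 5 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + 1.5 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty can drive coefficients to zero

# OLS minimizes the residual sum of squares; reproduce it with NumPy.
X1 = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # beta[0] is the intercept

print("OLS  :", np.round(ols.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
print("Lasso:", np.round(lasso.coef_, 3))
print("lstsq:", np.round(beta[1:], 3))
```

Comparing the printed coefficient vectors shows the effect of each penalty: Ridge pulls the coefficients toward zero, while Lasso tends to remove the uninformative features altogether.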

Non‑Linear Models

Non‑linear regression combines features in a non‑linear fashion. Tree‑based ensemble models are widely used because they adapt to heterogeneous data, are computationally efficient, and generalize well.

Major gradient‑boosting libraries (the “three giants” of data competitions):

XGBoost – https://xgboost.readthedocs.io/en/stable/

LightGBM – https://lightgbm.readthedocs.io/en/stable/

CatBoost – https://catboost.ai/

Other non‑linear function families (polynomial, exponential, logarithmic, S‑shaped, asymptotic) may be chosen based on domain knowledge.

Decision‑Tree Regression (CART) recursively splits the samples on feature thresholds; each leaf node then predicts the average target value of the training samples it contains.
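
The leaf-average behaviour can be checked directly. The sketch below, on assumed synthetic data, fits a shallow scikit-learn DecisionTreeRegressor with its default squared-error criterion and recomputes every prediction as the mean target of the training samples that share a leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

leaf_ids = tree.apply(X)   # index of the leaf each training sample lands in
pred = tree.predict(X)

# Recompute each prediction as the mean target of its leaf.
leaf_means = {leaf: y[leaf_ids == leaf].mean() for leaf in np.unique(leaf_ids)}
manual = np.array([leaf_means[leaf] for leaf in leaf_ids])

print("number of leaves:", len(leaf_means))
print("predictions match leaf means:", np.allclose(pred, manual))
```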

Support Vector Regression (SVR) uses an ε‑insensitive loss, which ignores residuals smaller than ε, together with the kernel trick to handle both linear and non‑linear problems; its regularization term helps limit over‑fitting.
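
A minimal sketch of the ε‑insensitive idea follows. The helper eps_insensitive_loss is a hypothetical function written for this example (it is not part of scikit-learn), and the data and the C, gamma, and epsilon values are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Synthetic 1-D data (assumed for illustration).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.2).fit(X, y)
pred = svr.predict(X)
losses = eps_insensitive_loss(y, pred, svr.epsilon)

print("support vectors:", len(svr.support_), "of", len(X))
print("samples with zero loss:", int(np.sum(losses == 0.0)))
```

Samples whose residual stays inside the tube contribute nothing to the loss, which is why a fitted SVR typically keeps only a subset of the training points as support vectors.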

K‑Nearest Neighbors (KNN) Regression predicts by averaging the targets of the K nearest training points.
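
The averaging rule is simple enough to reproduce by hand. The sketch below, on assumed synthetic data, compares scikit-learn's KNeighborsRegressor (default uniform weights and Euclidean distance) with a manual average over the K closest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 2))
y_train = X_train[:, 0] + 2.0 * X_train[:, 1]
x_query = np.array([[5.0, 5.0]])
k = 3

knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
sk_pred = knn.predict(x_query)[0]

# Manual version: Euclidean distances, take the k closest, average their targets.
dists = np.linalg.norm(X_train - x_query, axis=1)
nearest = np.argsort(dists)[:k]
manual_pred = y_train[nearest].mean()

print("scikit-learn:", sk_pred)
print("manual      :", manual_pred)
```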

Multilayer Perceptron (MLP) Regression maps the feature matrix to the target space through a forward pass and is trained by back‑propagation, typically minimizing a mean‑squared‑error loss.
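
As a small sketch of this training loop in practice, the example below uses scikit-learn's MLPRegressor, which optimizes a squared-error loss, inside a pipeline that standardizes the inputs first. The data, layer sizes, and iteration budget are assumptions for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic non-linear target (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = X[:, 0] ** 2 + X[:, 1]

# Standardizing inputs usually helps gradient-based training converge.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0),
)
model.fit(X, y)

mlp = model.named_steps["mlpregressor"]
print("first epoch loss:", mlp.loss_curve_[0])
print("final epoch loss:", mlp.loss_curve_[-1])
print("training R^2    :", model.score(X, y))
```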

Random Forest Regression builds many bootstrapped decision trees and averages their predictions to reduce variance.
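
The averaging step can be made explicit. In the sketch below, on assumed synthetic data, the forest's prediction is recomputed as the plain mean of the predictions of its individual trees, exposed by scikit-learn as estimators_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumed for illustration).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stack the predictions of every tree, then average across trees.
per_tree = np.stack([t.predict(X) for t in forest.estimators_])
manual = per_tree.mean(axis=0)

print("matches forest.predict:", np.allclose(manual, forest.predict(X)))
print("mean spread across trees:", per_tree.std(axis=0).mean())
```

The spread across trees gives a rough sense of how much individual trees disagree; averaging is what smooths that disagreement out.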

Deep Forest (gcForest) Regression combines multi‑granularity scanning and cascade forests for representation learning without heavy hyper‑parameter tuning.

Extra Trees Regression creates extremely randomized trees, selecting split thresholds randomly to increase diversity and speed.

AdaBoost Regression reduces bias by iteratively training weak learners, weighting their predictions, and aggregating them.
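
A short sketch with scikit-learn's AdaBoostRegressor follows; that implementation uses the AdaBoost.R2 variant, which aggregates the weak learners with a weighted median. The data and the choice of a depth-3 tree as the weak learner are assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# The weak learner is passed positionally as the first argument.
weak = DecisionTreeRegressor(max_depth=3)
model = AdaBoostRegressor(weak, n_estimators=50, random_state=0).fit(X, y)

print("weak learners trained:", len(model.estimators_))
print("first learner weights:", np.round(model.estimator_weights_[:5], 3))
print("training R^2         :", model.score(X, y))
```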

Histogram‑based Gradient Boosting (HistGradientBoostingRegressor) in scikit‑learn accelerates training on large datasets (n_samples ≥ 10 000) and natively handles missing values.

TabNet is a deep neural network for tabular data that uses sequential attention for instance‑wise feature selection and supports self‑supervised learning.

Interaction Network Contextual Embedding (INCE) applies graph neural networks to embed tabular features before a downstream MLP.

Local Cascade Ensemble (LCE) augments bagging and boosting, is compatible with scikit‑learn pipelines, and improves the performance of Random Forest and XGBoost.

Gated Additive Tree Ensemble (GATE) incorporates a GRU‑like gating mechanism for feature selection within differentiable trees.

Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) simplifies GATE, offering higher efficiency with fewer hyper‑parameters.

A commonly used framework for tabular deep learning is pytorch_tabular – https://github.com/manujosephv/pytorch_tabular
