Implement Random Forest Regression in Python using Scikit-Learn

This article explains the fundamentals of random forest regression, describes why it outperforms single decision trees for nonlinear or noisy data, defines bootstrapping and bagging, and provides a step‑by‑step Python example using NumPy, Pandas, and Scikit‑Learn’s RandomForestRegressor with data loading, preprocessing, model training, prediction, and evaluation via MSE and R².


Random forest regression is an ensemble model composed of many decision trees; the forest's prediction is the average of the individual trees' outputs, which typically makes it more accurate and more stable than any single decision tree.

Each tree is trained on a random subset of the rows (a bootstrap sample), and at each split only a random subset of the features is considered. Together these two sources of randomness introduce diversity among the trees and decorrelate their errors.
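The averaging rule can be checked directly: a fitted forest's `predict` equals the mean of its individual trees' predictions. A minimal sketch on synthetic data (the data and variable names are illustrative, not from the article):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative synthetic 1-D regression data
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Average each fitted tree's prediction by hand...
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
manual_mean = per_tree.mean(axis=0)

# ...and it matches the forest's own prediction
assert np.allclose(manual_mean, rf.predict(X))
```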

Random forest regression is especially suitable when:

There is a nonlinear or complex relationship between features and the target.

A robust model is needed that is less sensitive to noise in the training set, because the ensemble of uncorrelated trees reduces variance.

Highly flexible learners such as unpruned decision trees tend to overfit; random forests mitigate overfitting by averaging the predictions of many trees.

Decision trees can overfit easily when their depth is unrestricted, growing until each data point forms a leaf. Limiting tree depth reduces variance but increases bias; random forests combine many trees with randomness to keep both variance and bias low.
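To make the variance argument concrete, here is a sketch (on synthetic data, with illustrative names) contrasting one unrestricted tree, which memorizes its training set, with a forest of such trees:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative noisy nonlinear data
rng = np.random.RandomState(1)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# An unrestricted tree memorizes the training set (near-zero training error)...
tree = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)
train_mse_tree = np.mean((tree.predict(X_tr) - y_tr) ** 2)

# ...while a forest averages many such trees, trading a little bias for much lower variance
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)
test_mse_tree = np.mean((tree.predict(X_te) - y_te) ** 2)
test_mse_forest = np.mean((forest.predict(X_te) - y_te) ** 2)
```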

Key concepts

Bootstrapping: sampling with replacement from the original dataset to create multiple training subsets. Because sampling is with replacement, some samples may appear multiple times in a single tree.

Bagging: training each independent decision tree on a different bootstrap sample and averaging their predictions to obtain the final output.
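The two concepts above can be sketched from scratch with plain NumPy and a stock decision tree; everything here (data, names, tree count) is illustrative rather than the article's code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrapping: draw row indices with replacement, so some rows repeat
    idx = rng.randint(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Bagging: the ensemble prediction is the mean of the trees' predictions
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
pred = np.mean([t.predict(X_new) for t in trees], axis=0)
```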

Implementation steps

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error
import seaborn as sns  # optional; only needed if you visualize the data

Load the dataset (CSV with two columns x and y) and read it into a Pandas DataFrame:

df = pd.read_csv('Random-Forest-Regression-Data.csv')

Extract the feature and target columns, reshape the feature array to the 2-D shape scikit-learn expects, and split into training and test sets (30% test size, random_state=42 for reproducibility):

x = df.x.values.reshape(-1, 1)  # features must be 2-D: (n_samples, n_features)
y = df.y.values                 # keep the target 1-D to avoid a shape warning
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

Train the RandomForestRegressor on the training data:

rf = RandomForestRegressor(random_state=42)  # fix the seed for reproducible results
rf.fit(x_train, y_train)
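The defaults work well, but a few constructor arguments are commonly tuned. A sketch with illustrative values (and synthetic stand-in data, since the article's CSV is not reproduced here); `oob_score=True` scores each row with the trees that never saw it, giving a free validation estimate:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (illustrative)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(150, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=150)

rf = RandomForestRegressor(
    n_estimators=200,  # number of trees (default 100); more trees, lower variance
    max_depth=None,    # grow trees fully; set an int to limit depth and add bias
    max_features=1.0,  # fraction of features considered at each split
    oob_score=True,    # evaluate each sample on the trees that never saw it
    random_state=42,
)
rf.fit(X, y)
print(rf.oob_score_)  # out-of-bag R^2, an internal estimate of test performance
```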

Predict on the test set:

y_pred = rf.predict(x_test)

Evaluate the model with Mean Squared Error (MSE) and the R² score; taking the square root of the MSE with NumPy gives the root-mean-square error (RMSE), which is in the same units as the target:

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
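Since the article's CSV is not included here, the evaluation step can be reproduced end to end on synthetic stand-in data (all names and values illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the article's CSV (illustrative data)
rng = np.random.RandomState(42)
x = rng.uniform(0, 10, size=(500, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.2, size=500)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.30, random_state=42)
rf = RandomForestRegressor(random_state=42).fit(x_tr, y_tr)
y_pred = rf.predict(x_te)

mse = mean_squared_error(y_te, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as the target
r2 = r2_score(y_te, y_pred)
```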


In summary, the guide demonstrates how to preprocess data, split it, train a random forest regression model, make predictions, and assess performance, highlighting why random forests are advantageous for regression tasks with nonlinear relationships or noisy data.

Tags: machine learning, Python, regression, random forest, scikit-learn, bootstrapping
Written by Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!