Artificial Intelligence · 18 min read

Understanding Machine Learning vs Deep Learning and a Practical sklearn Regression Tutorial

This article explains the difference between machine learning and deep learning, contrasts machine‑learning algorithms with traditional logic code, introduces the scikit‑learn library, and walks through data preprocessing, model training with RandomForestRegressor, and building a voting regressor for disease‑progression prediction in Python.

Rare Earth Juejin Tech Community

Distinguishing Machine Learning and Deep Learning

Machine learning is a branch of artificial intelligence that relies on manually extracted features, much as an expert appraises a vase by eye, whereas deep learning automatically discovers useful representations from raw data, like a spectrometer reading carbon isotopes directly from the material.

For example, predicting a diabetic patient's blood sugar level starts with expert‑chosen features such as height and weight; deep learning would ingest raw video recordings and learn the relevant patterns itself.

Machine Learning Algorithms vs Traditional Logic Code

Traditional if/else logic is static and must be rewritten when data patterns change, while machine‑learning models adapt their parameters automatically to new data without code modifications.
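To make the contrast concrete, here is a minimal sketch with made‑up numbers: a hard‑coded threshold rule next to a model whose parameters are re‑estimated from whatever data it is given.

```python
# Hypothetical illustration: a hand-written rule vs. a model fit to data.
import numpy as np
from sklearn.linear_model import LinearRegression

def rule_based_risk(bmi):
    # Static logic: the threshold 30 is hard-coded, so it must be edited by
    # hand whenever the population or the data distribution shifts.
    return "high" if bmi > 30 else "low"

# A learned model instead estimates its parameters from the data it sees.
bmi = np.array([[20.0], [25.0], [30.0], [35.0], [40.0]])
risk_score = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy target values
model = LinearRegression().fit(bmi, risk_score)
# Retraining on new data updates model.coef_ automatically; no code changes.
```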

Old ML algorithms need hand‑crafted features, are lightweight, and work well on small datasets (e.g., license‑plate recognition). Modern deep‑learning models learn features autonomously, handle large‑scale tasks, and require more compute.

Introduction to sklearn

The sklearn (scikit‑learn) library provides utilities for data preprocessing, feature engineering, model selection, and evaluation, including many classic algorithms such as clustering, regression, and ensemble methods.

Clustering algorithms automatically group data without supervision; examples include spectral clustering, density‑based clustering, K‑means, and hierarchical clustering.
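As a quick illustration of grouping without supervision, K‑means can be run on a handful of made‑up 2‑D points; the cluster labels emerge from the data alone.

```python
# A minimal sketch of unsupervised grouping with K-means on made-up points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],   # one tight group
                   [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]])  # another tight group
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)  # each point receives a cluster label, with no supervision
```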

Practical: Voting Regressor for Disease Progress Prediction

Data Collection and Description

A public diabetes dataset contains 442 samples, each with 10 input features (AGE, SEX, BMI, BP, and six blood‑serum measurements S1–S6) and one target variable Y, a quantitative measure of disease progression one year after baseline.

| Index  | Name | Example |
|--------|------|---------|
| 1      | AGE  | 59      |
| 2      | SEX  | 2       |
| 3      | BMI  | 32.1    |
| 4      | BP   | 101     |
| 5      | S1   | 157     |
| 6      | S2   | 93.2    |
| 7      | S3   | 38      |
| 8      | S4   | 4       |
| 9      | S5   | 4.8598  |
| 10     | S6   | 87      |
| Output | Y    | 151     |

Data Preprocessing

Each feature is mean‑centered and rescaled so that columns measured in different units become comparable; after the final step, every column has zero mean and unit Euclidean norm.

import numpy as np

# Assumes diabetes.txt is a plain whitespace-separated file with 11 numeric
# columns (10 features followed by the target) and no header row.
datasets = np.loadtxt("diabetes.txt")
X = datasets[:, :10]  # input features
y = datasets[:, -1]   # target

# Center each column on its mean.
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean

# Rescale, then divide each column by its Euclidean norm. The per-column
# factor `std` cancels in the norm division, so the result equals
# X_centered / (std * sqrt(n_samples)): zero mean, unit norm per column.
std = np.std(X_centered, axis=0)
scaled_X = X_centered * std
scale = np.sqrt(np.sum(scaled_X**2, axis=0))
scaled_X = scaled_X / scale
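For comparison, sklearn's own preprocessing utilities can perform the centering‑and‑scaling step; this is a sketch on synthetic data, not the diabetes file used above.

```python
# A sketch of the same centering-and-scaling idea using sklearn's
# StandardScaler, on synthetic data (not diabetes.txt).
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[50.0, 100.0], scale=[10.0, 25.0], size=(200, 2))

scaler = StandardScaler()            # subtracts the mean, divides by the std
X_std = scaler.fit_transform(X_demo)

# Every column now has zero mean and unit variance, and the fitted scaler can
# apply the identical transform to new samples later:
x_new_std = scaler.transform([[55.0, 120.0]])
```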

Model Training

A RandomForestRegressor from sklearn is trained on the scaled data.

from sklearn.ensemble import RandomForestRegressor

# random_state fixes the seed so results are reproducible across runs.
reg_rf = RandomForestRegressor(random_state=1)
reg_rf.fit(scaled_X, y)
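The tutorial does not evaluate the model, but a quick sanity check is cross‑validation; this sketch uses sklearn's bundled (already‑normalized) copy of the same diabetes dataset rather than the local diabetes.txt file.

```python
# A sketch of cross-validating the random forest on sklearn's bundled
# (already-normalized) copy of the diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(random_state=1)
scores = cross_val_score(reg, X, y, cv=5, scoring="r2")
print(scores.mean())  # mean R^2 over 5 folds
```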

Prediction

For a new patient, the same preprocessing must be applied before calling predict:

x_new = [[49, 1, 31.1, 110, 154, 95.2, 33, 4, 4.6692, 97]]
# Apply exactly the same centering and scaling factors learned from training.
scaled_x_new = (x_new - X_mean) * std / scale
pred_rf = reg_rf.predict(scaled_x_new)
# pred_rf => array([213.26])

Voting Regressor

Multiple regressors (GradientBoostingRegressor, RandomForestRegressor, LinearRegression) are combined using VotingRegressor to improve robustness.

from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

reg1 = GradientBoostingRegressor(random_state=1)
reg1.fit(scaled_X, y)
reg2 = RandomForestRegressor(random_state=1)
reg2.fit(scaled_X, y)
reg3 = LinearRegression()
reg3.fit(scaled_X, y)

# VotingRegressor averages the predictions of its member models. Note that
# its own fit() call refits each member, so the individual fit() calls above
# are only needed if the members will also be used on their own.
reg = VotingRegressor([('gb', reg1), ('rf', reg2), ('lr', reg3)])
reg.fit(scaled_X, y)
pred = reg.predict(scaled_x_new)
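To see whether averaging actually helps, one can hold out a test split and compare each member's R² score with the ensemble's; again, this sketch uses sklearn's bundled diabetes dataset rather than the local file.

```python
# A sketch comparing the ensemble against its members on held-out data,
# using sklearn's bundled copy of the diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

members = [('gb', GradientBoostingRegressor(random_state=1)),
           ('rf', RandomForestRegressor(random_state=1)),
           ('lr', LinearRegression())]
voter = VotingRegressor(members).fit(X_tr, y_tr)  # fit() trains every member

for name, est in members:
    print(name, est.fit(X_tr, y_tr).score(X_te, y_te))
print('vote', voter.score(X_te, y_te))  # R^2 of the averaged prediction
```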

Conclusion

Traditional machine‑learning algorithms are lightweight AI solutions suitable for tasks with relatively stable features; they require domain expertise for feature engineering. Deep learning excels when raw data is abundant and feature extraction is complex, but both approaches have their place.

Predicting lottery numbers with machine learning, by contrast, is infeasible: the draws are purely random, so there is no deterministic pattern for a model to learn.

Tags: machine learning, Python, deep learning, regression, sklearn
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
