Artificial Intelligence · 18 min read

Understanding Machine Learning vs Deep Learning and a Practical sklearn Regression Tutorial

This article explains the difference between machine learning and deep learning, contrasts machine‑learning algorithms with traditional logic code, introduces the scikit‑learn library, and walks through data preprocessing, model training with RandomForestRegressor, and building a voting regressor for disease‑progression prediction in Python.

Rare Earth Juejin Tech Community

Distinguishing Machine Learning and Deep Learning

Machine learning is a branch of artificial intelligence that relies on manually extracted features, much as an expert appraises a vase by eye, whereas deep learning automatically discovers useful representations from raw data, like a spectrometer reading carbon isotopes directly from the material.

For example, predicting a diabetic patient's blood sugar level starts with expert‑chosen features such as height and weight; deep learning would ingest raw video recordings and learn the relevant patterns itself.

Machine Learning Algorithms vs Traditional Logic Code

Traditional if/else logic is static and must be rewritten when data patterns change, while machine‑learning models adapt their parameters automatically to new data without code modifications.
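To make the contrast concrete, here is a minimal sketch with made‑up numbers: a hard‑coded threshold rule next to a model whose parameters are re‑estimated from whatever data it is given.

```python
# Hypothetical illustration: a hand-written rule vs. a model fit to data.
import numpy as np
from sklearn.linear_model import LinearRegression

def rule_based_risk(bmi):
    # Static logic: the threshold 30 is hard-coded, so it must be edited by
    # hand whenever the population or the data distribution shifts.
    return "high" if bmi > 30 else "low"

# A learned model instead estimates its parameters from the data it sees.
bmi = np.array([[20.0], [25.0], [30.0], [35.0], [40.0]])
risk_score = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy target values
model = LinearRegression().fit(bmi, risk_score)
# Retraining on new data updates model.coef_ automatically; no code changes.
```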

Old ML algorithms need hand‑crafted features, are lightweight, and work well on small datasets (e.g., license‑plate recognition). Modern deep‑learning models learn features autonomously, handle large‑scale tasks, and require more compute.

Introduction to sklearn

The sklearn (scikit‑learn) library provides utilities for data preprocessing, feature engineering, model selection, and evaluation, including many classic algorithms such as clustering, regression, and ensemble methods.

Clustering algorithms automatically group data without supervision; examples include spectral clustering, density‑based clustering, K‑means, and hierarchical clustering.
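As a quick illustration of grouping without supervision, K‑means can be run on a handful of made‑up 2‑D points; the cluster labels emerge from the data alone.

```python
# A minimal sketch of unsupervised grouping with K-means on made-up points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],   # one tight group
                   [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]])  # another tight group
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)  # each point receives a cluster label, with no supervision
```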

Practical: Voting Regressor for Disease Progress Prediction

Data Collection and Description

A public diabetes dataset contains 442 samples, each with 10 input features (AGE, SEX, BMI, BP, and six blood‑serum measurements S1–S6) and one target variable Y, a quantitative measure of disease progression one year after baseline.

| Index  | Name | Example |
|--------|------|---------|
| 1      | AGE  | 59      |
| 2      | SEX  | 2       |
| 3      | BMI  | 32.1    |
| 4      | BP   | 101     |
| 5      | S1   | 157     |
| 6      | S2   | 93.2    |
| 7      | S3   | 38      |
| 8      | S4   | 4       |
| 9      | S5   | 4.8598  |
| 10     | S6   | 87      |
| Output | Y    | 151     |

Data Preprocessing

Each feature is mean‑centered and rescaled so that columns measured in different units become comparable; after the final step, every column has zero mean and unit Euclidean norm.

import numpy as np

# Assumes diabetes.txt is a plain whitespace-separated file with 11 numeric
# columns (10 features followed by the target) and no header row.
datasets = np.loadtxt("diabetes.txt")
X = datasets[:, :10]  # input features
y = datasets[:, -1]   # target

# Center each column on its mean.
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean

# Rescale, then divide each column by its Euclidean norm. The per-column
# factor `std` cancels in the norm division, so the result equals
# X_centered / (std * sqrt(n_samples)): zero mean, unit norm per column.
std = np.std(X_centered, axis=0)
scaled_X = X_centered * std
scale = np.sqrt(np.sum(scaled_X**2, axis=0))
scaled_X = scaled_X / scale
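For comparison, sklearn's own preprocessing utilities can perform the centering‑and‑scaling step; this is a sketch on synthetic data, not the diabetes file used above.

```python
# A sketch of the same centering-and-scaling idea using sklearn's
# StandardScaler, on synthetic data (not diabetes.txt).
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[50.0, 100.0], scale=[10.0, 25.0], size=(200, 2))

scaler = StandardScaler()            # subtracts the mean, divides by the std
X_std = scaler.fit_transform(X_demo)

# Every column now has zero mean and unit variance, and the fitted scaler can
# apply the identical transform to new samples later:
x_new_std = scaler.transform([[55.0, 120.0]])
```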

Model Training

A RandomForestRegressor from sklearn is trained on the scaled data.

from sklearn.ensemble import RandomForestRegressor

# random_state fixes the seed so results are reproducible across runs.
reg_rf = RandomForestRegressor(random_state=1)
reg_rf.fit(scaled_X, y)
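The tutorial does not evaluate the model, but a quick sanity check is cross‑validation; this sketch uses sklearn's bundled (already‑normalized) copy of the same diabetes dataset rather than the local diabetes.txt file.

```python
# A sketch of cross-validating the random forest on sklearn's bundled
# (already-normalized) copy of the diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(random_state=1)
scores = cross_val_score(reg, X, y, cv=5, scoring="r2")
print(scores.mean())  # mean R^2 over 5 folds
```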

Prediction

For a new patient, the same preprocessing must be applied before calling predict:

x_new = [[49, 1, 31.1, 110, 154, 95.2, 33, 4, 4.6692, 97]]
# Apply exactly the same centering and scaling factors learned from training.
scaled_x_new = (x_new - X_mean) * std / scale
pred_rf = reg_rf.predict(scaled_x_new)
# pred_rf => array([213.26])

Voting Regressor

Multiple regressors (GradientBoostingRegressor, RandomForestRegressor, LinearRegression) are combined using VotingRegressor to improve robustness.

from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

reg1 = GradientBoostingRegressor(random_state=1)
reg1.fit(scaled_X, y)
reg2 = RandomForestRegressor(random_state=1)
reg2.fit(scaled_X, y)
reg3 = LinearRegression()
reg3.fit(scaled_X, y)

# VotingRegressor averages the predictions of its member models. Note that
# its own fit() call refits each member, so the individual fit() calls above
# are only needed if the members will also be used on their own.
reg = VotingRegressor([('gb', reg1), ('rf', reg2), ('lr', reg3)])
reg.fit(scaled_X, y)
pred = reg.predict(scaled_x_new)
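To see whether averaging actually helps, one can hold out a test split and compare each member's R² score with the ensemble's; again, this sketch uses sklearn's bundled diabetes dataset rather than the local file.

```python
# A sketch comparing the ensemble against its members on held-out data,
# using sklearn's bundled copy of the diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

members = [('gb', GradientBoostingRegressor(random_state=1)),
           ('rf', RandomForestRegressor(random_state=1)),
           ('lr', LinearRegression())]
voter = VotingRegressor(members).fit(X_tr, y_tr)  # fit() trains every member

for name, est in members:
    print(name, est.fit(X_tr, y_tr).score(X_te, y_te))
print('vote', voter.score(X_te, y_te))  # R^2 of the averaged prediction
```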

Conclusion

Traditional machine‑learning algorithms are lightweight AI solutions suitable for tasks with relatively stable features; they require domain expertise for feature engineering. Deep learning excels when raw data is abundant and feature extraction is complex, but both approaches have their place.

Predicting lottery numbers with machine learning, by contrast, is infeasible: the draws are purely random, so there is no deterministic pattern for a model to learn.

Tags: machine learning, Python, deep learning, regression, sklearn
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
