Introduction to Machine Learning: Concepts, Terminology, Algorithms, Evaluation Metrics, and Practical Code Examples
This article provides a comprehensive overview of machine learning, covering fundamental concepts, key terminology, common algorithms for supervised, unsupervised, and reinforcement learning, model evaluation metrics, loss functions, and practical code examples such as random forest and SVM implementations.
Introduction to Machine Learning
Machine Learning (ML) is the process of enabling computers to learn patterns from data; supervised learning uses labeled data while unsupervised learning works with unlabeled data.
Fundamental Concepts
The core idea is to search for a function that best fits the data by designing a model, evaluating its performance, and optimizing it.
Key Terminology
Label : The target variable we aim to predict.
Feature : An input variable used for prediction.
Sample (example) : A single row in a dataset, which can be labeled or unlabeled.
Model : Defines the relationship between features and labels.
Training : Learning the model from labeled samples.
Inference : Applying a trained model to unlabeled samples.
Training set, Validation set, Test set : Subsets of data used for training, hyper‑parameter tuning, and final evaluation respectively.
Bias : Measures how far the model’s average prediction deviates from true values.
Variance : Measures how predictions vary with different training sets.
Generalization : The ability of a model to perform well on unseen data.
Overfitting : Model fits training data too closely, harming generalization.
Underfitting : Model fails to capture underlying patterns.
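Underfitting, overfitting, and generalization are easy to see in a small sketch: fit noisy sine data with polynomials of increasing degree and compare the error on held-out points (the data, seed, and degrees below are illustrative choices, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every other point to measure generalization
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f'degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}')
```

Training error always falls as the degree grows, but test error typically rises once the model starts fitting noise: degree 1 underfits (high bias), while a high degree overfits (high variance).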
Common Machine‑Learning Algorithms
Algorithms are grouped by learning paradigm.
Supervised Learning
Includes classification and regression. Representative algorithms: Decision Tree, Random Forest, SVM, Logistic Regression, K‑Nearest Neighbors, Naïve Bayes, Neural Networks, AdaBoost, XGBoost.
Unsupervised Learning
Discovers structure without labels. Representative algorithms: PCA, K‑Means, hierarchical clustering, association‑rule mining (Apriori).
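As a sketch of clustering without labels, K-Means can be run on two synthetic blobs (the data below is made up for illustration; the algorithm never sees which blob a point came from):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two well-separated 2-D blobs, 50 unlabeled points each
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(50, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.cluster_centers_)  # learned centroids, near (0, 0) and (3, 3)
print(km.labels_[:5])       # cluster assignment for the first five points
```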
Reinforcement Learning
Agents learn actions that maximize cumulative reward through trial‑and‑error.
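A minimal sketch of this idea: tabular Q-learning on a hypothetical five-state corridor where only the rightmost state gives reward (the environment and all parameter values are invented for illustration):

```python
import numpy as np

n_states, n_actions = 5, 2   # actions: 0 = step left, 1 = step right
goal = n_states - 1          # reaching the last state yields reward 1.0
alpha, gamma = 0.5, 0.9      # learning rate and discount factor
rng = np.random.default_rng(0)
q = np.zeros((n_states, n_actions))

for _ in range(500):         # episodes of trial and error
    s = 0
    while s != goal:
        # Explore with a random behavior policy; Q-learning is off-policy,
        # so it still learns the values of the greedy policy
        a = int(rng.integers(n_actions))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: bootstrap from the best action in the next state
        q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
        s = s_next

print(q.argmax(axis=1)[:goal])  # greedy policy: step right in every state
```

After enough trial-and-error episodes, the greedy policy read off the Q-table steps right everywhere, maximizing cumulative discounted reward.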
Algorithm Details
Decision Tree : Easy to interpret, fast, but prone to overfitting.
Random Forest : Ensemble of decision trees; reduces overfitting, handles high‑dimensional data.
SVM : Finds optimal hyper‑plane; effective on small‑sample, high‑dimensional data.
K‑Nearest Neighbors : Simple, non‑parametric; high computational cost at inference.
Logistic Regression : Fast, interpretable; requires feature scaling.
Neural Network : High accuracy, parallelizable; requires large data and careful tuning.
AdaBoost : Boosts weak learners; sensitive to outliers.
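The overview above mentions a random forest example; a minimal sketch with scikit-learn on the iris data (the hyper-parameters are illustrative defaults, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=1, test_size=0.3)

# An ensemble of 100 decision trees, each grown on a bootstrap sample
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(x_train, y_train)

print('Test accuracy: %.3f' % clf.score(x_test, y_test))
print('Feature importances:', clf.feature_importances_)
```

Averaging over many decorrelated trees is what reduces the overfitting that a single decision tree is prone to.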
Model Evaluation Metrics
Classification metrics: Accuracy, Error Rate, Sensitivity (Recall), Specificity, Precision, F1‑Score, ROC‑AUC, PR‑AUC.
Regression metrics: MAE, MSE, RMSE, MAPE, R².
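These metrics can be computed directly with scikit-learn; a small sketch on hand-made predictions (the arrays below are illustrative, not from a real model):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# Classification: 6 hypothetical predictions against ground truth
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
print('Accuracy :', accuracy_score(y_true, y_pred))   # fraction correct
print('Precision:', precision_score(y_true, y_pred))  # TP / (TP + FP)
print('Recall   :', recall_score(y_true, y_pred))     # TP / (TP + FN)
print('F1       :', f1_score(y_true, y_pred))         # harmonic mean of the two

# Regression: 4 hypothetical predictions
t = np.array([3.0, -0.5, 2.0, 7.0])
p = np.array([2.5, 0.0, 2.0, 8.0])
print('MAE :', mean_absolute_error(t, p))
print('MSE :', mean_squared_error(t, p))
print('RMSE:', np.sqrt(mean_squared_error(t, p)))
print('R2  :', r2_score(t, p))
```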
Loss Functions (PyTorch Examples)
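The constructors listed below all follow the same usage pattern: instantiate the loss, then call it on predictions and targets. A short sketch with made-up tensor values (note that CrossEntropyLoss takes raw logits and integer class indices):

```python
import torch
import torch.nn as nn

# Raw scores (logits) for 2 samples over 3 classes; values are illustrative
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])
targets = torch.tensor([0, 1])              # true class index per sample

ce = nn.CrossEntropyLoss(reduction='mean')  # applies log-softmax internally
loss = ce(logits, targets)
print(loss.item())                          # ≈ 0.339

# Binary case: BCEWithLogitsLoss fuses sigmoid + BCE for numerical stability
bce = nn.BCEWithLogitsLoss()
print(bce(torch.tensor([0.8, -1.2]), torch.tensor([1.0, 0.0])).item())
```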
L1 Loss: torch.nn.L1Loss(reduction='mean')
MSE Loss: torch.nn.MSELoss(reduction='mean')
Cross‑Entropy Loss: torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
KL‑Divergence Loss: torch.nn.KLDivLoss(reduction='mean')
Binary Cross‑Entropy Loss: torch.nn.BCELoss(weight=None, reduction='mean')
BCE with Logits Loss: torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)

Practical Example: Iris Classification with SVM
Data preparation using NumPy and scikit‑learn, model training, evaluation, and visualization steps are demonstrated.
# Load the iris dataset
from sklearn import datasets, model_selection
from sklearn.svm import SVC

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Split dataset: 70% training, 30% test
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, random_state=1, test_size=0.3)

# Train SVM with the default RBF kernel
clf = SVC()
clf.fit(x_train, y_train.ravel())

# Evaluate on both splits
print('Training accuracy: %.3f' % clf.score(x_train, y_train))
print('Test accuracy: %.3f' % clf.score(x_test, y_test))

Summary Tables
| Algorithm | Advantages | Disadvantages |
| --- | --- | --- |
| Naïve Bayes | Few parameters, robust to missing data | Assumes feature independence, needs prior probabilities |
| Decision Tree | No domain knowledge needed, handles high‑dimensional data, easy to understand | Biased toward features with many values, prone to overfitting, no online learning |
| SVM | Works on small samples, good generalization, handles high‑dimensional non‑linear data | Sensitive to missing data, high memory consumption, complex tuning |
| KNN | Simple, works for classification and regression, no strong assumptions | Computationally intensive, sensitive to class imbalance |
| Logistic Regression | Fast, interpretable, easy to update | Requires feature engineering and scaling |
| Neural Network | High accuracy, parallelizable, robust | Many parameters, hard to interpret, long training time |
| AdaBoost | High precision, combines weak learners, simple | Sensitive to outliers |
The article concludes with a mind‑map of machine‑learning concepts and links to additional resources.