An Overview of Machine Learning and Deep Learning: Definitions, Core Concepts, and Typical Architectures
This article provides a comprehensive introduction to machine learning and deep learning, covering their definitions, differences, key concepts such as generalization, regularization, and overfitting, as well as typical algorithms and network architectures like CNN and RNN, illustrated with numerous diagrams.
Introduction
Artificial intelligence (AI) is often called the new industrial revolution: algorithms and computing power replace repetitive mental labor, freeing humans to focus on creative and scientific work and dramatically increasing productivity.
Figure 1 illustrates the impact of the industrial revolution; Figure 2 shows how AI replaces repetitive mental tasks, further boosting productivity.
About Machine Learning
Machine learning (ML) enables computers to discover patterns from large historical datasets using statistical algorithms, producing models that can recognize new samples or predict future outcomes without explicit programming. An ML system consists of data, algorithms, and models; data + algorithm → model → prediction or pattern recognition.
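The data + algorithm → model → prediction pipeline can be made concrete with a minimal sketch (pure Python; the data and the `fit_line` helper are illustrative, not from any library) that fits a line by ordinary least squares and then predicts a new sample:

```python
# Minimal sketch of "data + algorithm -> model -> prediction" using
# simple linear regression fit by ordinary least squares.

def fit_line(xs, ys):
    """Algorithm: estimate slope and intercept from historical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the learned "model" is just two numbers

# Data: past observations (e.g., hours studied -> exam score).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]

slope, intercept = fit_line(xs, ys)   # data + algorithm -> model
prediction = slope * 5.0 + intercept  # model -> prediction for a new sample
```

The "model" here is deliberately tiny (two numbers), but the flow is the same one the article describes: historical data plus a statistical algorithm produce a model that generalizes to new inputs.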
Figure 3 compares machine learning to cooking to aid understanding.
Learning Paradigms
ML algorithms can be classified by learning paradigm: supervised, unsupervised, semi‑supervised, and reinforcement learning.
1. Supervised Learning
Training data are labeled; models are iteratively adjusted until predictions reach a desired accuracy. Typical tasks include classification and regression, using algorithms such as Logistic Regression and Back‑Propagation Neural Networks.
2. Unsupervised Learning
Data are unlabeled; the goal is to infer intrinsic structure, e.g., clustering or association rule mining (Apriori, K‑Means).
3. Semi‑Supervised Learning
Only part of the data are labeled; models first learn the underlying structure and then leverage the labeled portion for prediction (e.g., Graph Inference, Laplacian SVM).
4. Reinforcement Learning
Feedback is provided as rewards rather than explicit labels; agents adjust policies in real time (e.g., Q‑Learning, Temporal‑Difference learning).
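To make the supervised paradigm concrete, here is a minimal sketch (pure Python; the toy dataset and hyper-parameters are illustrative) of logistic regression trained by iterative gradient-descent adjustments against the labels:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Labeled training data: one feature x, binary label y.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(1000):              # iterative adjustment of the model
    for x, y in data:
        p = sigmoid(w * x + b)     # current prediction
        grad = p - y               # error signal relative to the label
        w -= lr * grad * x
        b -= lr * grad

# After training, points on the positive side get probability > 0.5.
```

This is the supervised loop in miniature: labeled examples drive repeated weight adjustments until predictions match the labels.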
Algorithm Similarity
Algorithms are also grouped by functional similarity, such as tree‑based methods, neural‑network‑based methods, etc.
1. Regression Algorithms
Methods that model relationships between variables, including Ordinary Least Squares, Logistic Regression, Stepwise Regression, Multivariate Adaptive Regression Splines, and LOESS.
2. Instance‑Based Algorithms
Techniques that compare new samples to stored instances, such as K‑Nearest Neighbors, Learning Vector Quantization, and Self‑Organizing Maps.
3. Regularization Methods
Extensions of regression that control model complexity to avoid over‑fitting, e.g., Ridge Regression, LASSO, Elastic Net.
4. Decision‑Tree Learning
Tree‑structured models for classification and regression, including CART, ID3, C4.5, CHAID, Decision Stump, Random Forest, MARS, and Gradient Boosting Machines.
5. Kernel‑Based Methods
Algorithms that implicitly map data into high‑dimensional feature spaces, most notably Support Vector Machines, Radial Basis Function (RBF) networks, and kernelized variants of Linear Discriminant Analysis.
6. Clustering Algorithms
Methods that group data by similarity, such as K‑Means and Expectation‑Maximization.
7. Association‑Rule Learning
Techniques that discover useful relationships in large datasets, e.g., Apriori and Eclat.
8. Dimensionality‑Reduction Algorithms
Approaches that uncover intrinsic structure while reducing dimensionality, including PCA, PLS, Sammon Mapping, MDS, and Projection Pursuit.
9. Ensemble Methods
Combine multiple weak learners to improve overall performance; examples are Boosting, Bagging, AdaBoost, Stacking, Gradient Boosting, and Random Forest.
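As one concrete example from the instance‑based family above, here is a minimal K‑Nearest Neighbors sketch (pure Python; the toy dataset and the `knn_predict` helper are illustrative, not from any library):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored instances."""
    # train: list of ((features...), label) pairs
    by_distance = sorted(train, key=lambda p: sum((a - b) ** 2
                                                  for a, b in zip(p[0], query)))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]

print(knn_predict(train, (0.0, 0.1)))   # query near the "A" cluster
print(knn_predict(train, (1.0, 0.9)))   # query near the "B" cluster
```

Note that nothing is "trained" here: the stored instances themselves are the model, which is exactly what distinguishes instance‑based methods from the regression and tree families above.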
Figure 4 shows a roadmap of ML classification and practice.
About Deep Learning
Deep learning (DL) is a specialized subset of ML that represents the world as a hierarchy of nested concepts, from simple to complex, enabling powerful and flexible modeling.
Figure 5 illustrates the hierarchical concept system of deep learning.
Figure 6 depicts a typical deep network: visible layers receive raw inputs (e.g., pixels), followed by successive hidden layers that extract increasingly abstract features, culminating in an output layer that performs the final task.
Differences Between Deep Learning and Machine Learning
Data scale: DL requires massive datasets (often millions of labeled samples) to achieve or surpass human performance.
Feature handling: DL automatically learns feature representations, whereas traditional ML relies on manually engineered features.
Figure 7 (Venn diagram) shows that deep learning is both representation learning and a form of machine learning.
Figure 8 illustrates how AI system components relate across different AI disciplines, highlighting the parts that learn from data.
Fundamental Concepts of Neural Networks
A neural network consists of layers of neurons connected by weighted edges and activation functions. The input layer receives data, hidden layers transform representations, and the output layer produces predictions. Training adjusts the weights via back‑propagation.
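The forward and backward passes described above can be sketched in a few dozen lines (pure Python; the XOR task, network size, initial weights, and learning rate are all illustrative choices, not a production recipe):

```python
import math

sig = lambda z: 1.0 / (1.0 + math.exp(-z))

# XOR truth table: the classic task a single layer cannot solve.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Network: 2 inputs -> 3 hidden neurons -> 1 output (fixed toy weights).
W1 = [[0.5, -0.4], [-0.3, 0.6], [0.2, 0.1]]
b1 = [0.0, 0.0, 0.0]
W2 = [0.3, -0.2, 0.4]
b2 = 0.0

def forward(x1, x2):
    """Input layer -> hidden layer -> output layer."""
    h = [sig(w[0] * x1 + w[1] * x2 + b) for w, b in zip(W1, b1)]
    o = sig(sum(wj * hj for wj, hj in zip(W2, h)) + b2)
    return h, o

def mse():
    return sum((forward(x1, x2)[1] - y) ** 2 for (x1, x2), y in data) / len(data)

before = mse()
lr = 1.0
for _ in range(5000):
    for (x1, x2), y in data:
        h, o = forward(x1, x2)
        d_o = (o - y) * o * (1 - o)                # output-layer gradient
        for j in range(3):
            d_h = d_o * W2[j] * h[j] * (1 - h[j])  # error propagated back
            W2[j] -= lr * d_o * h[j]
            W1[j][0] -= lr * d_h * x1
            W1[j][1] -= lr * d_h * x2
            b1[j] -= lr * d_h
        b2 -= lr * d_o
after = mse()  # back-propagation has reduced the training error
```

The essential point is the backward pass: the output error `d_o` is pushed through the output weights to obtain each hidden neuron's share of the blame (`d_h`), and every weight moves a small step against its gradient.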
Core Deep‑Learning Concepts
1. Generalization
Generalization is the ability of a model to perform well on unseen data; it is the primary challenge in ML and DL.
2. Basic Assumptions
Smoothness prior and local constancy prior assume that the target function does not change abruptly in small neighborhoods.
Manifold learning assumes high‑dimensional data lie on a lower‑dimensional manifold that can be uncovered.
3. Representation
Good representations simplify learning; common forms include low‑dimensional, sparse, and independent representations (see Figure 10).
4. Error, Over‑fitting, Under‑fitting, Capacity
Training error vs. generalization error.
Over‑fitting: large gap between training and test error.
Under‑fitting: model cannot achieve low training error.
Capacity: a model's ability to fit a wide variety of functions; Figure 11 shows the typical U‑shaped relationship between capacity and generalization error.
5. Optimization, Regularization, Hyper‑parameters
Training seeks parameters that minimize a (often non‑convex) loss function, typically using mini‑batch gradient descent. Regularization (L1, L2, Dropout) mitigates over‑fitting, while hyper‑parameters such as learning rate and regularization weight are tuned manually or via search.
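These three ideas can be combined in one small sketch (pure Python; the toy data, learning rate, regularization weight, and batch size are illustrative hyper‑parameter choices) of mini‑batch gradient descent with an L2 penalty:

```python
import random

random.seed(0)

# Toy regression data: y ~ 3x plus noise (all values illustrative).
data = [(i / 100.0, 3.0 * (i / 100.0) + random.uniform(-0.1, 0.1))
        for i in range(100)]

w = 0.0
lr = 0.1          # hyper-parameter: learning rate
lam = 0.01        # hyper-parameter: L2 regularization weight
batch_size = 10   # hyper-parameter: mini-batch size

for epoch in range(200):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # gradient of the mean squared error plus the L2 penalty lam * w^2
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        grad += 2.0 * lam * w          # regularization shrinks the weight
        w -= lr * grad
# w converges near the true slope, slightly shrunk by the penalty
```

Each mini‑batch gives a cheap, noisy estimate of the full gradient, and the L2 term pulls the weight toward zero; the hyper‑parameters above are exactly the kind of knobs that must be tuned manually or via search.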
Convolutional Neural Networks (CNN)
CNNs are suited for data with spatial structure (e.g., images). They consist of convolutional layers that apply learnable filters and pooling layers that down‑sample feature maps.
Figure 13 illustrates convolution: a filter slides over the input image, performing element‑wise multiplication and summation to produce a feature map.
Figure 14 shows max‑pooling, which reduces spatial dimensions by retaining the maximum value within each region.
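The two operations described in Figures 13 and 14 can be sketched directly (pure Python; the toy image and the vertical‑edge filter values are made up for illustration):

```python
def conv2d(image, kernel):
    """Valid convolution, stride 1: at each position, multiply the filter
    element-wise with the patch under it and sum the results."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def max_pool(fmap, size=2):
    """Non-overlapping size x size max-pooling: keep only the largest
    value in each region, halving the spatial dimensions for size=2."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[1, 1, 0, 0, 0]] * 5
edge_filter = [[1, -1],
               [1, -1]]            # responds strongly at vertical edges
fmap = conv2d(image, edge_filter)  # 4x4 feature map
pooled = max_pool(fmap)            # 2x2 after pooling
```

The feature map lights up only where the filter's pattern (here, a left‑to‑right intensity drop) appears, and pooling keeps that strong response while discarding exact position, which is what gives CNNs a degree of translation tolerance.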
Recurrent Neural Networks (RNN)
RNNs handle sequential data such as text and speech. Units share parameters across time steps, passing a hidden state forward; Figure 15 depicts a single RNN cell.
During training, the network is often unrolled into a fixed‑length computation graph (Figure 16).
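The shared‑parameter recurrence can be sketched with scalar states (pure Python; the sequence values and the weights `w_xh`, `w_hh`, `b` are illustrative and untrained):

```python
import math

def rnn_step(x, h, w_xh, w_hh, b):
    """One RNN cell: new hidden state from the current input and the
    previous state. The same parameters are reused at every time step."""
    return math.tanh(w_xh * x + w_hh * h + b)

# Unroll over a fixed-length sequence (scalar inputs for simplicity).
sequence = [0.5, -1.0, 0.25]
w_xh, w_hh, b = 0.8, 0.5, 0.0   # shared across all time steps
h = 0.0                          # initial hidden state

states = []
for x in sequence:               # the unrolled computation graph
    h = rnn_step(x, h, w_xh, w_hh, b)
    states.append(h)
# `h` now summarizes the whole sequence
```

Unrolling the loop produces exactly the fixed‑length graph of Figure 16: three copies of the same cell wired in a chain, with the hidden state carrying information forward between them.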
Reflections on Deep Learning
With frameworks like TensorFlow and Caffe, building and deploying ML/DL models has become much easier, shifting engineers' focus from algorithmic details to application design and accelerating development cycles.
Figure 17 visualizes the perception that constructing DL models is akin to stacking building blocks.
Figure 18 highlights that only a small fraction of a real‑world ML system is the ML code itself; the surrounding infrastructure is vast and complex.
Hujiang Technology
We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.