An Overview of Machine Learning and Deep Learning: Definitions, Concepts, and Core Techniques
This article provides a comprehensive introduction to machine learning and deep learning, covering their definitions, classifications, key algorithms, neural network structures, core concepts such as generalization and regularization, and typical architectures like CNN and RNN, illustrated with numerous diagrams.
2016 was hailed as the "Year of Artificial Intelligence," and 2017 marked the "Year of Intelligent Applications." Rapid advances in deep learning have led to its widespread use, including in online education. This article introduces the definitions and basic concepts of machine learning and deep learning, as well as the related network structures.
Introduction
What can artificial intelligence actually do, and how does it affect us? A common analogy is shown in Figures 1 and 2: just as the industrial revolution freed humans from physical labor, AI frees humans from repetitive mental tasks, thereby boosting productivity.
Figure 1
Figure 2
About Machine Learning
Machine learning enables computers to discover patterns in large historical datasets using statistical algorithms, producing models that can recognize new samples or make predictions and thereby guide business decisions. Formally, it is the study of how to give computers the ability to learn without being explicitly programmed. Machine learning consists of data, algorithms, and models: data combined with an algorithm yields a model, and the model provides intelligent services.
Figure 3
Learning Methods
1. Supervised Learning
Training data are labeled with explicit outputs (labels). Models are iteratively adjusted by comparing predictions with true labels until a desired accuracy is reached. Typical tasks include classification and regression, using algorithms such as Logistic Regression and Back‑Propagation Neural Networks.
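As a minimal illustration of supervised learning, the sketch below fits a one-dimensional logistic regression by per-sample gradient descent, iteratively comparing predictions with true labels exactly as described above. This is a toy in plain Python (the function names and hyper-parameters are illustrative, not from the article):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit y ~ sigmoid(w*x + b) by per-sample gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            w -= lr * (p - y) * x                     # gradient step on weight
            b -= lr * (p - y)                         # gradient step on bias
    return w, b

def predict(w, b, x):
    """Threshold the predicted probability at 0.5 to get a class label."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```

Trained on labeled points such as `xs = [-2, -1, 1, 2]`, `ys = [0, 0, 1, 1]`, the model learns a positive weight and classifies new inputs by sign.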
2. Unsupervised Learning
Data are unlabeled; the model seeks intrinsic structures, supporting tasks like clustering and association rule mining (e.g., Apriori, K‑Means).
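K-Means, mentioned above, is perhaps the simplest example of finding intrinsic structure without labels. The following is a toy one-dimensional sketch (names and defaults are my own, not from the article): assign each point to its nearest centroid, recompute the centroids as cluster means, and repeat.

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Toy 1-D k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k distinct starting points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]  # update step
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```

On data with two well-separated groups, e.g. `[1.0, 1.2, 0.8, 9.0, 9.5, 8.5]`, the centroids settle near the two group means regardless of initialization.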
3. Semi‑Supervised Learning
Only a portion of the data is labeled. Models first learn the underlying structure from all data, then refine predictions on the labeled subset, using methods such as Graph Inference and Laplacian SVM.
4. Reinforcement Learning
Feedback is provided as rewards rather than explicit labels, guiding the model to adjust actions in dynamic systems or robot control, with algorithms like Q‑Learning and Temporal‑Difference learning.
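The reward-driven loop can be sketched with tabular Q-Learning on a tiny chain environment (an assumed toy setup of my own, not from the article): the agent moves left or right, receives reward 1 only on reaching the last state, and updates its action-value table from that feedback rather than from labels.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain: reward 1 for reaching the final state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]; 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward reward plus discounted future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy at every non-terminal state is "move right," even though no state was ever labeled with a correct action.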
Algorithm Similarity
1. Regression Algorithms
Methods that model relationships by minimizing error, including Ordinary Least Squares, Logistic Regression, Stepwise Regression, MARS, and LOESS.
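Ordinary Least Squares has a well-known closed form in the one-variable case; the sketch below computes it directly (the function name is illustrative):

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)
```

For points lying on y = 2x + 1 it recovers slope 2 and intercept 1 exactly, since that line minimizes the squared error to zero.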
2. Instance‑Based Algorithms
Techniques that compare new samples to stored instances, such as k‑Nearest Neighbors, Learning Vector Quantization, and Self‑Organizing Maps.
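k-Nearest Neighbors is the canonical instance-based method: there is no training step at all, only a comparison of the new sample against stored instances. A minimal sketch (names assumed):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest stored instances.
    train is a list of ((features...), label) pairs."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))  # squared Euclidean
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A query near the cluster of stored 'a' instances is voted 'a'; one near the 'b' cluster is voted 'b'.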
3. Regularization Methods
Extensions of regression that penalize model complexity to avoid over‑fitting, e.g., Ridge Regression, LASSO, and Elastic Net.
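The penalty idea is easiest to see in the one-variable ridge estimate, where the complexity term appears directly in the denominator (a toy sketch assuming centered data, no intercept):

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate for y ~ w*x on centered data: w = sum(xy) / (sum(x^2) + lam).
    lam = 0 recovers ordinary least squares; larger lam shrinks w toward zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

Shrinking the coefficient trades a little training error for a simpler, less over-fit model; LASSO uses an L1 penalty instead, which can drive coefficients exactly to zero, and Elastic Net mixes the two.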
4. Decision‑Tree Learning
Tree‑structured models for classification and regression, including CART, ID3, C4.5, CHAID, Random Forest, MARS, and Gradient Boosting Machines.
5. Kernel‑Based Methods
Algorithms that map inputs into high‑dimensional spaces, most notably Support Vector Machines, Radial Basis Function kernels, and Linear Discriminant Analysis.
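The key trick is that the mapping never has to be computed explicitly: a kernel function returns the inner product of two samples in the high-dimensional space directly. A sketch of the RBF kernel (function name assumed):

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||a - b||^2).
    Acts as an inner product in an implicit infinite-dimensional feature space."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))
```

A point has similarity 1 with itself, and similarity decays smoothly with distance, which is what lets an SVM with this kernel draw non-linear decision boundaries in the original space.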
6. Clustering Algorithms
Methods that group data by similarity, such as k‑Means and Expectation‑Maximization.
7. Association‑Rule Learning
Techniques that discover useful relationships in large datasets, e.g., Apriori and Eclat.
8. Dimensionality‑Reduction Algorithms
Unsupervised methods that reveal intrinsic structure while reducing dimensionality, including PCA, PLS, Sammon mapping, MDS, and Projection Pursuit.
9. Ensemble Methods
Strategies that combine multiple weak learners to improve overall prediction, such as Boosting, Bagging, AdaBoost, Stacked Generalization, Gradient Boosting, and Random Forest.
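Bagging is the simplest of these to sketch: train many weak learners on bootstrap resamples of the data, then combine them by majority vote. Below, the weak learner is a decision stump (a one-threshold classifier) on 1-D data; all names and the toy setup are my own, not from the article.

```python
import random
from collections import Counter

def stump(data):
    """Best single-threshold classifier on labeled 1-D data [(x, label), ...]."""
    best = None
    for t, _ in data:                       # candidate thresholds: the data points
        for lo, hi in ((0, 1), (1, 0)):     # both polarities
            acc = sum((lo if x <= t else hi) == y for x, y in data)
            if best is None or acc > best[0]:
                best = (acc, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi

def bagged_predict(data, x, n_models=25, seed=0):
    """Bagging: fit stumps on bootstrap resamples, combine by majority vote."""
    rng = random.Random(seed)
    votes = [stump([rng.choice(data) for _ in data])(x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]
```

Individual stumps trained on resamples can err, but the vote of 25 of them is far more stable; Random Forest applies the same idea to full decision trees with extra feature randomness, while Boosting instead trains learners sequentially, each focusing on the previous ones' mistakes.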
A machine‑learning classification and practice roadmap is shown in Figure 4.
About Deep Learning
Deep learning is a specialized form of machine learning that represents the world as a hierarchy of nested concepts, from simple to abstract. Figure 5 illustrates this hierarchical representation.
Figure 5
Deep‑learning models decompose a complex mapping from raw inputs (e.g., pixel values) to high‑level concepts into a series of simpler mappings across visible and hidden layers (Figure 6). Each hidden layer extracts increasingly abstract features, enabling the network to recognize edges, corners, contours, and ultimately whole objects.
Figure 6
Differences Between Deep Learning and Machine Learning
Data scale: Deep learning requires massive datasets; its performance keeps improving as the amount of labeled data grows, and reaching or surpassing human-level accuracy often takes millions of labeled examples.
Feature handling: Deep learning automatically learns feature representations, whereas traditional machine learning relies on manually engineered features.
Figure 7 (Venn diagram showing deep learning as both representation learning and machine learning)
Figure 8 (AI system components and their relationships)
Neural Network Basic Concepts and Structure
A simple neural network (Figure 9) consists of input, hidden, and output layers. Each neuron holds a scalar activation, and each connection carries a weight that is learned via back‑propagation.
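To make the layer structure concrete, the sketch below runs the forward pass of a 2-2-1 network whose weights are set by hand so that it computes XOR, a function no single-layer network can represent. The hand-picked weights are my own illustration; in practice, back-propagation would learn such weights from data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x0, x1):
    """Forward pass of a 2-2-1 network with hand-set weights computing XOR."""
    h1 = sigmoid(20 * x0 + 20 * x1 - 10)    # hidden unit 1: roughly OR(x0, x1)
    h2 = sigmoid(20 * x0 + 20 * x1 - 30)    # hidden unit 2: roughly AND(x0, x1)
    return sigmoid(20 * h1 - 20 * h2 - 10)  # output: OR and not AND, i.e. XOR
```

Each layer is just weighted sums of the previous layer's activations passed through a nonlinearity; stacking two such layers is what gives the network its extra expressive power.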
Figure 9
Core Concepts of Deep Learning
1. Generalization
Generalization is the ability of a model to perform well on previously unseen inputs. It is a central challenge in machine learning.
2. Basic Assumptions
Smoothness and local constancy priors assume that functions should not vary dramatically over small regions.
Manifold learning assumes high‑dimensional data lie on low‑dimensional manifolds, which can be mapped to reveal intrinsic structure.
3. Representation
Effective representations (low‑dimensional, sparse, or independent) are crucial for model performance. Figure 10 shows examples of different data representations.
Figure 10
4. Error, Overfitting, Underfitting, Capacity
Training error vs. generalization error.
Overfitting: large gap between training and test error.
Underfitting: model cannot achieve low training error.
Capacity: ability of a model to fit various functions.
Figure 11 illustrates the typical relationship between capacity, training error, and test error.
5. Optimization, Regularization, Hyper‑parameters
Training seeks parameters that minimize a loss function, which in deep learning is often non‑convex, leading to many local minima. Mini‑batch gradient descent with a suitable learning rate is commonly used. Regularization techniques (L1, L2, Dropout) mitigate overfitting, while hyper‑parameters such as learning rate and regularization strength are tuned through experimentation.
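The paragraph above can be condensed into one short sketch: mini-batch gradient descent on a linear model with an L2 penalty on the weight. The hyper-parameter values here (learning rate, penalty strength, batch size) are illustrative choices of exactly the kind that must be tuned by experimentation.

```python
import random

def sgd_l2(xs, ys, lr=0.05, lam=0.01, epochs=300, batch=2, seed=0):
    """Mini-batch SGD for y ~ w*x + b with an L2 (weight-decay) penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                          # new mini-batch order each epoch
        for s in range(0, len(idx), batch):
            mb = idx[s:s + batch]
            # mean gradient over the mini-batch; L2 term penalizes the weight only
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in mb) / len(mb) + lam * w
            gb = sum(w * xs[i] + b - ys[i] for i in mb) / len(mb)
            w, b = w - lr * gw, b - lr * gb
    return w, b
```

On data generated by y = 2x + 1, the learned weight converges close to 2 but slightly shrunk toward zero by the penalty, which is the intended regularization effect.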
Figure 12 (training loss curve)
6. Convolutional Neural Network (CNN)
CNNs are popular for data with spatial relationships, such as images. They consist of convolutional layers that apply learnable filters and pooling layers that down‑sample feature maps.
Figure 13 (convolution operation)
Figure 14 (max‑pooling operation)
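The two building blocks can be sketched directly in plain Python (function names assumed). Note that, as in most deep-learning libraries, the "convolution" is implemented as cross-correlation, i.e. the filter is slid over the image without being flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) of a learnable filter over an image."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest activation in each window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

A 2×2 kernel over a 4×4 image produces a 3×3 feature map, and 2×2 max pooling then halves each spatial dimension, exactly the down-sampling shown in Figure 14.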
7. Recurrent Neural Network (RNN)
RNNs are suited for sequential data such as text and speech. Cells share parameters across time steps, maintaining a hidden state that captures temporal dependencies (Figure 15).
Figure 15
When unfolded over time, the RNN becomes a computation graph in which the same unit is repeated at every time step (Figure 16).
Figure 16
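The unfolded computation is short enough to write out directly. In this sketch of a vanilla RNN cell (parameter names are my own), the same weights are applied at every time step while the hidden state carries context forward:

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Unrolled vanilla RNN: one tanh cell, same parameters at every time step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # shared weights across all steps
        states.append(h)
    return states
```

Feeding `[1.0, 0.0, 0.0]` shows the temporal dependency: the first input keeps influencing later hidden states through `w_h`, decaying gradually rather than vanishing immediately.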
Deep Learning Insights
With frameworks like TensorFlow and Caffe, the cost of building and deploying machine‑learning models has dropped dramatically, shifting engineers' focus from algorithmic details to business‑oriented model design. However, production ML systems require extensive supporting infrastructure (Figure 18).
Figure 17 (illustration of deep‑learning model construction)
Figure 18 (infrastructure surrounding real‑world ML systems)
About the Author
Hujiang Intelligent Learning Lab (HILL) was founded in 2017 to integrate education, psychology, and computer science, exploring AI applications in education and advancing intelligent capabilities for Hujiang products and partners. Vision: Activate Intelligence, Innovate Learning.
Hujiang Technology
We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.
