An Overview of Machine Learning and Deep Learning: Definitions, Concepts, and Core Techniques
This article provides a comprehensive introduction to machine learning and deep learning, covering their definitions, classifications, key algorithms, neural network structures, core concepts such as generalization and regularization, and typical architectures like CNN and RNN, illustrated with numerous diagrams.
2016 was hailed as the "Year of Artificial Intelligence," and 2017 marked the "Year of Intelligent Applications." Rapid advances in deep learning have led to its widespread use, including in online education. This article introduces the definitions and basic concepts of machine learning and deep learning, as well as the related network structures.
Introduction
What can artificial intelligence actually do, and how does it affect us? A common analogy is shown in Figures 1 and 2: just as the industrial revolution freed humans from physical labor, AI frees humans from repetitive mental tasks, thereby boosting productivity.
Figure 1
Figure 2
About Machine Learning
Machine learning enables computers to discover patterns in large historical datasets using statistical algorithms, producing models that can recognize new samples or make predictions and thereby guide business decisions. Formally, it is the study of how to give computers the ability to learn without being explicitly programmed. Machine learning consists of data, algorithms, and models: data combined with an algorithm yields a model, and the model provides intelligent services.
Figure 3
Learning Methods
1. Supervised Learning
Training data are labeled with explicit outputs (labels). Models are iteratively adjusted by comparing predictions with true labels until a desired accuracy is reached. Typical tasks include classification and regression, using algorithms such as Logistic Regression and Back‑Propagation Neural Networks.
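As a minimal illustration of supervised learning, the sketch below fits a one-dimensional logistic regression by per-sample gradient descent, iteratively comparing predictions with true labels exactly as described above. This is a toy in plain Python (the function names and hyper-parameters are illustrative, not from the article):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit y ~ sigmoid(w*x + b) by per-sample gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            w -= lr * (p - y) * x                     # gradient step on weight
            b -= lr * (p - y)                         # gradient step on bias
    return w, b

def predict(w, b, x):
    """Threshold the predicted probability at 0.5 to get a class label."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```

Trained on labeled points such as `xs = [-2, -1, 1, 2]`, `ys = [0, 0, 1, 1]`, the model learns a positive weight and classifies new inputs by sign.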
2. Unsupervised Learning
Data are unlabeled; the model seeks intrinsic structures, supporting tasks like clustering and association rule mining (e.g., Apriori, K‑Means).
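K-Means, mentioned above, is perhaps the simplest example of finding intrinsic structure without labels. The following is a toy one-dimensional sketch (names and defaults are my own, not from the article): assign each point to its nearest centroid, recompute the centroids as cluster means, and repeat.

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Toy 1-D k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k distinct starting points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]  # update step
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```

On data with two well-separated groups, e.g. `[1.0, 1.2, 0.8, 9.0, 9.5, 8.5]`, the centroids settle near the two group means regardless of initialization.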
3. Semi‑Supervised Learning
Only a portion of the data is labeled. Models first learn the underlying structure from all data, then refine predictions on the labeled subset, using methods such as Graph Inference and Laplacian SVM.
4. Reinforcement Learning
Feedback is provided as rewards rather than explicit labels, guiding the model to adjust actions in dynamic systems or robot control, with algorithms like Q‑Learning and Temporal‑Difference learning.
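The reward-driven loop can be sketched with tabular Q-Learning on a tiny chain environment (an assumed toy setup of my own, not from the article): the agent moves left or right, receives reward 1 only on reaching the last state, and updates its action-value table from that feedback rather than from labels.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain: reward 1 for reaching the final state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]; 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward reward plus discounted future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy at every non-terminal state is "move right," even though no state was ever labeled with a correct action.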
Algorithm Similarity
1. Regression Algorithms
Methods that model relationships by minimizing error, including Ordinary Least Squares, Logistic Regression, Stepwise Regression, MARS, and LOESS.
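Ordinary Least Squares has a well-known closed form in the one-variable case; the sketch below computes it directly (the function name is illustrative):

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)
```

For points lying on y = 2x + 1 it recovers slope 2 and intercept 1 exactly, since that line minimizes the squared error to zero.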
2. Instance‑Based Algorithms
Techniques that compare new samples to stored instances, such as k‑Nearest Neighbors, Learning Vector Quantization, and Self‑Organizing Maps.
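k-Nearest Neighbors is the canonical instance-based method: there is no training step at all, only a comparison of the new sample against stored instances. A minimal sketch (names assumed):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest stored instances.
    train is a list of ((features...), label) pairs."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))  # squared Euclidean
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A query near the cluster of stored 'a' instances is voted 'a'; one near the 'b' cluster is voted 'b'.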
3. Regularization Methods
Extensions of regression that penalize model complexity to avoid over‑fitting, e.g., Ridge Regression, LASSO, and Elastic Net.
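The penalty idea is easiest to see in the one-variable ridge estimate, where the complexity term appears directly in the denominator (a toy sketch assuming centered data, no intercept):

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate for y ~ w*x on centered data: w = sum(xy) / (sum(x^2) + lam).
    lam = 0 recovers ordinary least squares; larger lam shrinks w toward zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

Shrinking the coefficient trades a little training error for a simpler, less over-fit model; LASSO uses an L1 penalty instead, which can drive coefficients exactly to zero, and Elastic Net mixes the two.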
4. Decision‑Tree Learning
Tree‑structured models for classification and regression, including CART, ID3, C4.5, CHAID, Random Forest, MARS, and Gradient Boosting Machines.
5. Kernel‑Based Methods
Algorithms that map inputs into high‑dimensional spaces, most notably Support Vector Machines, Radial Basis Function kernels, and Linear Discriminant Analysis.
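The key trick is that the mapping never has to be computed explicitly: a kernel function returns the inner product of two samples in the high-dimensional space directly. A sketch of the RBF kernel (function name assumed):

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||a - b||^2).
    Acts as an inner product in an implicit infinite-dimensional feature space."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))
```

A point has similarity 1 with itself, and similarity decays smoothly with distance, which is what lets an SVM with this kernel draw non-linear decision boundaries in the original space.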
6. Clustering Algorithms
Methods that group data by similarity, such as k‑Means and Expectation‑Maximization.
7. Association‑Rule Learning
Techniques that discover useful relationships in large datasets, e.g., Apriori and Eclat.
8. Dimensionality‑Reduction Algorithms
Unsupervised methods that reveal intrinsic structure while reducing dimensionality, including PCA, PLS, Sammon mapping, MDS, and Projection Pursuit.
9. Ensemble Methods
Strategies that combine multiple weak learners to improve overall prediction, such as Boosting, Bagging, AdaBoost, Stacked Generalization, Gradient Boosting, and Random Forest.
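Bagging is the simplest of these to sketch: train many weak learners on bootstrap resamples of the data, then combine them by majority vote. Below, the weak learner is a decision stump (a one-threshold classifier) on 1-D data; all names and the toy setup are my own, not from the article.

```python
import random
from collections import Counter

def stump(data):
    """Best single-threshold classifier on labeled 1-D data [(x, label), ...]."""
    best = None
    for t, _ in data:                       # candidate thresholds: the data points
        for lo, hi in ((0, 1), (1, 0)):     # both polarities
            acc = sum((lo if x <= t else hi) == y for x, y in data)
            if best is None or acc > best[0]:
                best = (acc, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi

def bagged_predict(data, x, n_models=25, seed=0):
    """Bagging: fit stumps on bootstrap resamples, combine by majority vote."""
    rng = random.Random(seed)
    votes = [stump([rng.choice(data) for _ in data])(x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]
```

Individual stumps trained on resamples can err, but the vote of 25 of them is far more stable; Random Forest applies the same idea to full decision trees with extra feature randomness, while Boosting instead trains learners sequentially, each focusing on the previous ones' mistakes.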
A machine‑learning classification and practice roadmap is shown in Figure 4.
About Deep Learning
Deep learning is a specialized form of machine learning that represents the world as a hierarchy of nested concepts, from simple to abstract. Figure 5 illustrates this hierarchical representation.
Figure 5
Deep‑learning models decompose a complex mapping from raw inputs (e.g., pixel values) to high‑level concepts into a series of simpler mappings across visible and hidden layers (Figure 6). Each hidden layer extracts increasingly abstract features, enabling the network to recognize edges, corners, contours, and ultimately whole objects.
Figure 6
Differences Between Deep Learning and Machine Learning
Data scale: Deep learning requires massive datasets; its performance keeps improving as the amount of labeled data grows, and reaching or surpassing human-level accuracy often takes millions of labeled examples.
Feature handling: Deep learning automatically learns feature representations, whereas traditional machine learning relies on manually engineered features.
Figure 7 (Venn diagram showing deep learning as both representation learning and machine learning)
Figure 8 (AI system components and their relationships)
Neural Network Basic Concepts and Structure
A simple neural network (Figure 9) consists of input, hidden, and output layers. Each neuron holds a scalar activation, and each connection carries a weight that is learned via back‑propagation.
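To make the layer structure concrete, the sketch below runs the forward pass of a 2-2-1 network whose weights are set by hand so that it computes XOR, a function no single-layer network can represent. The hand-picked weights are my own illustration; in practice, back-propagation would learn such weights from data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x0, x1):
    """Forward pass of a 2-2-1 network with hand-set weights computing XOR."""
    h1 = sigmoid(20 * x0 + 20 * x1 - 10)    # hidden unit 1: roughly OR(x0, x1)
    h2 = sigmoid(20 * x0 + 20 * x1 - 30)    # hidden unit 2: roughly AND(x0, x1)
    return sigmoid(20 * h1 - 20 * h2 - 10)  # output: OR and not AND, i.e. XOR
```

Each layer is just weighted sums of the previous layer's activations passed through a nonlinearity; stacking two such layers is what gives the network its extra expressive power.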
Figure 9
Core Concepts of Deep Learning
1. Generalization
Generalization is the ability of a model to perform well on previously unseen inputs. It is a central challenge in machine learning.
2. Basic Assumptions
Smoothness and local constancy priors assume that functions should not vary dramatically over small regions.
Manifold learning assumes high‑dimensional data lie on low‑dimensional manifolds, which can be mapped to reveal intrinsic structure.
3. Representation
Effective representations (low‑dimensional, sparse, or independent) are crucial for model performance. Figure 10 shows examples of different data representations.
Figure 10
4. Error, Overfitting, Underfitting, Capacity
Training error vs. generalization error.
Overfitting: large gap between training and test error.
Underfitting: model cannot achieve low training error.
Capacity: ability of a model to fit various functions.
Figure 11 illustrates the typical relationship between capacity, training error, and test error.
5. Optimization, Regularization, Hyper‑parameters
Training seeks parameters that minimize a loss function, which in deep learning is often non‑convex, leading to many local minima. Mini‑batch gradient descent with a suitable learning rate is commonly used. Regularization techniques (L1, L2, Dropout) mitigate overfitting, while hyper‑parameters such as learning rate and regularization strength are tuned through experimentation.
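The paragraph above can be condensed into one short sketch: mini-batch gradient descent on a linear model with an L2 penalty on the weight. The hyper-parameter values here (learning rate, penalty strength, batch size) are illustrative choices of exactly the kind that must be tuned by experimentation.

```python
import random

def sgd_l2(xs, ys, lr=0.05, lam=0.01, epochs=300, batch=2, seed=0):
    """Mini-batch SGD for y ~ w*x + b with an L2 (weight-decay) penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                          # new mini-batch order each epoch
        for s in range(0, len(idx), batch):
            mb = idx[s:s + batch]
            # mean gradient over the mini-batch; L2 term penalizes the weight only
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in mb) / len(mb) + lam * w
            gb = sum(w * xs[i] + b - ys[i] for i in mb) / len(mb)
            w, b = w - lr * gw, b - lr * gb
    return w, b
```

On data generated by y = 2x + 1, the learned weight converges close to 2 but slightly shrunk toward zero by the penalty, which is the intended regularization effect.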
Figure 12 (training loss curve)
6. Convolutional Neural Network (CNN)
CNNs are popular for data with spatial relationships, such as images. They consist of convolutional layers that apply learnable filters and pooling layers that down‑sample feature maps.
Figure 13 (convolution operation)
Figure 14 (max‑pooling operation)
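The two building blocks can be sketched directly in plain Python (function names assumed). Note that, as in most deep-learning libraries, the "convolution" is implemented as cross-correlation, i.e. the filter is slid over the image without being flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) of a learnable filter over an image."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest activation in each window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

A 2×2 kernel over a 4×4 image produces a 3×3 feature map, and 2×2 max pooling then halves each spatial dimension, exactly the down-sampling shown in Figure 14.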
7. Recurrent Neural Network (RNN)
RNNs are suited for sequential data such as text and speech. Cells share parameters across time steps, maintaining a hidden state that captures temporal dependencies (Figure 15).
Figure 15
When unfolded over time, the RNN becomes a computation graph in which the same unit is repeated at every time step (Figure 16).
Figure 16
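The unfolded computation is short enough to write out directly. In this sketch of a vanilla RNN cell (parameter names are my own), the same weights are applied at every time step while the hidden state carries context forward:

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Unrolled vanilla RNN: one tanh cell, same parameters at every time step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # shared weights across all steps
        states.append(h)
    return states
```

Feeding `[1.0, 0.0, 0.0]` shows the temporal dependency: the first input keeps influencing later hidden states through `w_h`, decaying gradually rather than vanishing immediately.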
Deep Learning Insights
With frameworks like TensorFlow and Caffe, the cost of building and deploying machine‑learning models has dropped dramatically, shifting engineers' focus from algorithmic details to business‑oriented model design. However, production ML systems require extensive supporting infrastructure (Figure 18).
Figure 17 (illustration of deep‑learning model construction)
Figure 18 (infrastructure surrounding real‑world ML systems)
About the Author
Hujiang Intelligent Learning Lab (HILL) was founded in 2017 to integrate education, psychology, and computer science, exploring AI applications in education and advancing intelligent capabilities for Hujiang products and partners. Vision: Activate Intelligence, Innovate Learning.
Hujiang Technology
We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.
