
Overview of Common Machine Learning Models: Characteristics, Advantages, and Disadvantages

This article provides a concise overview of fifteen widely used machine learning models—including decision trees, random forests, k‑means, KNN, EM, linear and logistic regression, Naive Bayes, Apriori, Boosting, GBDT, SVM, neural networks, HMM, and CRF—detailing their features, strengths, weaknesses, and typical application scenarios.


Decision Tree

Features

Suitable for small datasets; uses hierarchical variables or decision nodes to classify instances such as credit‑reliable vs. unreliable users.

Example Scenarios

Rule‑based credit assessment, horse‑race outcome prediction.

Advantages

Computationally simple, highly interpretable, able to handle missing attribute values, tolerant of irrelevant features, and capable of evaluating heterogeneous attribute types.

Disadvantages

Prone to over‑fitting (mitigated by pruning or ensemble methods like Random Forest).

Applicable Data Types

Both numeric and nominal attributes.
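
A decision tree chooses its split variables by measuring how much a candidate attribute reduces class uncertainty. A minimal sketch of that idea, using entropy and information gain on a hypothetical credit dataset (the data and feature names below are illustrative, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Reduction in entropy from splitting on one nominal feature."""
    total, n = entropy(labels), len(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in by_value.values())
    return total - remainder

# Hypothetical credit data: (income, has_defaulted) -> reliable or not
rows = [("high", "no"), ("high", "yes"), ("low", "no"), ("low", "yes")]
labels = ["reliable", "unreliable", "reliable", "unreliable"]

# has_defaulted (index 1) perfectly separates the classes;
# income (index 0) carries no information here.
print(information_gain(rows, labels, 0))  # 0.0
print(information_gain(rows, labels, 1))  # 1.0
```

The tree-building loop would pick the attribute with the highest gain at each node and recurse on the resulting subsets.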

CART (Classification and Regression Trees)

Uses Gini index for split decisions; classification trees for categorical targets, regression trees for continuous targets.

Advantages

Highly flexible, supports misclassification costs, can incorporate prior probabilities, automatic cost‑complexity pruning, produces easy‑to‑understand rules, and achieves high accuracy.

Disadvantages

Requires multiple scans and sorts of the dataset, which is inefficient; the related C4.5 algorithm is further limited to datasets that fit in memory.
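
The Gini index that CART uses to score splits is simple to compute. A minimal sketch with toy labels (illustrative data, not from the article):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(groups):
    """Weighted Gini impurity of a candidate binary split."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * gini(g) for g in groups)

mixed = ["yes", "yes", "no", "no"]
print(gini(mixed))                                  # 0.5 (maximally mixed)
print(gini_split([["yes", "yes"], ["no", "no"]]))   # 0.0 (a pure split)
```

CART greedily picks the binary split with the lowest weighted Gini impurity at each node.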

Random Forest

Features

Accuracy comparable to AdaBoost; robust to errors and outliers; performance depends on the strength of, and correlation between, the base classifiers; sensitive to the number of attributes considered at each split (commonly log₂(n)+1, where n is the total number of attributes).

Example Scenarios

User churn analysis, risk assessment.

Advantages

Less prone to over‑fitting, fast on large databases, handles missing values well, provides intrinsic variable‑importance estimates, balances errors on imbalanced data, and offers proximity measures useful for outlier detection and visualization.

Disadvantages

May over‑fit noisy classification/regression problems; variable‑importance estimates are biased toward attributes with more levels, so they can be unreliable for such data.
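
The two sources of randomness described above, bootstrap sampling of rows and a random subset of attributes at each split, can be sketched in a few lines. The helpers below are illustrative scaffolding (the tree training itself is omitted); the log₂(n)+1 subset size follows the rule of thumb mentioned in the Features section:

```python
import math, random
from collections import Counter

random.seed(0)

def bootstrap_sample(rows, labels):
    """Draw n rows with replacement; each tree sees a different sample."""
    idx = [random.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def random_feature_subset(n_features):
    """Per-split candidate attributes: a common choice is log2(n) + 1."""
    k = int(math.log2(n_features)) + 1
    return random.sample(range(n_features), k)

def majority_vote(predictions):
    """The forest's output is the plurality vote of its trees."""
    return Counter(predictions).most_common(1)[0][0]

print(len(random_feature_subset(8)))        # 4 candidate features out of 8
print(majority_vote(["a", "b", "a", "a"]))  # "a"
```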

k‑means Clustering

Features

May not reach global optimum; results depend on initial centroid selection; uses mean‑square error as the dispersion metric.

Advantages

Simple and fast; computational complexity O(n k t) where n is sample size, k is number of clusters, and t is iterations.

Disadvantages

Requires numeric centroids, sensitive to the choice of k and initial points, unsuitable for clusters of vastly different sizes, and highly sensitive to noise and outliers.
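
Lloyd's algorithm for k‑means alternates between assigning points to the nearest centroid and recomputing centroids as cluster means. A minimal 1‑D sketch on toy data (both the data and the initial centroids are illustrative, and the result's dependence on those initial centroids is exactly the sensitivity noted above):

```python
def kmeans_1d(points, centroids, iters=20):
    """Assign each point to its nearest centroid, then recompute means."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Empty clusters keep their old centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.1, 0.9, 8.0, 8.1, 7.9]
print(sorted(kmeans_1d(points, centroids=[0.0, 10.0])))  # ≈ [1.0, 8.0]
```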

K‑Nearest Neighbors (KNN)

Features

No explicit training phase; prediction by majority vote; key elements are k value, distance metric, and decision rule.

Advantages

Simple; works for both classification and regression; handles non‑linear decision boundaries; brute‑force prediction costs O(n) per query; relatively robust to outliers.

Disadvantages

Requires pre‑defining k; can be biased toward majority classes in imbalanced datasets.

Common Algorithm

kd‑tree for efficient neighbor search.
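
A minimal brute‑force KNN sketch on toy 2‑D data (illustrative points, not from the article). It shows the three key elements named above: the k value, the distance metric (squared Euclidean here), and the majority‑vote decision rule; a kd‑tree would replace the full sort with a faster neighbor search:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Brute-force k-NN: O(n) distance computations per query, then a
    majority vote among the k nearest labeled points."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    votes = [label for _, label in by_distance[:k]]
    return Counter(votes).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (0, 0.5), k=3))  # "a"
```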

Expectation‑Maximization (EM)

Features

E‑step computes the expected value of hidden variables given current parameters; M‑step maximizes the expected likelihood to update parameters.

Advantages

Generally more stable than k‑means and yields probabilistic (soft) cluster assignments.

Disadvantages

Computationally intensive, slow convergence, and sensitive to initial parameter guesses.
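
The E‑step/M‑step alternation can be sketched for the simplest interesting case: a two‑component 1‑D Gaussian mixture with fixed, equal variances (the data, initial means, and fixed variance below are illustrative assumptions):

```python
import math

def em_gaussian_mixture(data, means, var=1.0, iters=30):
    """EM for a 1-D two-Gaussian mixture with fixed variance and equal weights."""
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 0
        resp = []
        for x in data:
            p = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
            resp.append(p[0] / (p[0] + p[1]))
        # M-step: responsibility-weighted means maximize the expected likelihood
        w0 = sum(resp)
        w1 = sum(1 - r for r in resp)
        means = [sum(r * x for r, x in zip(resp, data)) / w0,
                 sum((1 - r) * x for r, x in zip(resp, data)) / w1]
    return means

data = [0.0, 0.2, -0.2, 5.0, 5.2, 4.8]
print(sorted(em_gaussian_mixture(data, means=[1.0, 4.0])))  # ≈ [0.0, 5.0]
```

The sensitivity to initial parameters noted above is visible here: poor initial means can leave both components tracking the same cluster.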

Linear Regression

Features

Provides an analytical solution.

Advantages

Simple and has a closed‑form solution.

Disadvantages

Poor fit for complex data; tends to under‑fit.
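
The closed‑form solution mentioned above is, for simple (one‑variable) linear regression, just two formulas. A minimal sketch on toy data generated from y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, via the closed-form solution:
    a = cov(x, y) / var(x),  b = mean(y) - a * mean(x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
print(fit_line(xs, ys))     # ≈ (2.0, 1.0)
```

The under‑fitting weakness is the flip side of this simplicity: a single line cannot capture curvature in the data.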

Logistic Regression

Features

Based on the logistic distribution; optimization methods include iterative scaling, gradient descent, and quasi‑Newton approaches.

Advantages

Simple, low computational cost, and minimal storage requirements.

Disadvantages

Prone to under‑fitting and limited accuracy.
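
Of the optimization methods listed above, gradient descent is the easiest to sketch. A minimal 1‑D logistic regression on toy separable data (data, learning rate, and iteration count are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, iters=500):
    """Batch gradient descent on the negative log-likelihood."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(iters):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
print([1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs])  # [0, 0, 1, 1]
```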

Naive Bayes

Features

Uses prior knowledge to compute posterior probabilities; assumes conditional independence of features.

Example Scenarios

Sentiment analysis, consumer segmentation.

Advantages

Performs well on small datasets, supports multi‑class problems, and offers fast classification.

Disadvantages

Independence assumption can reduce accuracy; performance not guaranteed to be high.
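
The posterior computation and the conditional‑independence assumption can be sketched with a tiny multinomial Naive Bayes classifier for sentiment (the documents below are illustrative toy data; Laplace smoothing is added so unseen words do not zero out a class):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: {class: [text, ...]}. Returns word counts, priors, vocabulary."""
    word_counts = {c: Counter() for c in docs}
    vocab = set()
    for c, texts in docs.items():
        for text in texts:
            tokens = text.split()
            word_counts[c].update(tokens)
            vocab.update(tokens)
    total_docs = sum(len(texts) for texts in docs.values())
    priors = {c: len(texts) / total_docs for c, texts in docs.items()}
    return word_counts, priors, vocab

def classify(text, word_counts, priors, vocab):
    """Posterior ∝ prior × product of per-word likelihoods (independence)."""
    scores = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(priors[c])
        for w in text.split():
            # Laplace smoothing: add 1 to every count
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

docs = {"pos": ["great fun", "great movie"],
        "neg": ["bad movie", "boring bad"]}
print(classify("great movie", *train_nb(docs)))  # "pos"
```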

Apriori

Features

Level‑wise, two‑stage (candidate generation, then pruning) frequent‑itemset mining; itemsets with support below a threshold are iteratively removed.

Advantages

Easy to implement.

Disadvantages

Slow on large datasets, generates many candidate sets, and incurs high I/O due to repeated full‑dataset scans.
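
The level‑wise generate‑and‑prune loop, and the repeated full‑dataset scans that make it slow, are both visible in a minimal sketch (the transactions below are illustrative toy data):

```python
def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining: candidates of size k+1 are built
    from frequent k-itemsets, then pruned by the support threshold.
    Note: support() scans the whole dataset, once per candidate -- this
    repeated scanning is the algorithm's main cost."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in singletons if support(s) >= min_support}
    while level:
        frequent.update({s: support(s) for s in level})
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "bread"}]
result = apriori(transactions, min_support=0.5)
print(result[frozenset({"milk", "bread"})])  # 0.75
```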

Boosting

Features

Adjusts sample weights during learning and combines multiple classifiers based on performance.

Advantages

Low generalization error, high classification accuracy, and requires few hyper‑parameters.

Disadvantages

Sensitive to outliers.
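
The weight adjustment described above can be sketched with one AdaBoost‑style reweighting round (the labels and predictions below are illustrative; a full booster would repeat this for each weak classifier). It also shows why boosting is sensitive to outliers: persistently misclassified points keep gaining weight.

```python
import math

def boost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step: up-weight misclassified samples,
    down-weight correct ones, then renormalize."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)   # this classifier's vote weight
    new = [w * math.exp(alpha if t != p else -alpha)
           for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)
    return [w / z for w in new], alpha

weights = [1 / 6] * 6
y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, 1, -1, 1, 1]      # two mistakes (indices 4 and 5)
weights, alpha = boost_round(weights, y_true, y_pred)
print(weights[4], weights[0])     # 0.25 0.125 -- mistakes now weigh more
```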

Gradient Boosted Decision Trees (GBDT / MART)

Features

Two variants: residual version and gradient version (the latter is more widely used).

Advantages

Handles both linear and non‑linear relationships and is less prone to over‑fitting.
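
The residual version mentioned above is the easiest to sketch: each new tree is fitted to what the current ensemble still gets wrong. A minimal gradient‑boosting loop with depth‑1 regression stumps on toy 1‑D data (data, learning rate, and round count are illustrative):

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree: the threshold split minimizing squared error."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def gbdt(xs, ys, rounds=50, lr=0.5):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    pred = [sum(ys) / len(ys)] * len(xs)       # start from the mean
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return pred

xs, ys = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
pred = gbdt(xs, ys)
print(max(abs(p - y) for p, y in zip(pred, ys)) < 0.1)  # True
```

For squared loss the residuals equal the negative gradient, which is why the residual and gradient versions coincide in this case.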

Support Vector Machine (SVM)

Features

Uses kernel functions to implicitly map data from a low‑dimensional space into a higher‑dimensional one where the classes become linearly separable.

Advantages

Enables non‑linear classification and regression, yields low generalization error, and is relatively interpretable.

Disadvantages

Sensitive to choice of kernel function and hyper‑parameters.
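
The low‑to‑high‑dimensional mapping can be made concrete with a toy example (illustrative data, not from the article): four 1‑D points whose classes cannot be separated by any threshold become separable after the quadratic feature map x → x², which is the mapping implicitly induced by a polynomial kernel:

```python
def separable_by_threshold(xs, ys):
    """Can any threshold on a 1-D feature split the two classes cleanly?"""
    for t in xs:
        left = {y for x, y in zip(xs, ys) if x <= t}
        right = {y for x, y in zip(xs, ys) if x > t}
        if len(left) <= 1 and len(right) <= 1:
            return True
    return False

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]                      # "outer" vs "inner" points
print(separable_by_threshold(xs, ys))                    # False in 1-D
print(separable_by_threshold([x * x for x in xs], ys))   # True after x -> x^2
```

An SVM never computes the mapped coordinates explicitly; the kernel supplies the inner products in the mapped space directly.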

Neural Networks

Features

Inspired by brain structure; composed of artificial neurons.

Advantages

Back‑Propagation (BP) networks provide strong non‑linear fitting, simple learning rules, robustness, memory and self‑learning capabilities, and good parallelism. Radial Basis Function (RBF) networks offer optimal approximation, freedom from local minima, fast convergence, and excellent classification performance.

Disadvantages

Models are not interpretable, require sufficient data, and are sensitive to initialization.
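
The BP learning rule for a single sigmoid unit can be sketched directly: the weight update is the learning rate times the error signal times the input. A minimal sketch that learns logical AND, a linearly separable function one neuron can represent (the initialization, learning rate, and epoch count are illustrative, and as noted above the result is sensitive to initialization):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=5000):
    """One sigmoid neuron trained by gradient descent on squared error,
    i.e. the back-propagation delta rule for the output layer."""
    w1, w2, b = 0.1, -0.1, 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = sigmoid(w1 * x1 + w2 * x2 + b)
            # delta = dE/dnet for E = 0.5 * (target - out)^2
            delta = (out - target) * out * (1 - out)
            w1 -= lr * delta * x1
            w2 -= lr * delta * x2
            b -= lr * delta
    return w1, w2, b

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
w1, w2, b = train_neuron(data)
preds = [1 if sigmoid(w1 * x1 + w2 * x2 + b) > 0.5 else 0
         for (x1, x2), _ in data]
print(preds)  # [0, 0, 0, 1]
```

In a multi‑layer BP network the same delta is propagated backwards through each layer's weights, which is where the name comes from.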

Hidden Markov Model (HMM)

Features

A doubly stochastic process: a hidden Markov chain over unobserved states, combined with an observable process that emits outputs from each state.

Example Scenarios

Facial expression analysis, weather forecasting.

Advantages

Solves sequence labeling problems.

Disadvantages

Relies on homogeneous Markov and observation independence assumptions, which may introduce bias.
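
Decoding the most likely hidden state sequence is done with the Viterbi algorithm. A minimal sketch on the classic toy example of inferring the weather from observed activities (the probabilities below are the standard textbook toy values, not from the article):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Dynamic programming over (best probability, best path) per state."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-1][prev][1])
                for prev in states)
            layer[s] = (prob, path + [s])
        V.append(layer)
    return max(V[-1].values())[1]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# ['Sunny', 'Rainy', 'Rainy']
```

The transition table encodes the homogeneous Markov assumption, and the emission table encodes the observation‑independence assumption that the Disadvantages section refers to.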

Conditional Random Field (CRF)

Features

Discriminative, undirected graphical model.

Advantages

Globally normalized probabilities avoid the label‑bias problem; feature design is flexible; and no strict independence assumptions are required.

Disadvantages

High training cost and computational complexity.

Tags: Machine Learning, clustering, Neural Networks, regression, classification, ensemble methods, probabilistic models
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
