Unlocking Anomaly Detection: Techniques from Time Series to Deep Learning

This comprehensive guide explores anomaly (outlier) detection across diverse methods—including time‑series analysis, statistical tests, distance metrics, matrix factorization, graph approaches, behavior‑sequence modeling, and supervised machine‑learning models—highlighting their principles, formulas, and practical use cases such as fraud prevention and system monitoring.

Alibaba Cloud Developer

Mar 19, 2019

Background

Outlier detection (also called anomaly detection) aims to identify data points that deviate significantly from expected behavior. It is widely used in credit‑card fraud detection, industrial fault detection, ad click‑fraud prevention, and many other real‑world scenarios.

1. Time Series

1.1 Moving Average (MA)

A moving average smooths high‑frequency noise; an observation that deviates from the smoothed value by more than a chosen threshold is flagged as an anomaly. Common variants include Simple Moving Average (SMA), Weighted Moving Average (WMA), and Exponential Moving Average (EMA).

1.1.1 Simple Moving Average (SMA)

Uses the mean of historical values as the forecast; deviations beyond a threshold indicate anomalies. Suitable for smoothing noisy data and short‑term forecasting.
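As a minimal sketch of this idea, the snippet below forecasts each point with the mean of the preceding window and flags points whose deviation exceeds a fixed threshold. The function name, the window length, and the threshold are illustrative assumptions, not values from the article.

```python
from statistics import mean

def sma_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing SMA by more than `threshold`."""
    flagged = []
    for t in range(window, len(series)):
        forecast = mean(series[t - window:t])  # mean of the previous `window` points
        if abs(series[t] - forecast) > threshold:
            flagged.append(t)
    return flagged

readings = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12]  # made-up data with one spike
print(sma_anomalies(readings, window=5, threshold=5.0))  # [6]
```

Note that the spike stays in the window for the next few forecasts and inflates them, which is one reason robust or weighted variants are often preferred.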

1.1.2 Weighted Moving Average (WMA)

Assigns higher weights to recent observations, which reduces lag compared with SMA; the weights typically decay linearly with the age of the observation.
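One common way to write this, assuming a window of n points and linearly decaying weights (the symbols w_i and n are notation introduced here for illustration):

```latex
\mathrm{WMA}_t = \frac{\sum_{i=0}^{n-1} w_i \, x_{t-i}}{\sum_{i=0}^{n-1} w_i},
\qquad w_i = n - i
```

With n = 3 this gives weights 3, 2, 1 for the newest to the oldest point, so WMA_t = (3x_t + 2x_{t-1} + x_{t-2}) / 6.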

1.1.3 Exponential Moving Average (EMA)

Applies exponential decay to past observations, ensuring older data never receive zero weight. The smoothing factor α (0 < α < 1) controls decay speed.
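The usual recursion is EMA_t = α·x_t + (1 − α)·EMA_{t−1}. The sketch below applies it and flags points that differ from the previous EMA value (used as a one‑step forecast) by more than a threshold. Seeding the recursion with the first observation and the specific threshold are assumptions made for the example.

```python
def ema(series, alpha=0.3):
    """Exponential moving average, seeded with the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def ema_anomalies(series, alpha=0.3, threshold=3.0):
    """Flag points that deviate from the EMA of all earlier points."""
    smoothed = ema(series, alpha)
    return [t for t in range(1, len(series))
            if abs(series[t] - smoothed[t - 1]) > threshold]

print(ema_anomalies([10, 10, 10, 25, 10], alpha=0.5, threshold=8.0))  # [3]
```

A larger α reacts faster to level changes but smooths less; a smaller α does the opposite.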

1.2 Year‑over‑Year (YoY) and Month‑over‑Month (MoM)

Comparing current metrics with historical cycles helps detect abnormal spikes or drops in periodic data such as DAU or ad spend.
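A rough sketch of such a comparison follows. It computes the relative change against the same position in the previous cycle and against the immediately preceding point, and flags a point only when both exceed a tolerance. Requiring both conditions, the tolerance value, and the sample numbers are assumptions for illustration; the series is assumed to contain only positive values.

```python
def relative_change(current, reference):
    return (current - reference) / reference

def periodic_anomalies(series, period, tolerance=0.5):
    """Flag points that jump versus both the previous cycle and the previous point."""
    flagged = []
    for t in range(period, len(series)):
        vs_cycle = relative_change(series[t], series[t - period])  # same slot, last cycle
        vs_prev = relative_change(series[t], series[t - 1])        # preceding point
        if abs(vs_cycle) > tolerance and abs(vs_prev) > tolerance:
            flagged.append(t)
    return flagged

# Two weeks of made-up daily values with a weekly pattern and one spike at index 11
daily = [100, 102, 98, 101, 99, 150, 160,
         101, 103, 97, 100, 250, 152, 158]
print(periodic_anomalies(daily, period=7))  # [11]
```

Comparing against the previous cycle keeps the regular weekend peak from being flagged, which a plain point‑to‑point comparison would not.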

1.3 STL + GESD

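STL decomposes a series into seasonal, trend, and residual components; GESD (Generalized Extreme Studentized Deviate) then flags extreme residuals. A robust variant replaces the mean and standard deviation in the test statistic with the median and MAD (median absolute deviation), which are less distorted by the very outliers being tested.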
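For reference, the GESD procedure on n residuals with at most k suspected outliers can be summarized as below. Here c and s stand for the center and scale estimates (mean and standard deviation in the classical test, median and MAD in the robust variant); these symbols are notation introduced for this summary.

```latex
R_i = \frac{\max_j \lvert x_j - c \rvert}{s}, \qquad i = 1, \dots, k
\quad \text{(remove the maximizing } x_j \text{ and recompute } c, s \text{ after each step)}

\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}
                 {\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^{2}\right)(n - i + 1)}},
\qquad p = 1 - \frac{\alpha}{2\,(n - i + 1)}
```

Here t_{p, ν} is the p‑quantile of the t distribution with ν degrees of freedom and α is the significance level. The number of outliers is the largest i for which R_i > λ_i.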

2. Statistical Methods

2.1 Single Gaussian Feature

If a feature follows a normal distribution, its probability density function can be used to compute anomaly scores.
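Written out, with μ and σ² estimated from the training data and ε a threshold chosen by the practitioner (ε is a symbol introduced here):

```latex
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),
\qquad x \text{ is flagged as anomalous if } p(x) < \varepsilon
```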

2.2 Multiple Independent Gaussian Features

Assuming each of n independent features is Gaussian, compute mean and variance per dimension and evaluate the joint probability of new samples.
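Because the features are assumed independent, the joint density is the product of the per‑feature densities, so the log‑density is a sum. The following sketch fits the per‑dimension parameters and scores new samples; the function names and the sample data are made up for illustration.

```python
import math
from statistics import mean, pvariance

def fit_independent_gaussians(rows):
    """Estimate (mean, variance) separately for every feature column."""
    return [(mean(col), pvariance(col)) for col in zip(*rows)]

def log_density(x, params):
    """Sum of per-feature Gaussian log-densities (log of the joint probability)."""
    total = 0.0
    for value, (mu, var) in zip(x, params):
        total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
    return total

train = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0), (2.0, 9.0), (2.0, 13.0)]
params = fit_independent_gaussians(train)
print(log_density((2.0, 11.0), params))  # typical sample: higher log-density
print(log_density((6.0, 20.0), params))  # distant sample: much lower log-density
```

Working in log space avoids numerical underflow when many features are multiplied together.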

2.3 Multivariate Gaussian

For correlated features, estimate the mean vector and covariance matrix; Mahalanobis distance measures deviation from the multivariate normal.

2.4 Mahalanobis Distance

Calculates the distance of a point from the mean, scaled by the covariance matrix; large values indicate outliers.
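The squared distance is d² = (x − μ)ᵀ Σ⁻¹ (x − μ). To keep the example dependency‑free, the sketch below handles only the two‑feature case, where the 2×2 covariance matrix can be inverted in closed form; the data are invented, and a real implementation would use a linear‑algebra library for higher dimensions.

```python
import math
from statistics import mean

def mahalanobis_2d(point, rows):
    """Mahalanobis distance of `point` from the distribution of 2-D `rows`."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    # Entries of the sample covariance matrix
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^-1 [dx dy]^T using the closed-form inverse of a 2x2 matrix
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

rows = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]  # strongly correlated features
print(mahalanobis_2d((4, 4), rows))  # follows the correlation: small distance
print(mahalanobis_2d((2, 4), rows))  # breaks the correlation: large distance
```

Both query points lie at nearly the same Euclidean distance from the mean, which shows what scaling by the covariance adds.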

2.5 Boxplot Method

Computes Q1, Q3, and IQR = Q3 − Q1; points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are considered anomalies. The rule makes no distributional assumption, so it also applies when data are non‑Gaussian.
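A short sketch of the rule using Python's standard library is shown below. Quartile conventions differ between tools; this example relies on the default "exclusive" method of `statistics.quantiles`, and the sample values are made up.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 10, 12, 95]))  # [95]
```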

3. Distance‑Based Methods

3.1 Angle‑Based Detection

For each point, analyzes the variance of the angles it forms with all pairs of other points; a low variance suggests the point is isolated, because the rest of the data then lies in roughly the same direction from it.
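The following is a simplified sketch of the idea: it uses the plain variance of the cosines of those angles, whereas published angle‑based detectors typically add distance weighting. The function name and the toy data are assumptions, and the points are assumed to be distinct.

```python
import math
from itertools import combinations
from statistics import pvariance

def angle_variance(index, points):
    """Variance of cos(angle) over all pairs of other points, seen from points[index]."""
    p = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    cosines = []
    for a, b in combinations(others, 2):
        va = [ai - pi for ai, pi in zip(a, p)]
        vb = [bi - pi for bi, pi in zip(b, p)]
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
        cosines.append(sum(x * y for x, y in zip(va, vb)) / norm)
    return pvariance(cosines)

grid = [(x, y) for x in range(3) for y in range(3)]  # compact 3x3 cluster
points = grid + [(10, 10)]                           # one distant point
scores = [angle_variance(i, points) for i in range(len(points))]
print(scores.index(min(scores)))  # index of the point with the lowest angle variance
```

The cost grows cubically with the number of points, so practical implementations usually restrict the pairs to nearest neighbors or to a random sample.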

3.2 K‑Nearest Neighbors (KNN)

Scores each point by the sum of the distances to its K nearest neighbors; a larger sum indicates a higher likelihood of being an anomaly.
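A brute‑force sketch of this score is given below (all pairwise distances, Euclidean metric). The function name, the choice of k, and the sample points are illustrative assumptions.

```python
import math

def knn_scores(points, k=3):
    """Sum of Euclidean distances from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_scores(points, k=2)
print(scores.index(max(scores)))  # 4, the isolated point
```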

4. Linear Methods (Matrix Factorization & PCA)

PCA projects data onto its principal components; reconstructing a point from the leading components yields a small error for normal points but a large one for anomalies, whose deviations tend to appear along the lower‑variance components.
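To illustrate without external libraries, the sketch below handles two features only: it finds the first principal direction of the 2×2 covariance matrix in closed form, reconstructs each point from that single component, and reports the squared residual. The restriction to 2‑D, the function name, and the data are assumptions for the example.

```python
import math
from statistics import mean

def pca_reconstruction_errors(rows):
    """Squared error after reconstructing 2-D points from their first principal component."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows) / n
    # Orientation of the leading eigenvector of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    errors = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        proj = dx * ux + dy * uy                 # coordinate along the first component
        rx, ry = dx - proj * ux, dy - proj * uy  # part lost in the reconstruction
        errors.append(rx * rx + ry * ry)
    return errors

rows = [(i, i) for i in range(-5, 6)] + [(3, -3)]  # points on a line plus one off-line point
errors = pca_reconstruction_errors(rows)
print(errors.index(max(errors)))  # 11, the off-line point
```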

5. Distribution Comparison

5.1 Kullback‑Leibler (KL) Divergence

Measures the dissimilarity between two probability distributions; a larger KL value indicates greater dissimilarity. KL divergence is asymmetric, so it is not a distance metric in the strict sense.
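For discrete distributions, KL(P‖Q) = Σ pᵢ · log(pᵢ / qᵢ). A minimal sketch follows, assuming both inputs are normalized and that qᵢ > 0 wherever pᵢ > 0; the example distributions are made up.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

baseline = [0.5, 0.5]
observed = [0.9, 0.1]
print(kl_divergence(baseline, observed))  # about 0.511
print(kl_divergence(observed, baseline))  # about 0.368, a different value
```

In monitoring, one of the two distributions is usually a reference window and the other the current window, so the order of the arguments should be fixed and documented.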

5.2 Chi‑Square Test

Compares observed frequencies with expected frequencies to assess deviation significance.
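The test statistic is χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ. The sketch below computes it for invented counts and compares it with 7.815, the 5% critical value for 3 degrees of freedom; a library routine would normally supply the critical value or p‑value.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for matching lists of observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [48, 52, 50, 90]  # made-up counts per bucket
expected = [60, 60, 60, 60]  # counts expected under the baseline
stat = chi_square_statistic(observed, expected)
print(stat)          # about 20.13
print(stat > 7.815)  # True: the deviation is significant at the 5% level (df = 3)
```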

6. Tree‑Based Method (Isolation Forest)

Randomly partitions the data space with an ensemble of random trees; points that are isolated after fewer splits are deemed anomalous. The anomaly score is derived from the average isolation depth across the trees.
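The sketch below conveys the mechanism in a simplified form. Instead of building explicit trees, it repeatedly draws a random chain of axis‑aligned splits for each point and records how many splits are needed to isolate it. It omits the subsampling and score normalization of the published algorithm, and all names, parameters, and data are assumptions for illustration.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=10):
    """Number of random axis-aligned splits needed to separate `point` from `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains `point`
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)

def average_depths(data, n_rounds=100, seed=0):
    """Average isolation depth per point over `n_rounds` random partitions."""
    rng = random.Random(seed)
    totals = [0] * len(data)
    for _ in range(n_rounds):
        for i, point in enumerate(data):
            totals[i] += isolation_depth(point, data, rng)
    return [t / n_rounds for t in totals]

cluster = [(float(i % 5), float(i // 5)) for i in range(20)]
data = cluster + [(50.0, 50.0)]  # one point far from the cluster
depths = average_depths(data, n_rounds=50)
print(depths.index(min(depths)))  # index of the point that is isolated fastest
```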

7. Graph‑Based Methods

7.1 Largest Connected Component

Identifies groups of devices or users that are mutually connected, useful for detecting coordinated fraud.
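A small sketch using breadth‑first search over an undirected graph is shown below. The edge list linking hypothetical user IDs to device IDs is invented for the example.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph as a list of sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def largest_component(edges):
    return max(connected_components(edges), key=len)

# Hypothetical user-to-device links
edges = [("u1", "d1"), ("u2", "d1"), ("u3", "d2"), ("u2", "d3"), ("u4", "d4")]
print(sorted(largest_component(edges)))  # ['d1', 'd3', 'u1', 'u2']
```

An unusually large component, such as many accounts sharing a handful of devices, is then a candidate for manual review.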

7.2 Label Propagation Clustering

Iteratively propagates labels based on node similarity, yielding densely connected subgraphs.
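The sketch below is a deterministic simplification for illustration: nodes are visited in sorted order and ties are broken by taking the largest label, whereas the usual algorithm visits nodes in random order and breaks ties randomly. Edge weights play the role of node similarity; the names and the toy graph are assumptions.

```python
from collections import defaultdict

def label_propagation(edges, max_iters=20):
    """Assign each node the label with the highest total edge weight among its neighbors."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            votes = defaultdict(float)
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            best = max(votes.values())
            candidates = [lab for lab, v in votes.items() if v == best]
            if labels[node] in candidates:
                continue                       # current label is already a top choice
            labels[node] = max(candidates)     # deterministic tie-break (an assumption)
            changed = True
        if not changed:
            break
    return labels

# Two triangles joined by a single bridge edge (2, 3); all similarities set to 1.0
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
print(label_propagation(edges))
```

On this toy graph the two triangles end up with different labels, which matches the densely connected subgraphs described above.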

8. Behavior Sequence (Markov Chain)

Models user actions as states (e.g., page request, search, click) and computes transition probabilities; low‑probability sequences flag abnormal behavior.
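As a sketch of this approach, the code below estimates first‑order transition probabilities from example sessions and scores a new session by its average log transition probability. The additive smoothing constant, the state names, and the sessions are assumptions made for the example.

```python
import math
from collections import defaultdict

def fit_transitions(sequences, smoothing=1e-3):
    """Estimate smoothed transition probabilities P(next state | current state)."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def avg_log_likelihood(seq, probs, floor=1e-6):
    """Average log transition probability; `floor` covers states never seen in training."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs.get(a, {}).get(b, floor)) for a, b in steps) / len(steps)

sessions = [
    ["request", "search", "click", "request", "search", "click"],
    ["request", "search", "click"],
    ["request", "click", "request", "search", "click"],
]
probs = fit_transitions(sessions)
print(avg_log_likelihood(["request", "search", "click"], probs))      # close to 0
print(avg_log_likelihood(["click", "click", "click", "click"], probs))  # strongly negative
```

Averaging over the number of transitions keeps long sessions from being penalized merely for their length.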

9. Supervised Models

9.1 Gradient Boosted Decision Trees (GBDT)

Trains on labeled anomalies (often generated by unsupervised methods or synthetic oversampling) and evaluates via conversion metrics.

9.2 Wide & Deep

Combines wide (linear) features with deep neural network embeddings to balance memorization and generalization for fraud detection.

10. Practical Considerations

10.1 Threshold Selection

Unsupervised methods use quantiles or distribution elbows; supervised models rely on precision‑recall curves.

10.2 Transforming Non‑Gaussian to Gaussian

Applies functions such as log, Box‑Cox, or Yeo‑Johnson to approximate normality before statistical testing.
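As a small illustration of the effect, the snippet below measures skewness before and after a log transform on made‑up, strictly positive, geometrically growing data. Box‑Cox and Yeo‑Johnson additionally estimate a power parameter from the data and are normally taken from a statistics library rather than hand‑written.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric sample)."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # heavily right-skewed
logged = [math.log(v) for v in raw]       # evenly spaced after the transform

print(skewness(raw))     # clearly positive
print(skewness(logged))  # approximately 0
```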
