Unlocking Anomaly Detection: Techniques from Time Series to Deep Learning
This guide surveys anomaly (outlier) detection methods, including time‑series analysis, statistical tests, distance metrics, matrix factorization, graph approaches, behavior‑sequence modeling, and supervised machine‑learning models, and summarizes their principles, formulas, and practical use cases such as fraud prevention and system monitoring.
Background
Outlier detection (also called anomaly detection) aims to identify data points that deviate significantly from expected behavior. It is widely used in credit‑card fraud detection, industrial fault detection, ad click fraud prevention, and many other real‑world scenarios.
1. Time Series
1.1 Moving Average (MA)
A moving average smooths high‑frequency noise; points that deviate sharply from the smoothed value can then be flagged as anomalies. Common variants include Simple Moving Average (SMA), Weighted Moving Average (WMA), and Exponential Moving Average (EMA).
1.1.1 Simple Moving Average (SMA)
Uses the mean of the most recent historical values as the forecast; a deviation from that forecast beyond a threshold indicates an anomaly. Suitable for smoothing noisy data and short‑term forecasting.
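A minimal sketch of the SMA rule, assuming the threshold is expressed as k standard deviations of the trailing window. The function name, window size, and k are illustrative choices, not values from the original article.

```python
import numpy as np

def sma_anomalies(series, window=5, k=3.0):
    """Flag points that deviate from the trailing mean by more than k trailing std devs."""
    x = np.asarray(series, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        history = x[t - window:t]          # only the previous `window` values
        forecast = history.mean()          # SMA forecast for time t
        spread = history.std() or 1e-9     # guard against a perfectly flat window
        flags[t] = abs(x[t] - forecast) > k * spread
    return flags

series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10]   # made-up data with one spike
flags = sma_anomalies(series, window=5, k=3.0)
```

Only the spike at index 7 is flagged here. Note that once the spike enters the trailing window it inflates the standard deviation, which is one reason robust statistics are often preferred.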
1.1.2 Weighted Moving Average (WMA)
Assigns higher weights to recent observations, with weights that decay linearly with age. This reduces lag compared to SMA.
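As an illustration, the sketch below assumes the common linear weighting scheme (weight 1 for the oldest value up to n for the newest). Other weightings are possible; the function name is hypothetical.

```python
import numpy as np

def wma(values):
    """Linearly weighted average: the newest value gets the largest weight."""
    x = np.asarray(values, dtype=float)
    weights = np.arange(1, len(x) + 1)      # 1 for the oldest, n for the newest
    return float(np.dot(x, weights) / weights.sum())

window = [10.0, 10.0, 10.0, 20.0]           # a jump in the most recent value
weighted = wma(window)                       # (10 + 20 + 30 + 80) / 10
simple = float(np.mean(window))              # plain SMA for comparison
```

The weighted value (14.0) sits closer to the latest observation than the simple mean (12.5), which is the reduced lag described above.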
1.1.3 Exponential Moving Average (EMA)
Applies exponential decay to past observations, ensuring older data never receive zero weight. The smoothing factor α (0 < α < 1) controls decay speed.
1.2 Year‑over‑Year (YoY) and Month‑over‑Month (MoM)
Comparing current metrics with historical cycles helps detect abnormal spikes or drops in periodic data such as DAU or ad spend.
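A small sketch of a same-period comparison. It assumes a fixed lag between comparable periods (for example 7 for week over week on daily data, or 12 for year over year on monthly data). The DAU figures and the 30% alert level are made up for illustration.

```python
import numpy as np

def period_over_period(series, lag):
    """Relative change versus the value `lag` steps earlier; NaN where no history exists."""
    x = np.asarray(series, dtype=float)
    change = np.full(len(x), np.nan)
    change[lag:] = (x[lag:] - x[:-lag]) / x[:-lag]   # assumes no zero in the baseline
    return change

# Hypothetical daily active users for two weeks, with a weekend peak.
dau = [100, 102, 98, 101, 99, 120, 125, 101, 103, 97, 100, 55, 121, 126]
lag = 7
wow = period_over_period(dau, lag)
alerts = np.flatnonzero(np.abs(wow[lag:]) > 0.3) + lag   # skip the warm-up period
```

Day 11 drops about 44% against the same weekday one week earlier and is the only alert, while the regular weekend peak is not flagged because it is compared with the previous weekend.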
1.3 STL + GESD
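STL decomposes a series into seasonal, trend, and residual components; GESD (Generalized Extreme Studentized Deviate) then tests the residuals for extreme values. The classic GESD statistic is built on the mean and standard deviation; a robust variant substitutes the median and MAD so that the outliers being sought do not distort the estimates.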
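For reference, the standard GESD test for up to r outliers in n residuals can be written as follows. This is the textbook formulation, not something stated in the original article; the symbols α (significance level) and r (maximum number of outliers) are introduced here for illustration.

```latex
% Step i = 1, ..., r: compute the statistic, then remove the most extreme point.
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s},
\qquad
\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}
                 {\sqrt{\left(n-i-1+t_{p,\,n-i-1}^{2}\right)(n-i+1)}},
\qquad
p = 1 - \frac{\alpha}{2(n-i+1)}
```

Here x̄ and s are the mean and standard deviation of the points remaining at step i, and t_{p,ν} is the p-quantile of the t distribution with ν degrees of freedom. The number of outliers is the largest i for which R_i > λ_i. In the robust variant, x̄ is replaced by the median and s by the MAD.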
2. Statistical Methods
2.1 Single Gaussian Feature
If a feature follows a normal distribution, its probability density function can be used to compute anomaly scores.
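A minimal sketch of density-based scoring for one Gaussian feature. The training values and the density threshold epsilon are hypothetical; in practice the threshold would be tuned on validation data.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

train = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0])   # made-up normal readings
mu, sigma = train.mean(), train.std()
epsilon = 1e-3                                          # hypothetical density threshold

normal_point = gaussian_pdf(10.0, mu, sigma) < epsilon     # False: density is high
anomalous_point = gaussian_pdf(12.0, mu, sigma) < epsilon  # True: density is near zero
```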
2.2 Multiple Independent Gaussian Features
Assuming each of n independent features is Gaussian, compute mean and variance per dimension and evaluate the joint probability of new samples.
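Under the independence assumption the joint density is the product of the per-feature densities. The sketch below works in log space to avoid underflow; the data are synthetic and the function names are illustrative.

```python
import numpy as np

def fit_independent_gaussians(X):
    """Per-feature mean and variance."""
    return X.mean(axis=0), X.var(axis=0)

def log_joint_density(x, mu, var):
    """Sum of per-feature log densities, i.e. the log of their product."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)))

rng = np.random.default_rng(0)
X = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(500, 2))   # synthetic training data
mu, var = fit_independent_gaussians(X)

typical = log_joint_density(np.array([0.0, 5.0]), mu, var)
unusual = log_joint_density(np.array([6.0, 5.0]), mu, var)   # far out on feature 0
```

A sample is flagged when its log density falls below a chosen threshold; here `unusual` is much lower than `typical`.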
2.3 Multivariate Gaussian
For correlated features, estimate the mean vector and covariance matrix; Mahalanobis distance measures deviation from the multivariate normal.
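For reference, the density of an n-dimensional Gaussian with mean vector μ and covariance matrix Σ is the standard expression below; the exponent is minus one half of the squared Mahalanobis distance.

```latex
p(x) = \frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}
       \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right),
\qquad
D_M(x) = \sqrt{(x-\mu)^{\top}\Sigma^{-1}(x-\mu)}
```

Thresholding the density p(x) from below is therefore equivalent to thresholding D_M(x) from above.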
2.4 Mahalanobis Distance
Calculates the distance of a point from the mean, scaled by the covariance matrix; large values indicate outliers.
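A short sketch of the Mahalanobis distance, using a made-up covariance matrix with strong positive correlation. `np.linalg.solve` is used instead of an explicit matrix inverse.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Distance of x from mean, scaled by the covariance matrix."""
    diff = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])                    # hypothetical, strongly correlated features

along = mahalanobis([2.0, 2.0], mean, cov)      # follows the correlation
against = mahalanobis([2.0, -2.0], mean, cov)   # contradicts the correlation
```

Both points are equally far from the mean in Euclidean terms, but the second one violates the correlation structure and gets a distance of about 8.94 versus about 2.05 for the first.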
2.5 Boxplot Method
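Computes the first quartile Q1, the third quartile Q3, and the interquartile range IQR = Q3 − Q1; points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are considered anomalies. The method makes no distributional assumption, so it also applies when data are non‑Gaussian.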
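The rule can be sketched as follows, assuming NumPy's default (linear interpolation) percentile definition. The sample data are made up.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

data = [12, 13, 12, 14, 13, 15, 14, 13, 40]
mask = iqr_outliers(data)
```

For this sample Q1 = 13 and Q3 = 14, so the fences are 11.5 and 15.5 and only the value 40 is flagged.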
3. Distance‑Based Methods
3.1 Angle‑Based Detection
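For each point, analyzes the variance of the angles it forms with all pairs of other points. A low variance suggests the point is isolated, because the rest of the data lies in roughly the same direction when viewed from it.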
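A brute-force sketch of an angle-based score, assuming all points are distinct. It uses the distance-weighted cosine ⟨u, v⟩ / (|u|²·|v|²) for each pair and takes a plain variance, which simplifies the weighted variance used in the published angle-based outlier factor. The cost is cubic in the number of points, so this is only suitable for small data.

```python
import numpy as np
from itertools import combinations

def angle_variance_scores(X):
    """Lower score = the point sees all other points in a similar direction."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(len(X))
    for i, a in enumerate(X):
        others = np.delete(X, i, axis=0) - a     # vectors from a to every other point
        values = [
            np.dot(u, v) / (np.dot(u, u) * np.dot(v, v))
            for u, v in combinations(others, 2)
        ]
        scores[i] = np.var(values)
    return scores

# Five clustered points plus one distant point (made-up data).
points = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [10, 10]]
scores = angle_variance_scores(points)
most_isolated = int(np.argmin(scores))
```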
3.2 K‑Nearest Neighbors (KNN)
Scores each point by the sum of its distances to the K nearest neighbors; larger sums indicate higher anomaly likelihood.
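A minimal sketch with Euclidean distance and a full pairwise distance matrix, which is fine for small data but quadratic in memory. The value of k and the sample points are illustrative.

```python
import numpy as np

def knn_distance_scores(X, k=3):
    """Sum of distances from each point to its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.sum(axis=1)

points = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [10, 10]]
scores = knn_distance_scores(points, k=3)
most_anomalous = int(np.argmax(scores))
```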
4. Linear Methods (Matrix Factorization & PCA)
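PCA projects data onto principal components. When a point is reconstructed from the top components only, the reconstruction error is small for normal points but large for anomalies, because anomalies tend to deviate along the lower‑variance components that the reconstruction discards.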
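A sketch of reconstruction-error scoring using the SVD of the centered training data. The synthetic data lie close to the line y = 2x; the number of retained components is an illustrative choice.

```python
import numpy as np

def pca_reconstruction_error(X_train, X_new, n_components=1):
    """Distance between each new point and its projection onto the top components."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]               # rows are principal directions
    centered = X_new - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_train = np.column_stack([t, 2 * t + rng.normal(scale=0.05, size=200)])

X_new = np.array([[1.0, 2.0],     # follows the learned structure
                  [1.0, -2.0]])   # breaks it
errors = pca_reconstruction_error(X_train, X_new)
```

The second point has values in a plausible range on each feature separately, yet its reconstruction error is large because it violates the relationship between the features.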
5. Distribution Comparison
5.1 Kullback‑Leibler (KL) Divergence
Measures how much one probability distribution differs from another; a larger KL value indicates greater dissimilarity. KL divergence is asymmetric, so it is not a true distance metric.
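A sketch for discrete distributions, using D(P‖Q) = Σ p_i · log(p_i / q_i). The small clipping constant that avoids log(0) is an implementation assumption, as are the example distributions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = np.clip(p / p.sum(), eps, None)          # normalize, then avoid log(0)
    q = np.clip(q / q.sum(), eps, None)
    return float(np.sum(p * np.log(p / q)))

baseline = [0.5, 0.5]      # e.g. a historical traffic split
observed = [0.9, 0.1]      # e.g. the split in the current window

forward = kl_divergence(baseline, observed)
backward = kl_divergence(observed, baseline)
```

Here `forward` is about 0.511 and `backward` about 0.368, which shows the asymmetry: the direction of comparison has to be chosen deliberately.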
5.2 Chi‑Square Test
Compares observed frequencies with expected frequencies to assess deviation significance.
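The Pearson statistic is χ² = Σ (O_i − E_i)² / E_i. The sketch below computes it with NumPy; the bucket counts are made up, and the critical value 7.815 is the standard 5% level for 3 degrees of freedom.

```python
import numpy as np

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for matching frequency buckets."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

expected = [50, 50, 50, 50]                       # hypothetical clicks per time bucket
ordinary = chi_square_statistic([48, 52, 47, 53], expected)
suspicious = chi_square_statistic([20, 25, 30, 125], expected)

CRITICAL_5PCT_DF3 = 7.815                         # 4 buckets -> 3 degrees of freedom
```

The ordinary traffic gives 0.52, well under the critical value, while the skewed traffic gives 151 and would be judged a significant deviation.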
6. Tree‑Based Method (Isolation Forest)
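Randomly partitions the data space with axis‑aligned splits; points that are isolated after fewer splits are deemed anomalous. The anomaly score is derived from the isolation depth (path length) averaged over many random trees.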
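The following is a simplified illustration of the isolation idea, not the full Isolation Forest algorithm: it omits subsampling and the path-length normalization of the published method and simply averages the depth at which a point is isolated. All names and parameter values are illustrative.

```python
import numpy as np

def path_length(x, X, rng, depth=0, max_depth=10):
    """Depth at which x is separated from the rest of X by random splits."""
    if len(X) <= 1 or depth >= max_depth:
        return depth
    feature = rng.integers(X.shape[1])            # pick a random feature
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:
        return depth                              # cannot split further on this feature
    split = rng.uniform(lo, hi)                   # pick a random split value
    if x[feature] < split:
        same_side = X[:, feature] < split
    else:
        same_side = X[:, feature] >= split
    return path_length(x, X[same_side], rng, depth + 1, max_depth)

def mean_path_length(x, X, n_trees=100, seed=0):
    """Average isolation depth over many random partitionings."""
    rng = np.random.default_rng(seed)
    return float(np.mean([path_length(x, X, rng) for _ in range(n_trees)]))

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(50, 2)), [[0.0, 0.0]], [[8.0, 8.0]]])
inlier_depth = mean_path_length(X[50], X)         # point in the dense center
outlier_depth = mean_path_length(X[51], X)        # point far from the cluster
```

The distant point is typically isolated within a few splits, while the central point needs many more.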
7. Graph‑Based Methods
7.1 Largest Connected Component
Identifies groups of devices or users that are mutually connected, useful for detecting coordinated fraud.
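A sketch of finding connected components with an iterative depth-first search. The user and device identifiers are hypothetical; an edge means a user was seen on a device.

```python
from collections import defaultdict

def connected_components(edges):
    """Return components as sets of nodes, largest first."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)

edges = [("u1", "d1"), ("u2", "d1"), ("u3", "d1"), ("u3", "d2"), ("u4", "d3")]
components = connected_components(edges)
largest = components[0]
```

Three users sharing device d1 end up in one component of five nodes, which is the kind of cluster that would be reviewed for coordinated activity.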
7.2 Label Propagation Clustering
Iteratively propagates labels based on node similarity, yielding densely connected subgraphs.
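A deterministic sketch of label propagation on an unweighted graph. Practical implementations usually visit nodes in random order and break ties at random; here nodes are visited in sorted order and ties go to the largest label, purely so the example is reproducible. The graph is made up: two triangles joined by a single edge.

```python
from collections import Counter

def label_propagation(adjacency, max_rounds=20):
    """Each node repeatedly adopts the most common label among its neighbors."""
    labels = {node: node for node in adjacency}   # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue                          # isolated node keeps its label
            top = max(counts.values())
            best = max(l for l, c in counts.items() if c == top)  # assumed tie-break
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
labels = label_propagation(adjacency)
```

The two triangles converge to two different labels despite the bridge between nodes 2 and 3.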
8. Behavior Sequence (Markov Chain)
Models user actions as states (e.g., page request, search, click) and computes transition probabilities; low‑probability sequences flag abnormal behavior.
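A sketch of a first-order Markov chain over action sequences. The additive smoothing constant and the example sessions are assumptions for illustration; states never seen in training would need separate handling.

```python
import math
from collections import defaultdict

def fit_transitions(sequences, smoothing=1e-3):
    """Estimate P(next state | current state) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def avg_log_likelihood(seq, probs):
    """Mean log transition probability; lower means more unusual."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

sessions = [
    ["request", "search", "click"],
    ["request", "search", "search", "click"],
    ["request", "click"],
]
probs = fit_transitions(sessions)
normal = avg_log_likelihood(["request", "search", "click"], probs)
abnormal = avg_log_likelihood(["search", "request", "search", "request"], probs)
```

The second sequence relies on the transition search → request, which never occurs in the training sessions, so its average log-likelihood is far lower.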
9. Supervised Models
9.1 Gradient Boosted Decision Trees (GBDT)
Trains on labeled anomalies (often generated by unsupervised methods or synthetic oversampling) and evaluates via conversion metrics.
9.2 Wide & Deep
Combines wide (linear) features with deep neural network embeddings to balance memorization and generalization for fraud detection.
10. Practical Considerations
10.1 Threshold Selection
Unsupervised methods use quantiles or distribution elbows; supervised models rely on precision‑recall curves.
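Both approaches can be sketched with NumPy: a quantile cutoff for unlabeled scores, and precision and recall at a candidate threshold when labels exist. The scores, labels, and thresholds below are made up.

```python
import numpy as np

def quantile_threshold(scores, q=0.99):
    """Score value below which a fraction q of the data falls."""
    return float(np.quantile(scores, q))

def precision_recall(scores, labels, threshold):
    """Precision and recall when scores above `threshold` are flagged."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores > threshold
    tp = np.sum(predicted & labels)
    precision = tp / max(predicted.sum(), 1)
    recall = tp / max(labels.sum(), 1)
    return float(precision), float(recall)

scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.9, 0.2, 0.95])
labels = np.array([0, 0, 0, 0, 0, 1, 0, 1])

cutoff = quantile_threshold(scores, q=0.75)
loose = precision_recall(scores, labels, 0.5)     # (1.0, 1.0)
strict = precision_recall(scores, labels, 0.92)   # (1.0, 0.5)
```

Sweeping the threshold over a range and recording each (precision, recall) pair traces out the precision‑recall curve from which an operating point is chosen.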
10.2 Transforming Non‑Gaussian to Gaussian
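Applies functions such as log, Box‑Cox, or Yeo‑Johnson to bring the data closer to normality before statistical testing. Log and Box‑Cox require strictly positive values, while Yeo‑Johnson also handles zero and negative values.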
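A sketch of the Box‑Cox transform with a fixed parameter λ (written `lam` here), where λ = 0 reduces to the natural log. In practice λ is usually estimated from the data; the fixed value and the synthetic log-normal amounts are assumptions for illustration.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform for strictly positive data; lam = 0 gives log(x)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam

def skewness(x):
    """Sample skewness: roughly 0 for symmetric data."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

rng = np.random.default_rng(0)
amounts = rng.lognormal(mean=3.0, sigma=1.0, size=2000)   # synthetic right-skewed data

raw_skew = skewness(amounts)
log_skew = skewness(box_cox(amounts, 0))
```

The raw amounts are heavily right-skewed, while the transformed values have skewness close to zero, which makes Gaussian-based scoring more reasonable.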
Alibaba Cloud Developer