Comprehensive Overview of Common Anomaly Detection Methods with Code Examples
This article compiles and explains a variety of common anomaly detection techniques—including distribution‑based, distance‑based, density‑based, clustering, tree‑based, dimensionality‑reduction, classification, and prediction methods—providing algorithm descriptions, workflow steps, advantages, limitations, and ready‑to‑run Python code snippets for each approach.
The article gathers a broad set of anomaly detection algorithms frequently used in data analysis and machine‑learning tasks. Each method is introduced with its theoretical basis, typical workflow, strengths, and drawbacks, followed by concise Python implementations.
1. Distribution‑Based Methods
3‑Sigma : assumes the data follow a normal distribution; points more than three standard deviations from the mean are outliers. Example implementation:

import numpy as np

def three_sigma(s):
    mu, std = np.mean(s), np.std(s)
    lower, upper = mu - 3*std, mu + 3*std
    return lower, upper

Z‑Score : the standard score of each point; a threshold of 3 mirrors the 3‑sigma rule.
def z_score(s):
    return (s - np.mean(s)) / np.std(s)

Boxplot (IQR) : uses the inter‑quartile range to define lower and upper bounds; points outside them are outliers.
def boxplot(s):
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
    return lower, upper

Grubbs' Test : a hypothesis test for a single outlier in a normally distributed sample. The procedure: sort the data, compute the mean and standard deviation, then compare the test statistic of the most extreme value against a critical value.
# `outliers` is provided by the outlier_utils package
from outliers import smirnov_grubbs as grubbs

print(grubbs.test([8, 9, 10, 1, 9], alpha=0.05))

2. Distance‑Based Methods
K‑Nearest Neighbors (KNN) : average distance to the K nearest points; large distance indicates an outlier.
from pyod.models.knn import KNN

clf = KNN(method='mean', n_neighbors=3)
clf.fit(X_train)
y_train_pred = clf.labels_   # 1 marks predicted outliers

3. Density‑Based Methods
Local Outlier Factor (LOF) : compares local density of a point to that of its neighbors.
from sklearn.neighbors import LocalOutlierFactor as LOF

clf = LOF(n_neighbors=2)
res = clf.fit_predict(X)   # -1 marks predicted outliers

Connectivity‑Based Outlier Factor (COF) : similar to LOF, but estimates local density from the average chaining distance; implemented in pyod.models.cof .
4. Clustering‑Based Methods
DBSCAN : points not belonging to any dense cluster are labeled as noise (outliers).
from sklearn.cluster import DBSCAN

clustering = DBSCAN(eps=3, min_samples=2).fit(X)
labels = clustering.labels_   # -1 marks noise points (outliers)

5. Tree‑Based Methods
Isolation Forest : builds random trees; points isolated with short path lengths are considered anomalies.
from sklearn.ensemble import IsolationForest

iforest = IsolationForest(n_estimators=100, contamination=0.05)
iforest.fit(X)
labels = iforest.predict(X)   # -1 marks predicted anomalies

6. Dimensionality‑Reduction Methods
Principal Component Analysis (PCA) : evaluates reconstruction error or deviation along principal components to flag outliers.
import numpy as np
from sklearn.decomposition import PCA

pca = PCA()
transformed = pca.fit_transform(X)
# anomaly score: squared deviation along each component,
# weighted by the inverse of its eigenvalue (explained variance)
scores = np.sum(transformed**2 / pca.explained_variance_, axis=1)

AutoEncoder : trains a neural network to reconstruct normal data; a high reconstruction error signals an anomaly.
import numpy as np
import tensorflow as tf

input_dim = X_train.shape[1]   # number of features
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2, activation='relu'),    # bottleneck layer
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(input_dim)
])
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, X_train, epochs=100, batch_size=10)
recon_error = np.mean(np.abs(model.predict(X_test) - X_test), axis=1)

7. Classification‑Based Methods
One‑Class SVM : learns a boundary that encloses the majority of data; points outside are outliers.
from sklearn import svm

clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X)
labels = clf.predict(X)   # -1 marks predicted outliers

8. Prediction‑Based Methods
For time series, fit a model to predict future values, compute the residuals between predictions and observations, and apply a statistical threshold (e.g., K‑sigma) to the residuals to detect anomalies.
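A minimal sketch of this recipe, using a moving‑average forecast and a 3‑sigma residual threshold (the window size, threshold, and toy series are illustrative assumptions):

```python
import numpy as np

def residual_outliers(series, window=5, k=3.0):
    """Flag points whose residual from a one-step-ahead
    moving-average forecast exceeds k standard deviations."""
    series = np.asarray(series, dtype=float)
    # forecast each point as the mean of the previous `window` points
    preds = np.array([series[i - window:i].mean()
                      for i in range(window, len(series))])
    residuals = series[window:] - preds
    mu, std = residuals.mean(), residuals.std()
    flags = np.abs(residuals - mu) > k * std
    return np.where(flags)[0] + window   # indices in the original series

# usage: a flat series with a single spike at index 12
data = [10.0] * 20
data[12] = 30.0
print(residual_outliers(data))   # → [12]
```

Any forecaster can stand in for the moving average (ARIMA, exponential smoothing, a neural network); only the residual‑thresholding step changes hands.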
Conclusion
The survey categorizes anomaly detection techniques into distribution, distance, density, clustering, tree, dimensionality‑reduction, classification, and prediction families, highlighting their algorithms, typical use‑cases, pros and cons, and providing ready‑to‑run Python code for each.