AI-Powered Database Anomaly Detection Service: Feature Analysis, Algorithm Selection, and Real-Time Monitoring

This article describes how Meituan's database platform team designed an AI-driven anomaly detection service end to end. It covers feature analysis of time-series patterns, algorithm selection (MAD, boxplot, EVT), model training, real-time detection with Flink, operational metrics, and planned enhancements.

Meituan Technology Team

Background

Meituan's database platform must be highly stable and has little tolerance for anomalies. Fixed-threshold alerts depend on expert-written rules and cannot adapt dynamically, so small issues can grow into major failures. An AI-based, around-the-clock monitoring service was built to detect, locate, and mitigate anomalies early.

Feature Analysis

Periodic Patterns

Historical metrics are decomposed by extracting a trend component with a moving average, then computing a rolling autocorrelation on the residual series. Peaks in the autocorrelation sequence determine the cycle length T. The workflow is illustrated in Figure 2.
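The decomposition described above can be sketched in a few lines. This is a minimal illustration, not the team's implementation: the function names, the `min_corr` cutoff of 0.5, and the choice to use a plain sample autocorrelation at each lag (rather than a rolling variant) are all assumptions made for the example.

```python
from statistics import fmean


def detrend(series, window):
    """Subtract a centered moving average; only points with a full window are kept."""
    half = window // 2
    return [
        series[i] - fmean(series[i - half:i + half + 1])
        for i in range(half, len(series) - half)
    ]


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    m = fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / denom


def estimate_period(series, trend_window, max_lag, min_corr=0.5):
    """Return the lag of the strongest autocorrelation peak, or None if none qualifies.

    min_corr is a hypothetical cutoff chosen for this sketch.
    """
    residual = detrend(series, trend_window)
    acf = [autocorrelation(residual, lag) for lag in range(max_lag + 1)]
    peaks = [
        lag for lag in range(1, max_lag)
        if acf[lag] > acf[lag - 1]
        and acf[lag] >= acf[lag + 1]
        and acf[lag] >= min_corr
    ]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None
```

For an hourly metric with a daily cycle, `estimate_period(series, trend_window=49, max_lag=72)` would be expected to return a lag of 24. The trend window should span at least one full cycle so that the moving average does not absorb the periodic component.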

Drift Detection

Median smoothing over a sufficiently large window removes the periodic component. The smoothed series is first checked for strict monotonicity: if it is strictly increasing or strictly decreasing, a long-term trend is declared and processing stops. Otherwise, the windows to the left and right of a candidate drift point are compared using two rules: (a) if the maximum of the left window is less than the minimum of the right window, an upward drift is flagged; (b) if the minimum of the left window exceeds the maximum of the right window, a downward drift is flagged. Figure 3 shows examples.
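A rough sketch of these rules follows. The source does not say how the left and right windows are chosen, so this example assumes they are the two sides of a candidate split point and scans every split that leaves at least `min_side` points on each side. The function names, `min_side`, and the returned labels are hypothetical.

```python
from itertools import accumulate
from statistics import median


def median_smooth(series, window):
    """Centered median filter; the window shrinks at the edges."""
    half = window // 2
    return [
        median(series[max(0, i - half):i + half + 1])
        for i in range(len(series))
    ]


def is_strictly_monotonic(x):
    pairs = list(zip(x, x[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def detect_drift(series, window, min_side):
    """Return ("trend" | "up" | "down" | "none", split_index_or_None)."""
    smoothed = median_smooth(series, window)
    if is_strictly_monotonic(smoothed):
        return ("trend", None)  # long-term trend: stop here
    min_side = max(1, min_side)
    # Running extremes so that each candidate split is checked in O(1).
    left_max = list(accumulate(smoothed, max))
    left_min = list(accumulate(smoothed, min))
    right_max = list(accumulate(reversed(smoothed), max))[::-1]
    right_min = list(accumulate(reversed(smoothed), min))[::-1]
    for split in range(min_side, len(smoothed) - min_side + 1):
        if left_max[split - 1] < right_min[split]:
            return ("up", split)    # rule (a)
        if left_min[split - 1] > right_max[split]:
            return ("down", split)  # rule (b)
    return ("none", None)
```

The first split that satisfies either rule is returned, which is a design choice of this sketch; a production version might prefer the split with the largest gap between the two windows.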

Stationarity

Stationarity is tested with the Augmented Dickey‑Fuller (ADF) test. A series is considered stationary when the ADF p‑value for both the most recent 1‑day and 7‑day windows is below 0.05. Figure 4 presents the test results.

Algorithm Selection

Distribution skewness guides algorithm choice:

Low skew, high symmetry → Median Absolute Deviation (MAD)

Moderate skew → Boxplot

High skew → Extreme Value Theory (EVT)

3-Sigma was rejected because the mean and standard deviation it relies on are themselves distorted by outliers; MAD is more robust for symmetric data. Figures 5 and 6 visualize the distribution analysis and the algorithm comparison.
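The skewness-based routing can be illustrated as follows. The article does not state the skewness cutoffs, so `low_cut` and `high_cut` below are placeholder values picked only to make the example concrete, and the function names are hypothetical.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (0.0 for a constant series)."""
    values = list(values)
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def select_detector(values, low_cut=0.5, high_cut=1.0):
    """Pick a detector name from the absolute skewness.

    low_cut and high_cut are assumed placeholders, not values from the article.
    """
    skew = abs(sample_skewness(values))
    if skew < low_cut:
        return "mad"      # low skew, high symmetry
    if skew < high_cut:
        return "boxplot"  # moderate skew
    return "evt"          # high skew
```

In practice the cutoffs would need to be tuned against labeled history for each metric family.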

Case Sample Modeling

An end‑to‑end pipeline is demonstrated: the original series, day‑wise folding, a zoomed‑in trend for a specific time index, and the derived lower threshold (Figure 7). For a highly skewed distribution, the EVT‑derived threshold is more reasonable (Figure 8).

Model Training and Real‑Time Detection

Data Flow

Offline training pulls historical data from the MOD data warehouse via the internal KV store Squirrel, reads configuration parameters, trains models, and stores them in Elasticsearch (ES). Online detection runs on Apache Flink, consumes messages from the internal Mafka queue, loads the trained models from ES, evaluates incoming streams in real time, and writes anomaly records back to ES. Figure 9 shows the technical design.

Anomaly Detection Process

The detection algorithm follows a divide‑and‑conquer approach. Offline, historical data are pre‑processed, time series are classified (drift, stationarity, periodicity), and appropriate models are built. Online, the trained models are loaded to evaluate incoming streams in real time. Figure 10 illustrates the process.

Product Operation

Based on manual verification of sampled anomalies, the service achieves a precision of 81%, a recall of 82%, and an F1-score of 81%.
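As a quick consistency check, the F1-score is the harmonic mean of precision and recall. The helper below is only an illustration of that arithmetic; with the reported precision and recall it evaluates to roughly 0.815, in line with the reported figure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_f1 = f1_score(0.81, 0.82)
```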

Future Outlook

Enable anomaly‑type identification (mean shift, volatility, spikes) for subscription‑based alerts and downstream diagnosis.

Build a human-in-the-loop workflow to incorporate feedback-driven model updates.

Extend support to more database scenarios such as end‑to‑end error reporting and node‑level network monitoring.

Appendix

Median Absolute Deviation (MAD)

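MAD is a robust measure of dispersion: MAD = median(|x_i - median(x)|). Under a normality assumption, the scaling factor C = 1.4826 makes C × MAD a consistent estimate of the standard deviation, and a point is typically flagged when it lies more than k = 3 scaled MADs from the median. Because it is built on medians, MAD tolerates outliers better than the standard deviation.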
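A minimal sketch of a MAD-based detector, using the typical constants mentioned above, might look like this. The function names are hypothetical, and the bounds median ± k × C × MAD are the common convention rather than something the article spells out.

```python
from statistics import median


def mad_bounds(values, k=3.0, c=1.4826):
    """Return (lower, upper) bounds: median -/+ k * c * MAD."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    spread = k * c * mad
    return med - spread, med + spread


def mad_outliers(values, k=3.0, c=1.4826):
    """Return the values that fall outside the MAD bounds."""
    values = list(values)
    lower, upper = mad_bounds(values, k, c)
    return [v for v in values if v < lower or v > upper]
```

For example, in `[10, 11, 9, 10, 12, 10, 11, 9, 10, 100]` the median is 10 and the MAD is 1, so the single extreme value 100 is flagged while the bounds stay close to the bulk of the data. If more than half of the values are identical, the MAD is 0 and the bounds collapse onto the median, so a fallback is needed in that case.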

Boxplot

A boxplot summarizes a distribution with five statistics: Q0 (minimum), Q1 (lower quartile), Q2 (median), Q3 (upper quartile), and Q4 (maximum). Points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1, are considered outliers. Adjusted boxplots modify these fences to handle skewed data.
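The standard 1.5 × IQR rule can be sketched with the Python standard library as below. This covers only the plain boxplot fences, not the adjusted boxplot for skewed data. The function names are hypothetical, and the "inclusive" quartile method is one of several conventions.

```python
from statistics import quantiles


def boxplot_fences(values, whisker=1.5):
    """Return (lower, upper) fences: Q1 - whisker * IQR and Q3 + whisker * IQR."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def boxplot_outliers(values, whisker=1.5):
    """Return the values that fall outside the fences."""
    values = list(values)
    lower, upper = boxplot_fences(values, whisker)
    return [v for v in values if v < lower or v > upper]
```

Different quartile conventions shift the fences slightly on small samples, so the same convention should be used for training and for detection.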

Extreme Value Theory (EVT)

EVT models the tail of a distribution without assuming a specific form for the distribution itself. Excesses over an initial threshold t are fitted with a Generalized Pareto Distribution (GPD) whose shape parameter ξ and scale parameter β are estimated by maximum likelihood. The final threshold is then determined by the risk parameter q, the sample size n, and the number of exceedances N_t over t.
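The following sketch shows how q, n, and N_t combine in a peaks-over-threshold calculation, using the usual closed form z_q = t + (β/ξ) × ((q × n / N_t)^(−ξ) − 1). To stay dependency-free, it estimates ξ and β with the method of moments instead of the maximum-likelihood fit described above, so it is a simplified stand-in rather than the article's procedure. The function name and the 98% initial quantile are assumptions.

```python
import math
from statistics import fmean, variance


def pot_threshold(values, q=1e-3, init_quantile=0.98):
    """Upper threshold z_q such that P(X > z_q) is approximately q.

    GPD parameters come from the method of moments (a swap-in for MLE).
    init_quantile is an assumed choice for the initial threshold t.
    """
    data = sorted(values)
    n = len(data)
    t = data[min(n - 1, int(init_quantile * n))]
    excesses = [v - t for v in data if v > t]
    n_t = len(excesses)
    if n_t < 2:
        raise ValueError("not enough exceedances over the initial threshold")
    m = fmean(excesses)
    v = variance(excesses)
    if v == 0:
        raise ValueError("exceedances have zero variance")
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)       # shape, from mean and variance of a GPD
    beta = 0.5 * m * (1.0 + ratio)  # scale
    r = q * n / n_t
    if abs(xi) < 1e-9:
        # Limit of the closed form as xi -> 0 (exponential tail).
        return t - beta * math.log(r)
    return t + (beta / xi) * (r ** (-xi) - 1.0)
```

A lower threshold can be obtained by applying the same function to the negated series and negating the result. The method-of-moments estimates are only valid for ξ < 0.5, which is one reason a likelihood-based fit is generally preferred for heavy tails.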
