
Machine Learning‑Driven Time Series Forecasting and Anomaly Detection System at JD Search

The article describes JD Search’s machine‑learning‑based time‑series forecasting and anomaly‑detection platform, detailing its overall architecture, offline and real‑time training pipelines, FFT‑based periodicity detection, Prophet forecasting, DBSCAN outlier detection, and distributed optimizations such as Alink integration and load‑balancing strategies.

DataFunTalk

The JD Search data science team developed a machine‑learning‑driven time‑series forecasting and anomaly‑detection system to improve alarm accuracy, reduce false‑positive rates, and provide causal explanations for incidents.

The overall architecture has two parts. Offline training tasks load data from HDFS, perform feature engineering, train models, and store model information and parameters in a parameter server. Real‑time training tasks consume samples from Kafka, accumulate them into mini‑batches, pull the current parameters, optionally predict, and push the updated model back to the parameter server.
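The real‑time loop can be sketched as follows. This is a minimal illustration, not JD's implementation: the Kafka consumer is replaced by a plain Python iterable, InMemoryParameterServer is a hypothetical stand‑in for the parameter server, and the "model" is just a running mean so that the pull, predict, push cycle stays visible.

```python
from typing import Dict, Iterable, List


class InMemoryParameterServer:
    """Hypothetical stand-in for the parameter server (assumption)."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, float]] = {}

    def pull(self, model_id: str) -> Dict[str, float]:
        # Unknown models start from an empty running mean.
        return dict(self._store.get(model_id, {"mean": 0.0, "count": 0.0}))

    def push(self, model_id: str, params: Dict[str, float]) -> None:
        self._store[model_id] = dict(params)


def update_running_mean(params: Dict[str, float], batch: List[float]) -> Dict[str, float]:
    """Toy 'training step': fold one mini-batch into a running mean."""
    count = params["count"] + len(batch)
    mean = params["mean"] + (sum(batch) - len(batch) * params["mean"]) / count
    return {"mean": mean, "count": count}


def run_realtime_training(
    samples: Iterable[float],
    ps: InMemoryParameterServer,
    model_id: str,
    batch_size: int,
    predict: bool = False,
) -> List[float]:
    """Accumulate samples into mini-batches, then pull, (predict), update, push."""
    batch: List[float] = []
    predictions: List[float] = []
    for value in samples:  # in production this would be a Kafka consumer
        batch.append(value)
        if len(batch) < batch_size:
            continue
        params = ps.pull(model_id)
        if predict:
            # Predict with the parameters as they were *before* this batch.
            predictions.append(params["mean"])
        ps.push(model_id, update_running_mean(params, batch))
        batch = []
    # Samples left in an incomplete batch are not flushed in this sketch.
    return predictions
```

Predicting before the update mirrors the "optionally predict" step: the forecast for a batch is made with parameters that have not yet seen it.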

FFT (Fast Fourier Transform) is employed as a stateless algorithm to detect periodicity in the data; a pre‑trained "pattern" model is used to replace abnormal segments, while moving‑average filtering and linear‑regression trend removal mitigate the impact of spikes and long‑term trends.
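The preprocessing and detection steps can be sketched with NumPy. This is a minimal example under stated assumptions: the function names, the window size of 3, and the order of the two cleaning steps (trend removal first, then smoothing) are illustrative choices, and the pre‑trained pattern model is not shown because the article does not describe it in detail.

```python
import numpy as np


def remove_linear_trend(y: np.ndarray) -> np.ndarray:
    """Fit y ~ slope * t + intercept by least squares and subtract the fit."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (slope * t + intercept)


def moving_average(y: np.ndarray, window: int) -> np.ndarray:
    """Centered moving average that damps isolated spikes."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")


def dominant_period(y: np.ndarray, window: int = 3) -> float:
    """Return the period (in samples) of the strongest spectral peak."""
    cleaned = moving_average(remove_linear_trend(y), window)
    spectrum = np.abs(np.fft.rfft(cleaned))
    spectrum[0] = 0.0  # ignore the zero-frequency (mean) component
    k = int(np.argmax(spectrum))
    if k == 0:
        raise ValueError("no periodic component found")
    # Frequency bin k corresponds to k full cycles over len(y) samples.
    return len(y) / k
```

Because the function depends only on the window of data passed in, it keeps no state between calls, which matches the stateless use of FFT described above.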

The article explains why FFT is preferred over differencing for periodicity detection: a single transform captures multiple co‑existing cycles at once. The drawback is that FFT is sensitive to anomalies, which the pattern‑based replacement strategy addresses.
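To illustrate the multi‑cycle point, the sketch below reads the strongest peaks from one spectrum. The function name top_periods and the hourly series with daily (24) and weekly (168) cycles are assumptions for the example; seasonal differencing, by contrast, would have to test one candidate lag at a time.

```python
import numpy as np


def top_periods(y: np.ndarray, k: int = 2) -> list:
    """Return the periods (in samples) of the k strongest spectral peaks."""
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    # Skip bin 0 (the mean), then take the k largest magnitudes.
    strongest_bins = np.argsort(spectrum[1:])[::-1][:k] + 1
    return sorted(len(y) / int(b) for b in strongest_bins)
```

With real data the peaks are rarely this clean; neighbouring bins can share energy when the series length is not a whole number of cycles, so a production version would likely merge adjacent bins or apply a significance threshold.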

Prophet, an open‑source time‑series model from Facebook, is used for forecasting. The training data must cover at least twice the identified period, and missing values are handled. Hyper‑parameters such as changepoint_prior_scale and seasonality_prior_scale are tuned offline. Model performance is evaluated primarily with MAPE (mean absolute percentage error), with a target of 1 - MAPE ≥ 0.9. Predictions are made one step at a time, and the model is updated immediately after each step.
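A hedged sketch of these rules follows. The helper names are hypothetical, skipping zero‑valued actuals in MAPE is an assumption (the article does not say how they are treated), and the default prior scales shown are Prophet's own defaults rather than the values tuned at JD. Prophet is imported inside forecast_next_step so the metric helpers work without it installed.

```python
from typing import Sequence


def has_enough_history(n_points: int, period: int) -> bool:
    """Training data must cover at least two full periods."""
    return n_points >= 2 * period


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    # Assumption: points whose actual value is 0 are skipped to avoid division by zero.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is 0")
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def meets_accuracy_target(actual, predicted, target: float = 0.9) -> bool:
    """Check the 1 - MAPE >= 0.9 goal."""
    return 1.0 - mape(actual, predicted) >= target


def forecast_next_step(history, freq: str,
                       changepoint_prior_scale: float = 0.05,
                       seasonality_prior_scale: float = 10.0) -> float:
    """Fit Prophet on `history` (DataFrame with 'ds' and 'y') and predict one step.

    Calling this again after appending each new observation refits the model,
    which is one simple way to get one-step forecasts with immediate updates.
    """
    from prophet import Prophet  # lazy import: optional dependency

    model = Prophet(
        changepoint_prior_scale=changepoint_prior_scale,
        seasonality_prior_scale=seasonality_prior_scale,
    )
    model.fit(history)
    future = model.make_future_dataframe(periods=1, freq=freq, include_history=False)
    return float(model.predict(future)["yhat"].iloc[0])
```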

DBSCAN, a density‑based clustering algorithm, is applied to a feature space composed of the original metric and the absolute Prophet residuals. The silhouette coefficient assesses clustering quality, and points labeled -1 are identified as anomalies.
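The clustering step might look like the sketch below, which uses scikit‑learn as an assumption (the article does not name the library, although the -1 noise label matches scikit‑learn's convention). Standardizing the two features and the eps and min_samples values are illustrative choices, not the team's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def detect_anomalies(metric, forecast, eps: float = 0.5, min_samples: int = 5):
    """Cluster (metric, |residual|) points; DBSCAN noise (label -1) is anomalous.

    Returns the indices of anomalous points and the silhouette coefficient of
    the non-noise clusters (None when fewer than two clusters were found).
    """
    metric = np.asarray(metric, dtype=float)
    residual = np.abs(metric - np.asarray(forecast, dtype=float))
    # Assumption: put both features on a comparable scale before clustering.
    features = StandardScaler().fit_transform(np.column_stack([metric, residual]))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)

    clustered = labels != -1
    quality = None
    # The silhouette coefficient needs at least two distinct clusters.
    if len(set(labels[clustered])) >= 2:
        quality = float(silhouette_score(features[clustered], labels[clustered]))
    return np.flatnonzero(labels == -1), quality
```

Using the raw metric alongside the residual lets the same forecast error count as normal at one traffic level and anomalous at another, since density is judged in both dimensions at once.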

To scale the solution, the team integrated Alink for distributed Python method calls, enabling parallel processing of large data volumes. They also switched the data‑distribution strategy from Flink’s keyBy to rebalance, which eliminated machine‑level load imbalance and kept end‑to‑end latency under five minutes.
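The effect of the two distribution strategies can be illustrated with a small simulation. It is plain Python rather than the Flink API, and it assumes that the imbalance came from skewed keys (one "hot" metric carrying most of the records), which the article does not state explicitly. A CRC32 hash stands in for Flink's key hashing.

```python
import zlib
from collections import Counter
from typing import Sequence


def key_by_partition(keys: Sequence[str], n_workers: int) -> Counter:
    """Hash partitioning: every record with the same key lands on one worker."""
    return Counter(zlib.crc32(k.encode("utf-8")) % n_workers for k in keys)


def rebalance_partition(keys: Sequence[str], n_workers: int) -> Counter:
    """Round-robin partitioning: records are dealt out evenly, ignoring keys."""
    return Counter(i % n_workers for i in range(len(keys)))


def imbalance(load: Counter, n_workers: int) -> int:
    """Gap between the busiest and the idlest worker, in records."""
    counts = [load.get(w, 0) for w in range(n_workers)]
    return max(counts) - min(counts)


# Skewed workload: one hot key dominates the stream.
keys = ["hot_metric"] * 90 + [f"metric_{i}" for i in range(10)]
by_key = key_by_partition(keys, n_workers=4)
round_robin = rebalance_partition(keys, n_workers=4)
print("keyBy-style imbalance:    ", imbalance(by_key, 4))
print("rebalance-style imbalance:", imbalance(round_robin, 4))
```

The trade‑off is that round‑robin distribution no longer routes all records of a key to the same worker, so it suits stateless steps such as the FFT check better than steps that rely on per‑key state.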

Beyond detection, the system aims to attribute anomalies to root causes across multiple data streams (clicks, add‑to‑cart, exposures) using causal inference models, thereby shortening troubleshooting time.

