Time Series Forecasting and Anomaly Detection for API Traffic Using Seasonal Decomposition and ARIMA
The article presents a complete workflow for predicting next‑day API request volumes by exploring per‑minute traffic data, handling missing values, applying seasonal decomposition, training an ARIMA model on the trend component, and generating confidence intervals to flag anomalous spikes.
Company platforms expose many APIs (account query, release, red‑packet, etc.) that log per‑minute access counts, resulting in 1440 records per day; the goal is to forecast the next day's traffic using historical data and trigger alerts when actual traffic deviates significantly from the prediction.
Data exploration uses a seven‑day sample (10080 minutes) stored in a DataFrame `data` with columns `date` (minute timestamp) and `count` (access count). Initial plots reveal abrupt drops to zero caused by ETL‑generated placeholder values.
Missing values are filled by averaging the surrounding points:
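The fill itself isn't shown in the article; here is a minimal sketch, assuming the placeholder gaps are the zero values and that "surrounding points" means the nearest non‑zero neighbour on each side (the helper name `fill_zero_gaps` is illustrative, not from the article):

```python
import numpy as np
import pandas as pd

def fill_zero_gaps(ts: pd.Series) -> pd.Series:
    """Replace placeholder zeros with the mean of the nearest
    non-zero neighbours on each side (assumed behaviour)."""
    vals = ts.values.astype(float)
    filled = vals.copy()
    for i in np.flatnonzero(vals == 0):
        # walk outwards to the closest non-zero points
        left = i - 1
        while left >= 0 and vals[left] == 0:
            left -= 1
        right = i + 1
        while right < len(vals) and vals[right] == 0:
            right += 1
        neighbours = []
        if left >= 0:
            neighbours.append(vals[left])
        if right < len(vals):
            neighbours.append(vals[right])
        filled[i] = np.mean(neighbours) if neighbours else 0.0
    return pd.Series(filled, index=ts.index)

s = pd.Series([10.0, 12.0, 0.0, 0.0, 14.0, 13.0])
print(fill_zero_gaps(s).tolist())  # [10.0, 12.0, 13.0, 13.0, 14.0, 13.0]
```

Scanning on a frozen copy of the values keeps a run of consecutive zeros from contaminating its own fill.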
```python
import pandas as pd

data = pd.read_csv(filename)   # columns: date, count
print('size: ', data.shape)    # (10080, 2) for the seven-day sample
print(data.head())
```

Key characteristics identified for modeling:
Strong daily seasonality with higher activity in afternoons/evenings.
Frequent spikes and drops, requiring smoothing before modeling.
Different APIs may exhibit vastly different patterns, so the model must be adaptable.
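One quick way to confirm the daily cycle before any modeling is to average counts by time of day; a stable profile across days indicates strong seasonality. A sketch on synthetic per‑minute data (a stand‑in for the real `data`, whose shape is assumed here):

```python
import numpy as np
import pandas as pd

# Two days of per-minute counts with a midday/afternoon bump plus noise.
idx = pd.date_range('2024-01-01', periods=2 * 1440, freq='min')
rng = np.random.default_rng(0)
counts = (200
          + 150 * np.sin(2 * np.pi * (idx.hour * 60 + idx.minute) / 1440 - np.pi / 2)
          + rng.normal(0, 5, len(idx)))
data = pd.DataFrame({'date': idx, 'count': counts}).set_index('date')

# Average by time of day across all days: the daily profile.
profile = data['count'].groupby(data.index.time).mean()
print(profile.idxmax(), profile.idxmin())  # peak near midday, trough near midnight
```

The same groupby also previews the per‑minute seasonal averaging the model uses later when re‑assembling predictions.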
Preprocessing
1. Split the first six days as training data and the seventh day as test data.
```python
class ModelDecomp(object):
    def __init__(self, file, test_size=1440):
        self.ts = self.read_data(file)          # full per-minute series
        self.test_size = test_size
        self.train_size = len(self.ts) - self.test_size
        self.train = self.ts[:self.train_size]  # first six days
        self.test = self.ts[-self.test_size:]   # seventh day
```

2. Smooth the training series by differencing, detecting outliers beyond 1.5×IQR, and replacing each run of outliers with values linearly interpolated between the surrounding clean points:
```python
from datetime import timedelta
import numpy as np

def _diff_smooth(self, ts):
    dif = ts.diff().dropna()                   # first-difference series
    td = dif.describe()
    # 1.5 * IQR fences on the differences
    high = td['75%'] + 1.5 * (td['75%'] - td['25%'])
    low = td['25%'] - 1.5 * (td['75%'] - td['25%'])
    forbid_index = dif[(dif > high) | (dif < low)].index
    i = 0
    while i < len(forbid_index) - 1:
        n = 1                                  # length of the outlier run
        start = forbid_index[i]
        # extend the run while the flagged minutes are consecutive
        while i + n < len(forbid_index) and forbid_index[i + n] == start + timedelta(minutes=n):
            n += 1
        i += n - 1
        end = forbid_index[i]
        # linearly interpolate across the run from its clean neighbours
        value = np.linspace(ts[start - timedelta(minutes=1)],
                            ts[end + timedelta(minutes=1)], n)
        ts[start:end] = value
        i += 1
    return ts

self.train = self._diff_smooth(self.train)
draw_ts(self.train)
```

3. Decompose the smoothed series into trend, seasonal, and residual components using statsmodels:
```python
from statsmodels.tsa.seasonal import seasonal_decompose

# freq= is the seasonal period (1440 minutes = 1 day); newer statsmodels
# versions call this parameter period=
decomposition = seasonal_decompose(self.ts, freq=1440, two_sided=False)
self.trend = decomposition.trend
self.seasonal = decomposition.seasonal
self.residual = decomposition.resid
decomposition.plot()
```

The additive model assumes `observed = trend + seasonal + residual`. Only the trend component is modeled further.
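To make the additive identity concrete, here is a hand‑rolled one‑sided decomposition on synthetic data. It illustrates the idea (moving‑average trend, phase‑averaged seasonality, residual as remainder), not statsmodels' exact internals, and uses a short period of 24 instead of 1440 for readability:

```python
import numpy as np
import pandas as pd

period = 24                      # stand-in for the article's 1440
n = period * 10
rng = np.random.default_rng(1)
t = np.arange(n)
observed = pd.Series(0.05 * t                             # trend
                     + 10 * np.sin(2 * np.pi * t / period)  # seasonality
                     + rng.normal(0, 0.5, n))             # noise

# One-sided moving average over one full period estimates the trend
# (cf. two_sided=False above).
trend = observed.rolling(window=period).mean()

# Average the detrended values at each phase of the cycle.
detrended = observed - trend
seasonal_profile = detrended.groupby(t % period).mean()
seasonal = pd.Series(seasonal_profile.values[t % period], index=observed.index)

# Whatever is left over is the residual.
residual = observed - trend - seasonal

# The additive identity holds wherever the trend is defined
# (the first period-1 points are NaN with a one-sided window).
valid = trend.notna()
assert np.allclose(observed[valid], (trend + seasonal + residual)[valid])
```

The recovered seasonal profile peaks near the injected amplitude of 10, which is why modeling only the trend and adding the seasonal profile back is a reasonable reconstruction.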
Modeling
Train an ARIMA model on the trend part:
```python
from statsmodels.tsa.arima_model import ARIMA  # legacy API; statsmodels.tsa.arima.model.ARIMA in newer versions

def trend_model(self, order):
    self.trend.dropna(inplace=True)
    train = self.trend[:len(self.trend) - self.test_size]
    # conditional-sum-of-squares fit; note the fitted result replaces this
    # method on the instance, so it should only be called once
    self.trend_model = ARIMA(train, order).fit(disp=-1, method='css')
```

Predict the next day's trend, then add back the seasonal pattern and confidence bounds derived from the residual distribution:
```python
# Confidence band derived from the residual IQR
d = self.residual.describe()
delta = d['75%'] - d['25%']
self.low_error, self.high_error = (d['25%'] - 1 * delta,
                                   d['75%'] + 1 * delta)

def predict_new(self):
    n = self.test_size
    # minute timestamps for the day being forecast
    self.pred_time_index = pd.date_range(start=self.train.index[-1],
                                         periods=n + 1, freq='1min')[1:]
    self.trend_pred = self.trend_model.forecast(n)[0]
    self.add_season()

def add_season(self):
    self.train_season = self.seasonal[:self.train_size]
    values, low_conf_values, high_conf_values = [], [], []
    for i, t in enumerate(self.pred_time_index):
        trend_part = self.trend_pred[i]
        # seasonal value for this time of day, averaged over the training days
        season_part = self.train_season[self.train_season.index.time == t.time()].mean()
        predict = trend_part + season_part
        values.append(predict)
        low_conf_values.append(predict + self.low_error)
        high_conf_values.append(predict + self.high_error)
    self.final_pred = pd.Series(values, index=self.pred_time_index, name='predict')
    self.low_conf = pd.Series(low_conf_values, index=self.pred_time_index, name='low_conf')
    self.high_conf = pd.Series(high_conf_values, index=self.pred_time_index, name='high_conf')
```

Evaluation
Apply the pipeline to the sample file, plot the original series, predictions, and confidence intervals, and compute RMSE:
```python
import matplotlib.pyplot as plt

md = ModelDecomp(file=filename, test_size=1440)
md.decomp(freq=1440)
md.trend_model(order=(1, 1, 3))
md.predict_new()

pred = md.final_pred
test = md.test

plt.subplot(211)
plt.plot(md.ts)                               # full original series
plt.subplot(212)
pred.plot(color='blue', label='Predict')
test.plot(color='red', label='Original')
md.low_conf.plot(color='grey', label='low')
md.high_conf.plot(color='grey', label='high')
plt.legend(loc='best')
plt.title('RMSE: %.4f' % np.sqrt(sum((pred.values - test.values) ** 2) / test.size))
plt.show()
```

The resulting RMSE is about 462.8, which is acceptable given the magnitude of the raw counts; two abrupt spikes in the test set exceed the confidence bounds and are correctly flagged as anomalies.
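The alerting step implied by that last sentence reduces to a band check. A sketch with illustrative names (the article itself stops at the plot):

```python
import pandas as pd

def flag_anomalies(actual: pd.Series, low: pd.Series, high: pd.Series) -> pd.Series:
    """Boolean mask: True wherever the observed count escapes the band."""
    return (actual < low) | (actual > high)

# Toy stand-ins for md.test, md.low_conf, md.high_conf
idx = pd.date_range('2024-01-08', periods=5, freq='min')
actual = pd.Series([300, 305, 900, 310, 50], index=idx)  # two injected spikes
low = pd.Series(250.0, index=idx)
high = pd.Series(400.0, index=idx)

mask = flag_anomalies(actual, low, high)
print(int(mask.sum()), list(actual[mask]))  # 2 [900, 50]
```

In production the mask's timestamps would feed the alerting system directly.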
Conclusion
The core idea for any periodic API traffic series is to decompose the signal, model the trend, re‑assemble with seasonal and residual components, and define confidence intervals for anomaly detection; the approach can be adapted to other APIs with different patterns.