How to Detect and Diagnose Traffic Surges with Access Log AIOps
This article explains how to use Access Log data combined with AIOps algorithms and SPL to quickly identify traffic spikes, pinpoint their origins, and apply targeted scaling and optimization for cloud‑native services experiencing sudden user growth.
Business Background
Recent migration of many U.S. users to new social platforms caused a massive surge in traffic for overseas services. Companies face three key questions: how to recognize sudden traffic spikes, how to identify traffic sources for network optimization, and how to locate the cause of service fluctuations.
What Is an Access Log?
An Access Log records detailed information about each request to a server, such as IP address, timestamp, URL, HTTP status, and user‑agent. It is essential for analyzing user behavior, optimizing performance, and enhancing security. Below is an example of typical Nginx Access Log fields.
```
host:www.lsb.mock.com
remote_addr:123.246.223.87
http_user_agent:Mozilla/5.0 (Macintosh; AMD Mac OS X 10_8_2) AppleWebKit/535.22 (KHTML, like Gecko) Chrome/18.6.872
request_method:POST
request_time:53
request_uri:/request/path-0/file-2
status:200
time_local:15/Jan/2025:01:36:56
upstream_response_time:2.05
```
Why Use AIOps for Log Analysis?
Traditional monitoring struggles with accurate anomaly detection and rapid root‑cause identification. AIOps functions in SPL provide smarter observability, enabling precise anomaly scoring and automated root‑cause drilling.
Key Algorithms
Anomaly Detection: Models historical data to flag abnormal intervals and assign anomaly scores.
Root‑Cause Localization: Computes sub‑sequences across dimensions to reveal which combination caused the anomaly.
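To give a feel for what "assigning an anomaly score" means, here is a minimal Python sketch. It is not the algorithm behind the SPL AIOps functions, which this article does not describe; it is a simplified stand-in that scores each point by its distance from the median of recent history, scaled by the median absolute deviation. The function name, the window size, and the divisor used to squash scores into [0, 1] are all assumptions made for illustration.

```python
from statistics import median

def anomaly_scores(series, window=10):
    """Score each point against the median/MAD of the preceding `window` points.

    Illustrative stand-in only; not the SPL implementation.
    """
    scores = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 3:
            scores.append(0.0)  # too little history to judge
            continue
        center = median(history)
        mad = median(abs(x - center) for x in history)
        spread = mad if mad > 0 else 1.0  # avoid dividing by zero on flat history
        z = abs(value - center) / spread
        scores.append(min(z / 10.0, 1.0))  # arbitrary scale, capped at 1.0
    return scores

# Made-up per-minute request counts with a burst at positions 7 and 8.
requests_per_minute = [100, 104, 98, 101, 99, 103, 100, 450, 470, 102]
scores = anomaly_scores(requests_per_minute)
```

In this made-up series the two burst points reach the maximum score while the ordinary points stay well below 0.5, which is the kind of separation a threshold-based alert relies on.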
Step‑by‑Step Practice
1. Log to Metrics – Convert raw log text into quantitative metrics, e.g., per‑minute request count.
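As a language-neutral illustration of this step, the following Python sketch turns a list of `time_local` values into a per-minute request count. It assumes the timestamp format shown in the sample log entry (`15/Jan/2025:01:36:56`) and an English locale for month names; the function name and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

def requests_per_minute(time_local_values):
    """Count requests per minute from Nginx-style time_local strings."""
    counts = Counter()
    for raw in time_local_values:
        ts = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S")
        counts[ts.replace(second=0)] += 1  # truncate to the minute
    return sorted(counts.items())

# Made-up sample: two requests in 01:36, one in 01:37.
sample = [
    "15/Jan/2025:01:36:56",
    "15/Jan/2025:01:36:59",
    "15/Jan/2025:01:37:03",
]
series = requests_per_minute(sample)
```

Note that this sketch only emits minutes that contain at least one request; a real time series would usually need empty minutes filled with zero before anomaly detection.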
```
| extend ts=second_to_nano(to_unixtime(date_trunc(60,__time__)))
| stats request_count=count(1) by ts
| make-series request_count on ts
```
2. Anomaly Detection – Apply functions like series_decompose_anomalies or series_pattern_anomalies to the request count series.
```
| extend ret = series_decompose_anomalies(request_count_arr)
```
or
```
| extend ret = series_pattern_anomalies(request_count_arr)
```
3. Locate Anomaly Time – Filter recent points with high anomaly scores and configure alerts.
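The filter in this step amounts to asking whether any of the most recent points has a high score. A minimal Python sketch of that check follows; the window of five points and the 0.5 threshold mirror the SPL filter used in this step, while the function name is an assumption for illustration.

```python
def should_alert(scores, recent=5, threshold=0.5):
    """Return True if any of the last `recent` scores reaches `threshold`."""
    tail = scores[-recent:]
    return bool(tail) and max(tail) >= threshold

# Made-up score series: a spike inside the last five points, and one outside.
recent_spike = should_alert([0.1, 0.2, 0.9, 0.1, 0.1, 0.1])
old_spike = should_alert([0.9, 0.1, 0.1, 0.1, 0.1, 0.1])
```

Restricting the check to the latest points keeps an alert from firing repeatedly on an anomaly that has already passed out of the window.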
```
| extend anomalies_score_series = ret.anomalies_score_series
| where array_max(slice(anomalies_score_series,-5,5)) >= 0.5
```
4. Enrich Dimensions with IP Functions – Transform IP addresses into country, province, city, etc.
```
| extend country = ip_to_country(remote_addr)
| stats access_count=count(1) by country
```
5. Root‑Cause Analysis – Group metrics by country, host, and agent, then drill down to the offending dimension.
```
| extend ts=second_to_nano(to_unixtime(date_trunc(60,__time__)))
| extend country = ip_to_country(remote_addr)
| stats request_count=count(1) by ts,country,host,agent
| make-series request_count on ts by country,host,agent
| stats country_arr=array_agg(country), host_arr=array_agg(host), agent_arr=array_agg(agent), ts_arr=array_agg(__ts__), metrics_arr=array_agg(request_count)
| extend ret = series_drilldown(country_arr,host_arr,agent_arr,ts_arr,metrics_arr,1736756946000000000)
```
The result shows the anomaly is linked to traffic from the United States, confirming the hypothesis.
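The intuition behind a dimensional drill-down can be sketched in Python as follows. This is not how series_drilldown works internally, which this article does not specify; it is a simplified comparison that counts requests per (country, host, agent) combination in a baseline window and in the anomaly window, then ranks combinations by how much they grew. The function name and the sample rows are made up for illustration, and the two windows are assumed to cover equal time spans.

```python
from collections import Counter

def rank_contributors(baseline_rows, anomaly_rows, dims=("country", "host", "agent")):
    """Rank dimension combinations by request growth between two windows."""
    def count(rows):
        return Counter(tuple(row[d] for d in dims) for row in rows)

    before, during = count(baseline_rows), count(anomaly_rows)
    # Counter returns 0 for combinations missing from one of the windows.
    deltas = {key: during[key] - before[key] for key in set(before) | set(during)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Made-up rows: US traffic grows from 10 to 60 requests, DE stays nearly flat.
us = {"country": "US", "host": "a.example", "agent": "Chrome"}
de = {"country": "DE", "host": "a.example", "agent": "Chrome"}
baseline = [us] * 10 + [de] * 10
anomaly = [us] * 60 + [de] * 11

top_combination, growth = rank_contributors(baseline, anomaly)[0]
```

With these sample rows the top-ranked combination is the US one, which is the same kind of conclusion the drill-down result above points to.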
6. Targeted Scaling – Use the identified dimension to guide scaling decisions and handle the traffic surge.
Conclusion
By following this SPL + AIOps workflow, engineers can achieve comprehensive observability of access logs, quickly detect traffic anomalies, pinpoint their origins, and implement precise scaling to accommodate rapid user growth.
Future releases will add more AIOps functions to further empower intelligent monitoring.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
