Detecting and Blocking Malicious Bot Traffic with Web Log Analysis

This article explains how to identify and mitigate malicious bot traffic by analyzing web server logs. It covers command‑line queries, specialized log‑analysis tools, centralized platforms, and automated security workflows, and closes with the operational metrics and response processes needed for effective protection.

Huolala Safety Emergency Response Center

Part 1 – Background

Bot traffic refers to automated requests generated by scripts, crawlers, or device emulators; by some industry estimates it accounts for more than half of total web traffic. Good bots include search‑engine crawlers, while malicious bots perform data scraping, credential stuffing, and other attacks that cause data leakage, resource exhaustion, and financial loss.

This article presents a log‑analysis approach to detect abnormal web access, distinguish malicious bots from legitimate traffic, and apply WAF or anti‑scraping countermeasures, followed by security‑operations metrics and processes.

Part 2 – Log Analysis Basics

Common HTTP log fields from Nginx, Apache, Tomcat, or a WAF include time, IP, request, user_agent, referer, method, status, body_bytes_sent, cookie, and request_body. Additional business‑specific parameters such as userid, username, phone, password, token, order_id, device_id, trace_id, timestamp, signature, and nonce are also valuable for security analysis.
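As a quick illustration of how these fields map onto a raw log line, the sketch below pulls the IP, request, and User‑Agent out of a sample Nginx "combined"‑format line with awk. The sample line and the field positions are assumptions tied to that format; adjust the field numbers if your log format differs.

```shell
# Hypothetical Nginx "combined" format line (field layout is an assumption).
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /api/order?order_id=42 HTTP/1.1" 200 512 "-" "python-requests/2.31.0"'

# Splitting on double quotes puts the request in $2 and the User-Agent in $6.
ip=$(printf '%s\n' "$line" | awk '{print $1}')
request=$(printf '%s\n' "$line" | awk -F '"' '{print $2}')
ua=$(printf '%s\n' "$line" | awk -F '"' '{print $6}')

echo "$ip"       # 203.0.113.7
echo "$request"  # GET /api/order?order_id=42 HTTP/1.1
echo "$ua"       # python-requests/2.31.0
```

The same `-F '"'` trick underlies the User‑Agent commands later in this article.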

Detection methods include:

Top‑N IP/User‑Agent/Referer statistics within a day.

Filtering User‑Agents that contain script signatures (python, java, curl, etc.).

Identifying repeated parameter values (trace_id, token) across many requests.

Cross‑checking IPs, tokens, or device IDs against the number of distinct users.
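The last method above (cross‑checking a token against distinct users) can be sketched with sort and awk. The input file, its two‑column "token userid" layout, and the threshold of 2 distinct users are all illustrative assumptions; in practice the pairs would be extracted from your logs first.

```shell
# Made-up "token userid" pairs; a real pipeline would extract these from logs.
cat > /tmp/token_user.txt <<'EOF'
tokA u1
tokA u2
tokA u3
tokB u1
tokB u1
EOF

# Dedupe pairs, count distinct userids per token, flag tokens above a
# threshold (here >2) as possibly shared or stolen credentials.
suspicious=$(sort -u /tmp/token_user.txt \
  | awk '{cnt[$1]++} END {for (t in cnt) if (cnt[t] > 2) print t}')
echo "$suspicious"   # tokA
```

The inverse check (many distinct tokens per IP) works the same way with the columns swapped.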

Typical attack scenarios in the internet industry include account brute‑force, data scraping, order fraud, promotional abuse, and content abuse.

Part 3 – Log Analysis Commands

Example command to list the top 10 most frequent IPs:

zcat access.log.gz | awk '{print $1}' | sort | uniq -c | sort -rn | head

Identify requests with suspicious User‑Agents:

zcat access.log.gz | awk -F '"' '$6~/python|java|apache|client|curl/'

Count high‑frequency User‑Agents (excluding browsers):

zcat access.log.gz | grep -v "Mozilla" | awk -F '"' '{print $6}' | sort | uniq -c | sort -rn | head

Screenshots in the original article show the command output and the malicious IPs and User‑Agent patterns identified.

Part 4 – Log Analysis Tools

Graphical tools such as 360 星图 (360 Star Map), LogForensics, GoAccess, AWStats, Logstalgia, web‑log‑parser, and Webalizer can automatically parse logs and flag SQL injection, XSS, remote code execution, and crawler activity. Although some of these tools are no longer maintained, they remain useful for quick triage.

Part 5 – Log Analysis Platforms

For large‑scale environments, centralized platforms like ELK/EFK (Elasticsearch, Logstash/Filebeat, Kibana), Alibaba Cloud SLS, Hive on Hadoop, and Splunk provide real‑time indexing, querying, and visualization.

Typical ELK workflow: Filebeat collects logs → Logstash parses and enriches fields → Elasticsearch stores data → Kibana visualizes and queries with KQL.
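A minimal Logstash pipeline for the workflow above might look like the sketch below. The Beats port, the grok pattern, the Elasticsearch host, and the index name are all assumptions for illustration, not the article's actual configuration.

```
input {
  beats { port => 5044 }                    # Filebeat ships raw access logs here
}
filter {
  grok {                                    # parse the standard combined log format
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {                                    # use the log's own timestamp as @timestamp
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "weblog-%{+YYYY.MM.dd}"        # daily indices for retention management
  }
}
```

With fields parsed this way, the Top‑N and cross‑checking queries from Part 2 become simple Kibana aggregations.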

Example SLS query to find values of the _su parameter (which carries the phone number in this example) appearing more than 50 times within five minutes:

querystring:"phone_no" | select host,url_extract_parameter(querystring,'_su') as su, count(url_extract_parameter(querystring,'_su')) as count group by host,su having count > 50 order by count desc

Hive example to count tokens accessed by more than 100 distinct IPs in a day:

SELECT token, COUNT(DISTINCT ip) AS ip_cnt FROM logs WHERE token IS NOT NULL GROUP BY token HAVING ip_cnt > 100 ORDER BY ip_cnt DESC LIMIT 10;

Splunk SPL example to extract and rank tokens:

index="weblog" RequestHost="www.test.com" | rex field=RequestURI "token=(?<token>\w*)&" | fields token | where token!="" | top limit=10 token

Part 6 – Automation Practices

Real‑time detection can be built with Kafka → Logstash → Flink for stream processing, generating UEBA alerts based on token, userid, or IP frequency within sliding windows. Alerts are sent to messaging channels and stored in Elasticsearch.

Sample detection rule: if a single IP makes >60% of login requests within X minutes and exceeds 10 attempts, trigger a password‑brute‑force alert.
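Offline, the same rule can be prototyped in one awk pass over a window of login‑request source IPs. The sample data, the /tmp path, and the exact thresholds (>10 attempts and >60% of the window) mirror the rule above but are otherwise illustrative.

```shell
# Generate a sample window: 12 attempts from one IP plus 3 from others,
# so 198.51.100.9 makes 12/15 = 80% of logins, above both thresholds.
{
  for i in $(seq 12); do echo 198.51.100.9; done
  printf '203.0.113.5\n203.0.113.6\n203.0.113.7\n'
} > /tmp/login_ips.txt

# Alert when one IP exceeds 10 attempts AND >60% of the window's logins.
alerts=$(awk '{cnt[$1]++; total++}
  END {for (ip in cnt) if (cnt[ip] > 10 && cnt[ip]/total > 0.6) print ip}' \
  /tmp/login_ips.txt)
echo "$alerts"   # 198.51.100.9
```

In the streaming setup, the same counting logic runs inside a Flink sliding window instead of over a static file.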

Part 7 – Operations and Metrics

Operational challenges include missed detections, alert fatigue, mis‑configurations, slow manual response, and repeated attacks. To address these, define SOPs, conduct post‑mortems, enforce change‑control for WAF rules, and track metrics such as detection coverage, detection accuracy, interception accuracy, false‑positive rate, MTTR, MTTD, and automation rate.

Typical metric formulas are provided (e.g., detection coverage = assets monitored / total assets × 100%).
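A worked instance of the coverage formula, with made‑up asset counts, can be computed directly in the shell:

```shell
# Illustrative numbers: 180 of 200 assets feed logs into the platform.
monitored=180
total=200

# detection coverage = assets monitored / total assets x 100%
coverage=$(awk -v m="$monitored" -v t="$total" 'BEGIN {printf "%.1f", m / t * 100}')
echo "${coverage}%"   # 90.0%
```

The other ratio metrics (detection accuracy, false‑positive rate, automation rate) follow the same numerator/denominator pattern.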

Part 8 – Conclusion

Advanced bot attacks may require threat‑intelligence feeds, front‑end telemetry, TLS/SSL fingerprinting, machine‑learning models, and deep‑learning classifiers. Combining endpoint hardening, SSL pinning, request signing, CAPTCHAs, and risk‑based controls creates a defense‑in‑depth strategy that reduces attack success and protects business value.

Tags: bot detection, log monitoring, WAF, web log analysis

Written by the Huolala Safety Emergency Response Center, official public account of LLSRC.