How to Detect and Block Malicious Bot Traffic Using Log Analysis
This guide explains how to distinguish malicious bot traffic from legitimate requests by analyzing web server logs, using fields such as IP, User-Agent, Referer, and request parameters. It then covers how to mitigate attacks and improve operational metrics with WAF rules, automation, and security platforms.
Background
Bot (robot) traffic refers to automated requests to websites, apps, or APIs generated by scripts, crawlers, or simulators. In 2020, bot traffic accounted for 57% of total requests, surpassing human traffic. Bot traffic can be benign (search engines, analytics) or malicious (data scraping, credential stuffing, fraud).
Log Analysis Basics
Web servers (Nginx, Apache, Tomcat) record HTTP request fields such as time, IP, request line, user_agent, referer, method, status, and body_bytes_sent. Cookie and request_body are logged only if the log format is configured to include them. When a WAF is in place, additional fields such as the full request body and headers are available.
time : request timestamp
IP : client IP (behind a proxy or CDN, the real client address comes from headers such as X-Forwarded-For, which clients can spoof)
request : URL with path and parameters
user_agent : OS/browser or tool identifier (can be forged)
referer : source page URL (can be forged)
method : GET, POST, PUT, etc.
status : HTTP status code
body_bytes_sent : size of response body
cookie : session or user identifier (can be forged)
request_body : POST parameters and values
Business‑specific parameters such as userid, token, device_id, trace_id, timestamp, signature, nonce, etc., are also valuable for security analysis.
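To aggregate these fields, each log line first has to be split into them. The sketch below assumes the default Nginx/Apache "combined" log format; cookie and request_body are not part of that format and would need a custom log format. The name parse_combined is illustrative and not part of any tool mentioned in this guide.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[dict]:
    """Return the log fields as a dict, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Split the request line ("GET /path?x=1 HTTP/1.1") into method and URL.
    parts = rec["request"].split()
    rec["method"] = parts[0] if len(parts) >= 2 else ""
    rec["url"] = parts[1] if len(parts) >= 2 else rec["request"]
    rec["status"] = int(rec["status"])
    # Business parameters (token, device_id, nonce, ...) from the query string.
    rec["params"] = {k: v[0] for k, v in parse_qs(urlsplit(rec["url"]).query).items()}
    return rec
```

For example, a line ending in `"GET /api/order?order_id=42&token=abc HTTP/1.1" 200 512 "-" "python-requests/2.31.0"` yields params with order_id and token, ready to be counted per IP or per token.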
Detection Methods
By aggregating and correlating log fields over a time window, analysts can spot abnormal patterns. Typical detection rules include:
Top‑10 IPs, User‑Agents, Referers, tokens, device_id, userid in the last 24 h.
Referer not empty and not belonging to the site's own domain.
User‑Agent containing script signatures (python, java, curl, etc.).
Requests with a missing Cookie, User‑Agent, or Referer, or with values that are inconsistent with one another.
Identical trace_id, timestamp, signature, or nonce values repeated within a short period (a sign of replayed requests).
Same token or device_id accessed from many different IPs.
High‑frequency requests to a single endpoint, or enumeration of a sensitive parameter (e.g., order_id, phone, password).
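A minimal sketch of several of these rules, applied to one time window of already-parsed records. The record layout (dicts with ip, url, referer, user_agent, cookie), the parameter names token and nonce, and every threshold are assumptions for illustration and should be tuned to real traffic. A missing Referer is deliberately not flagged on its own, because many legitimate API and direct requests omit it.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import parse_qs, urlsplit

# Script signatures from the rule list, matched case-insensitively.
SCRIPT_UA = re.compile(r"python|java|apache|client|curl", re.IGNORECASE)


def is_foreign_referer(referer: str, own_domain: str) -> bool:
    """Referer present, but not the site's own domain or one of its subdomains."""
    if not referer or referer == "-":
        return False
    host = (urlsplit(referer).hostname or "").lower()
    return not (host == own_domain or host.endswith("." + own_domain))


def detect(records, own_domain, top_n=10, max_ips_per_token=5, max_hits=100):
    """Apply the detection rules to one time window of parsed log records.

    Thresholds are illustrative defaults, not recommendations.
    Returns (top IPs, {rule name: set of offending IPs/tokens/nonces/keys}).
    """
    ip_counts = Counter(r["ip"] for r in records)
    hits = Counter()              # (ip, path) -> request count
    token_ips = defaultdict(set)  # token -> distinct client IPs
    nonce_counts = Counter()      # nonce -> occurrences
    findings = defaultdict(set)

    for r in records:
        parts = urlsplit(r["url"])
        params = {k: v[0] for k, v in parse_qs(parts.query).items()}
        ua = r.get("user_agent", "")
        if SCRIPT_UA.search(ua):
            findings["script_user_agent"].add(r["ip"])
        if is_foreign_referer(r.get("referer", ""), own_domain):
            findings["foreign_referer"].add(r["ip"])
        # Only Cookie and User-Agent are required here (see note above on Referer).
        if not r.get("cookie") or ua in ("", "-"):
            findings["missing_headers"].add(r["ip"])
        if "token" in params:
            token_ips[params["token"]].add(r["ip"])
        if "nonce" in params:
            nonce_counts[params["nonce"]] += 1
        hits[(r["ip"], parts.path)] += 1

    findings["replayed_nonce"] = {n for n, c in nonce_counts.items() if c > 1}
    findings["shared_token"] = {t for t, ips in token_ips.items() if len(ips) > max_ips_per_token}
    findings["high_frequency"] = {k for k, c in hits.items() if c > max_hits}
    return ip_counts.most_common(top_n), dict(findings)
```

Running detect once per window (for example, every 24 h for the Top-10 rule and every few minutes for nonce replay) keeps the state small; the per-rule sets can then feed the WAF rules discussed later.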
Case Studies – Command‑Line Practice
Example commands to extract and rank IPs:
zcat access.log.gz | awk '{print $1}' | sort | uniq -c | sort -rn | head
Identify script-based User-Agents (splitting the combined log format on double quotes puts the User-Agent in $6; tolower() makes the match also catch "Python-urllib" and "Java/1.8"):
zcat access.log.gz | awk -F '"' 'tolower($6) ~ /python|java|apache|client|curl/'
Find non-browser User-Agents:
zcat access.log.gz | grep -v "Mozilla" | awk -F '"' '{print $6}' | sort | uniq -c | sort -rn | head
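For larger investigations, the same three counts can be produced in Python, where the results are easier to join with other data. This is a sketch that assumes combined-format lines and the same signature list; the signature match is case-insensitive, and unlike the second pipeline (which prints whole lines) it counts matching User-Agents. The function names are illustrative.

```python
import gzip
import re
from collections import Counter
from typing import Iterable

# Same signatures as the awk filter, matched case-insensitively.
SCRIPT_UA = re.compile(r"python|java|apache|client|curl", re.IGNORECASE)


def summarize_lines(lines: Iterable[str], top_n: int = 10):
    """Top client IPs, script-like User-Agents, and non-browser User-Agents."""
    ips, script_uas, non_browser = Counter(), Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        ips[line.split()[0]] += 1                  # awk '{print $1}'
        fields = line.split('"')
        ua = fields[5] if len(fields) > 5 else ""  # awk -F '"' '{print $6}'
        if SCRIPT_UA.search(ua):
            script_uas[ua] += 1
        if "Mozilla" not in line:                  # grep -v "Mozilla"
            non_browser[ua] += 1
    return (ips.most_common(top_n),
            script_uas.most_common(top_n),
            non_browser.most_common(top_n))


def summarize(path: str, top_n: int = 10):
    """Read a gzip-compressed access log, as zcat does, and summarize it."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return summarize_lines(fh, top_n)
```

Usage: `top_ips, script_uas, non_browser = summarize("access.log.gz")`.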
Log Analysis Tools
Several open‑source and commercial tools can automate log parsing and security reporting:
360 Xingtu (detects SQL injection, XSS, crawler scans, and CC/HTTP-flood attacks).
LogForensics (single‑trace investigation).
GoAccess (real‑time interactive web UI).
AWStats (classic Apache log analyzer).
Logstalgia (visual 3‑D replay).
web‑log‑parser (Python‑based flexible parser).
Webalizer (lightweight HTML reports).
Log Analysis Platforms
For large‑scale environments, centralized platforms are preferred.
ELK/EFK Stack
Elasticsearch stores the logs; Filebeat (or Fluentd in an EFK setup) ships them, Logstash parses and enriches them, and Kibana provides queries and dashboards. Example: filter by token, aggregate by IP, and generate security reports.
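As an illustration of "filter by token, aggregate by IP", the function below builds an Elasticsearch search body with a bool filter and a terms aggregation. The field names token, client_ip, and @timestamp are assumptions about the index mapping (client_ip must be a keyword or ip field to be aggregatable). The body can be sent with any Elasticsearch client or pasted into Kibana Dev Tools.

```python
import json


def token_by_ip_query(token: str, window: str = "now-24h", top_n: int = 10) -> dict:
    """Search body: requests carrying `token` within `window`, bucketed by client IP."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"token": token}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            # Top client IPs for this token, largest request count first.
            "by_ip": {"terms": {"field": "client_ip", "size": top_n}}
        },
    }


if __name__ == "__main__":
    # Paste the printed JSON after "GET <index>/_search" in Kibana Dev Tools.
    print(json.dumps(token_by_ip_query("abc123"), indent=2))
```

A token whose by_ip buckets contain many distinct addresses is a candidate for the "same token from many IPs" rule.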
Alibaba Cloud SLS
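SLS provides real‑time log ingestion, SQL‑like queries, and alerting. The example query searches for requests whose query string contains phone_no, extracts the _su parameter, and lists host/_su pairs that occur more than 50 times, most frequent first.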
querystring:"phone_no" | select host,url_extract_parameter(querystring,'_su') as su, count(url_extract_parameter(querystring,'_su')) as count group by host,su having count > 50 order by count desc
Hive on Hadoop
Log data can be loaded into Hive tables and analyzed with HQL. Example: count tokens that appear with >100 distinct IPs in a day.
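The Hive example boils down to a grouped COUNT(DISTINCT ...). The docstring below shows the HQL shape with hypothetical table and column names (access_log, dt, token, ip); the Python function computes the same result on (day, token, ip) tuples, which can be handy for checking the threshold logic on a sample before running it on the cluster.

```python
from collections import defaultdict


def tokens_with_many_ips(rows, threshold=100):
    """Tokens seen from more than `threshold` distinct IPs on the same day.

    Mirrors an HQL query along these lines (names are hypothetical):
        SELECT dt, token, COUNT(DISTINCT ip) AS ip_cnt
        FROM access_log
        WHERE token IS NOT NULL
        GROUP BY dt, token
        HAVING COUNT(DISTINCT ip) > 100;
    `rows` is an iterable of (dt, token, ip) tuples.
    """
    ips = defaultdict(set)  # (day, token) -> distinct IPs
    for dt, token, ip in rows:
        if token:  # skip rows without a token, like "token IS NOT NULL"
            ips[(dt, token)].add(ip)
    return {key: len(s) for key, s in ips.items() if len(s) > threshold}
```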
Automation Practices
Real‑time detection can be built with Kafka for log streaming, Logstash for enrichment, Flink for stream processing, and SIEM/SOC platforms for UEBA. Automated response may involve SOAR or direct WAF OpenAPI calls to block malicious IPs, User‑Agents, or token values.
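The core of such a real-time rule is a per-key sliding-window counter. The class below is an in-memory sketch of that logic, not a Flink job; the limit, the window length, and the point where a WAF OpenAPI or SOAR call would go are assumptions, and the blocking call itself is left as a hypothetical hook.

```python
from collections import defaultdict, deque


class SlidingWindowRateDetector:
    """Flag keys (IP, token, User-Agent, ...) exceeding `limit` events in `window` seconds.

    Events must arrive in non-decreasing timestamp order per key.
    """

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps inside the window
        self.blocked = set()

    def observe(self, key: str, ts: float) -> bool:
        """Record one event; return True the first time `key` crosses the limit."""
        q = self.events[key]
        q.append(ts)
        # Drop timestamps that have fallen out of the window (ts - window, ts].
        while q and q[0] <= ts - self.window:
            q.popleft()
        if len(q) > self.limit and key not in self.blocked:
            self.blocked.add(key)
            # Hypothetical hook: a real pipeline would call the WAF/SOAR API here.
            return True
        return False
```

Returning True only on the first crossing keeps the downstream block action idempotent, so a flood from one IP produces one block request rather than thousands.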
Operational Process and Metrics
A structured SOP (standard operating procedure) covers alert triage, root‑cause analysis, remediation, and post‑mortem review. Key performance indicators include detection coverage, detection accuracy, interception accuracy, effective interception, false‑positive rate, MTTD (mean time to detect), MTTR (mean time to resolve), and automation rate.
Detection coverage = (assets monitored / total assets) × 100 %.
Detection accuracy = (1 – false alerts / total alerts) × 100 %.
Interception accuracy = (1 – false blocks / total blocks) × 100 %.
Effective interception = (1 – missed attacks / total attacks) × 100 %.
MTTD = average time from attack occurrence to detection.
MTTR = average time from detection to resolution.
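These formulas translate directly into code. A sketch, in which the input counts and the Incident record are hypothetical names; MTTD and MTTR are plain means over resolved incidents.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    occurred: float  # attack start (epoch seconds)
    detected: float  # alert raised
    resolved: float  # mitigation completed


def kpis(monitored_assets, total_assets, false_alerts, total_alerts,
         false_blocks, total_blocks, missed_attacks, total_attacks, incidents):
    """Compute the metrics defined above; ratios are returned as percentages."""
    return {
        "detection_coverage": monitored_assets / total_assets * 100,
        "detection_accuracy": (1 - false_alerts / total_alerts) * 100,
        "interception_accuracy": (1 - false_blocks / total_blocks) * 100,
        "effective_interception": (1 - missed_attacks / total_attacks) * 100,
        "mttd_seconds": mean(i.detected - i.occurred for i in incidents),
        "mttr_seconds": mean(i.resolved - i.detected for i in incidents),
    }
```

For example, 5 false alerts out of 50 gives a detection accuracy of 90%, and incidents detected 60 s and 120 s after onset give an MTTD of 90 s.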
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
