
How to Detect and Block Malicious Bot Traffic Using Log Analysis

This guide explains how to identify and differentiate malicious bot traffic from legitimate requests by analyzing web server logs, leveraging fields such as IP, user‑agent, referer, and parameters, and then applying WAF rules, automation, and security platforms to mitigate attacks and improve operational metrics.

Huolala Tech

Sep 19, 2023


Background

Bot (robot) traffic refers to automated requests to websites, apps, or APIs generated by scripts, crawlers, or simulators. In 2020, bot traffic reportedly accounted for 57% of total requests, surpassing human traffic. Bot traffic can be benign (search engines, analytics) or malicious (data scraping, credential stuffing, fraud).

Log Analysis Basics

Web servers (Nginx, Apache, Tomcat) record HTTP request fields such as time, IP, request, user_agent, referer, method, status, body_bytes_sent, cookie, and request_body. When a WAF is in place, additional fields such as the full request body and headers are also available.

time : request timestamp

IP : client IP (can be spoofed when it is taken from a client‑supplied header such as X-Forwarded-For)

request : URL with path and parameters

user_agent : OS/browser or tool identifier (can be forged)

referer : source page URL (can be forged)

method : GET, POST, PUT, etc.

status : HTTP status code

body_bytes_sent : size of response body

cookie : session or user identifier (can be forged)

request_body : POST parameters and values

Business‑specific parameters such as userid, token, device_id, trace_id, timestamp, signature, nonce, etc., are also valuable for security analysis.
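As a rough illustration of how the fields listed above can be pulled out of a raw log line, the sketch below parses one line with a regular expression. It assumes the common Nginx/Apache "combined" layout; the pattern, the `parse_line` helper, and the sample line are illustrative and need adjusting to the actual log_format in use.

```python
import re
from typing import Optional

# Assumption: logs use the "combined" layout
# (ip - user [time] "method request protocol" status bytes "referer" "user_agent").
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one access-log line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

# Made-up sample line using a documentation IP range.
sample = (
    '203.0.113.7 - - [19/Sep/2023:10:00:00 +0800] '
    '"GET /api/order?order_id=1 HTTP/1.1" 200 512 '
    '"-" "python-requests/2.31.0"'
)
fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["user_agent"])
```

Returning None for lines that do not match keeps malformed or truncated entries from silently polluting later aggregations; they can be counted separately as a signal of their own.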

Detection Methods

By aggregating and correlating log fields over a time window, analysts can spot abnormal patterns. Typical detection rules include:

Top 10 values of IP, User‑Agent, Referer, token, device_id, and userid by request count over the last 24 h.

Referer is non‑empty and does not belong to the site's own domain.

User‑Agent containing script signatures (python, java, curl, etc.).

Requests with a missing Cookie, User‑Agent, or Referer, or with values that are inconsistent with one another.

Repeated identical trace_id, timestamp, signature or nonce within a short period.

Same token or device_id accessed from many different IPs.

High‑frequency requests to a single endpoint that cycle through values of a sensitive parameter (e.g., order_id, phone, password).
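A few of the rules above can be sketched as small predicates over already‑parsed records. This is a minimal illustration, not a production rule set: the record layout (dicts with `user_agent`, `referer`, and `nonce` keys), the `OWN_DOMAIN` value, and the helper names are all assumptions, and the nonce check expects its input to be limited to a short time window beforehand.

```python
from collections import Counter
from urllib.parse import urlparse

SCRIPT_MARKERS = ("python", "java", "curl")  # markers named in the rule list; extend as needed
OWN_DOMAIN = "example.com"  # hypothetical: replace with the site's own domain

def is_script_agent(user_agent: str) -> bool:
    """True if the User-Agent contains a known script signature (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SCRIPT_MARKERS)

def is_external_referer(referer: str) -> bool:
    """True if the Referer is non-empty and its host is outside OWN_DOMAIN."""
    if referer in ("", "-"):
        return False
    host = urlparse(referer).hostname or ""
    return not (host == OWN_DOMAIN or host.endswith("." + OWN_DOMAIN))

def replayed_nonces(records, threshold=2):
    """Return nonces seen at least `threshold` times in the given window of records."""
    counts = Counter(r["nonce"] for r in records if r.get("nonce"))
    return {nonce for nonce, n in counts.items() if n >= threshold}

# Made-up records for illustration.
records = [
    {"user_agent": "curl/8.0", "referer": "-", "nonce": "a1"},
    {"user_agent": "Mozilla/5.0", "referer": "https://evil.test/x", "nonce": "a1"},
    {"user_agent": "Mozilla/5.0", "referer": "https://www.example.com/", "nonce": "b2"},
]
print([is_script_agent(r["user_agent"]) for r in records])
print([is_external_referer(r["referer"]) for r in records])
print(replayed_nonces(records))
```

Since User‑Agent and Referer can both be forged, these checks are best treated as signals to combine with the frequency‑based rules rather than as verdicts on their own.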

Case Studies – Command‑Line Practice

Example commands to extract and rank IPs:

zcat access.log.gz | awk '{print $1}' | sort | uniq -c | sort -rn | head

Identify script‑based User‑Agents:

zcat access.log.gz | awk -F '"' 'tolower($6) ~ /python|java|apache|client|curl/'

Find non‑browser User‑Agents:

zcat access.log.gz | awk -F '"' '$6 !~ /Mozilla/ {print $6}' | sort | uniq -c | sort -rn | head
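When the same rankings need to feed further processing, the pipelines above can be mirrored in Python. The sketch below assumes the combined log layout, where the client IP is the first whitespace‑separated field and the User‑Agent is the sixth field when splitting on double quotes; the function names and sample lines are illustrative.

```python
import gzip
from collections import Counter

def top_ips(lines, n=10):
    """Rank client IPs (first whitespace-separated field) by request count."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(n)

def non_browser_agents(lines, n=10):
    """Rank User-Agents that do not contain 'Mozilla' (sixth field when split on '"')."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) > 5 and "Mozilla" not in parts[5]:
            counts[parts[5]] += 1
    return counts.most_common(n)

def read_gz(path):
    """Yield decoded lines from a gzip-compressed log file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle

# Made-up sample lines; with a real file use e.g. top_ips(read_gz("access.log.gz")).
sample_lines = [
    '203.0.113.7 - - [t] "GET / HTTP/1.1" 200 1 "-" "curl/8.0"',
    '203.0.113.7 - - [t] "GET / HTTP/1.1" 200 1 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [t] "GET / HTTP/1.1" 200 1 "-" "curl/8.0"',
]
print(top_ips(sample_lines))
print(non_browser_agents(sample_lines))
```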

Images illustrate high‑frequency IP analysis and suspicious User‑Agent patterns.

Log Analysis Tools

Several open‑source and commercial tools can automate log parsing and security reporting:

360 Xingtu (detects SQL injection, XSS, crawler scans, CC attacks).

LogForensics (single‑trace investigation).

GoAccess (real‑time analyzer with an interactive terminal UI and HTML reports).

AWStats (classic Apache log analyzer).

Logstalgia (animated, arcade‑style replay of access logs).

web‑log‑parser (Python‑based flexible parser).

Webalizer (lightweight HTML reports).

Log Analysis Platforms

For large‑scale environments, centralized platforms are preferred.

ELK/EFK Stack

Elasticsearch stores and indexes the logs; Filebeat (or Fluentd in an EFK stack) ships them and Logstash parses and enriches them; Kibana provides queries, visualizations, and dashboards. Example: filter by token, aggregate by IP, and generate security reports.

Alibaba Cloud SLS

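SLS provides real‑time log ingestion, SQL‑like queries, and alerting. The example query below selects requests whose query string contains phone_no, extracts the _su parameter, and lists the host and _su pairs that occur more than 50 times, most frequent first.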

querystring:"phone_no" | select host,url_extract_parameter(querystring,'_su') as su, count(url_extract_parameter(querystring,'_su')) as count group by host,su having count > 50 order by count desc
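For readers without access to SLS, the logic of this query can be approximated offline in Python. The sketch assumes each record is a `(host, querystring)` pair; the function name and its defaults are illustrative, and `parse_qs` stands in for `url_extract_parameter` by taking the first value of the parameter.

```python
from collections import Counter
from urllib.parse import parse_qs

def frequent_param_values(records, param="_su", must_contain="phone_no", min_count=50):
    """Count (host, param value) pairs among matching requests; keep those above min_count."""
    counts = Counter()
    for host, querystring in records:
        if must_contain not in querystring:
            continue  # mirrors the search filter in front of the pipe
        values = parse_qs(querystring).get(param)
        if values:
            counts[(host, values[0])] += 1
    rows = [(host, value, n) for (host, value), n in counts.items() if n > min_count]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Made-up records: one _su value repeated 51 times, one repeated 3 times.
records = (
    [("api.example.com", "phone_no=1&_su=abc")] * 51
    + [("api.example.com", "phone_no=2&_su=xyz")] * 3
    + [("api.example.com", "_su=abc")] * 5  # no phone_no, so filtered out
)
print(frequent_param_values(records))
```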

Hive on Hadoop

Log data can be loaded into Hive tables and analyzed with HQL. Example: count tokens that appear with >100 distinct IPs in a day.
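The aggregation described here (tokens seen from more than 100 distinct IPs in a day) can be prototyped in Python before committing it to a Hive job. This is a sketch under assumed inputs: records are `(day, token, ip)` tuples, and the function name and threshold parameter are illustrative.

```python
from collections import defaultdict

def tokens_with_many_ips(records, min_ips=100):
    """Map (day, token) to its distinct-IP count when that count exceeds min_ips."""
    ips_by_key = defaultdict(set)
    for day, token, ip in records:
        ips_by_key[(day, token)].add(ip)
    return {key: len(ips) for key, ips in ips_by_key.items() if len(ips) > min_ips}

# Made-up data: one token shared across 150 IPs, one used from a single IP.
records = [
    ("2023-09-19", "tok-shared", f"10.0.{i // 256}.{i % 256}") for i in range(150)
] + [("2023-09-19", "tok-normal", "10.1.0.1")] * 20
print(tokens_with_many_ips(records))
```

Keeping a set of IPs per key is fine for a prototype; at Hive scale the same idea is a grouped distinct count, where an approximate distinct count may be preferable for very large groups.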

Automation Practices

Real‑time detection can be built with Kafka for log streaming, Logstash for enrichment, Flink for stream processing, and SIEM/SOC platforms for UEBA. Automated response may involve SOAR or direct WAF OpenAPI calls to block malicious IPs, User‑Agents, or token values.
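The core of such a pipeline is a windowed counter that triggers a response when a key exceeds a threshold. The sketch below shows that idea in plain Python rather than Flink; the class name, thresholds, and the `on_block` callback are hypothetical, and the callback merely stands in for whatever SOAR playbook or WAF API call an environment actually uses. It assumes timestamps arrive in non‑decreasing order per key.

```python
from collections import defaultdict, deque

class SlidingWindowDetector:
    """Flag a key (IP, token, device_id, ...) that exceeds max_requests per window."""

    def __init__(self, window_seconds=60, max_requests=100, on_block=print):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.on_block = on_block  # hypothetical hook for a SOAR or WAF integration
        self.blocked = set()
        self._events = defaultdict(deque)

    def observe(self, key, timestamp):
        """Record one request; return True if this request triggers a block."""
        events = self._events[key]
        events.append(timestamp)
        # Drop events that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        if len(events) > self.max_requests and key not in self.blocked:
            self.blocked.add(key)
            self.on_block(key)
            return True
        return False

blocked_keys = []
detector = SlidingWindowDetector(window_seconds=10, max_requests=5,
                                 on_block=blocked_keys.append)
for t in range(6):
    detector.observe("198.51.100.9", t)       # 6 requests within 10 s
    detector.observe("203.0.113.5", t * 20)   # 1 request every 20 s
print(blocked_keys)
```

Blocking each key only once keeps the response idempotent, which matters when the callback triggers an external rule change.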

Operational Process and Metrics

A structured SOP covers alert triage, root‑cause analysis, remediation, and post‑mortem. Key performance indicators include detection coverage, detection accuracy, interception accuracy, false‑positive rate, MTTD, MTTR, and automation rate.

Detection coverage = (assets monitored / total assets) × 100 %.

Detection accuracy = (1 – false alerts / total alerts) × 100 %.

Interception accuracy = (1 – false blocks / total blocks) × 100 %.

Effective interception rate = (1 – missed attacks / total attacks) × 100 %.

MTTD = average time from attack occurrence to detection.

MTTR = average time from detection to resolution.
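The formulas above translate directly into a few helper functions. The sketch below is illustrative only: the function names and sample numbers are made up, and each ratio is computed as (total − bad) / total, which is algebraically the same as 1 − bad / total.

```python
def percentage(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return 100.0 * numerator / denominator

def detection_coverage(assets_monitored, total_assets):
    return percentage(assets_monitored, total_assets)

def detection_accuracy(false_alerts, total_alerts):
    return percentage(total_alerts - false_alerts, total_alerts)

def interception_accuracy(false_blocks, total_blocks):
    return percentage(total_blocks - false_blocks, total_blocks)

def effective_interception(missed_attacks, total_attacks):
    return percentage(total_attacks - missed_attacks, total_attacks)

def mean_time(durations_minutes):
    """Average of per-incident durations; use for MTTD or MTTR."""
    if not durations_minutes:
        raise ValueError("at least one duration is required")
    return sum(durations_minutes) / len(durations_minutes)

# Made-up figures for illustration.
print(detection_coverage(45, 50))      # assets monitored vs. total assets
print(detection_accuracy(5, 200))      # false alerts vs. total alerts
print(effective_interception(2, 80))   # missed attacks vs. total attacks
print(mean_time([10, 20, 30]))         # minutes from occurrence to detection
```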
