Blocking AhrefsBot and Scrapy Crawlers with Nginx in Docker
This article describes how to identify abnormal Nginx requests caused by bots like AhrefsBot and Scrapy, and provides a step‑by‑step Docker‑based solution that includes a custom Nginx configuration, agent‑deny rules, and optional Alibaba Cloud security‑group IP blocking to prevent memory spikes.
Preface: The author noticed a sudden memory surge on the server and, after checking the Nginx logs, discovered unusual requests from bots.
Investigation showed that the requests carried a small set of recurring User-Agent strings, notably Scrapy and AhrefsBot, so the author decided to block them.
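The log check described above can be sketched in a few lines of Python. This is an illustration rather than the author's actual procedure: it assumes Nginx writes the default "combined" log format, and the function name `count_user_agents` and the sample lines (which use documentation-only IP addresses) are made up for the example.

```python
import re
from collections import Counter

# In the combined log format the line ends with:
#   "request" status bytes "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Illustrative sample lines, not taken from the author's server.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Scrapy/2.5.0 (+https://scrapy.org)"',
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 612 "-" "Scrapy/2.5.0 (+https://scrapy.org)"',
    '198.51.100.7 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '198.51.100.9 - - [01/Jan/2024:00:00:03 +0000] "GET /c HTTP/1.1" 200 612 "-" ""',
]


def count_user_agents(lines):
    """Tally requests per User-Agent; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            # Label an empty User-Agent so it stays visible in the tally.
            counts[match.group(1) or "(empty)"] += 1
    return counts


# Print the most frequent agents first.
for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
    print(f"{hits:6d}  {agent}")
```

To run this against a real log, pass an open file object instead of `SAMPLE_LINES`; with the volume mapping used later in this article the log would probably sit under `./nginx/logs/`, though the exact file name depends on the Nginx configuration.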
Solution – Nginx level blocking (Docker environment):
1. docker-compose.yml
version: '3'
services:
  d_nginx:
    container_name: c_nginx
    env_file:
      - ./env_files/nginx-web.env
    image: nginx:1.20.1-alpine
    ports:
      - '80:80'
      - '81:81'
      - '443:443'
    links:
      - d_php
    volumes:
      - ./nginx/conf:/etc/nginx/conf.d
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      # Host file name must match the file shown in the directory structure
      - ./nginx/agent-deny.conf:/etc/nginx/agent-deny.conf
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/logs:/var/log/nginx/
      - ./www:/var/www/html
2. Directory structure
nginx
├─ nginx.conf
├─ agent-deny.conf
├─ conf
│ ├─ xxxx01_server.conf
│ └─ xxxx02_server.conf
3. agent-deny.conf (bot‑blocking rules)
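# Rule 1: case-insensitive match on known crawler names
if ($http_user_agent ~* (Scrapy|AhrefsBot)) {
    return 404;
}
# Rule 2: the exact AhrefsBot string (regex metacharacters escaped) or an empty User-Agent
if ($http_user_agent ~ "Mozilla/5\.0 \(compatible; AhrefsBot/7\.0; \+http://ahrefs\.com/robot/\)|^$") {
    return 403;
}
4. Include agent-deny.conf in each server block: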
server {
    include /etc/nginx/agent-deny.conf;
    listen 80;
    server_name localhost;
    client_max_body_size 100M;
    root /var/www/html/xxxxx/public;
    index index.php;
    # ... other proxy headers ...
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    # additional location blocks omitted for brevity
}
With these rules in place, Nginx rejects any request whose User-Agent matches: 404 for Scrapy and AhrefsBot, 403 for an empty User-Agent. Crawlers that identify themselves this way no longer reach the application.
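Before reloading Nginx, it can help to check which User-Agent strings the rules are meant to catch. The sketch below mirrors the intent of agent-deny.conf in Python; it is an approximation, not a substitute for testing against the real server, because Nginx uses PCRE rather than Python's `re` module (the two agree on a pattern this simple). The function name `status_for` is hypothetical. Any string the second rule's literal AhrefsBot alternative could match also contains "AhrefsBot", so the first rule handles it first, and the second rule in practice only adds the empty User-Agent case.

```python
import re

# Mirrors rule 1: "~*" in Nginx is a case-insensitive regex match.
BOT_PATTERN = re.compile(r"(Scrapy|AhrefsBot)", re.IGNORECASE)


def status_for(user_agent):
    """Return the status the rules are intended to send, or None if the request passes."""
    if BOT_PATTERN.search(user_agent):
        return 404
    # Rule 2's "^$" alternative: an empty (or missing) User-Agent header.
    if user_agent == "":
        return 403
    return None


# A few illustrative User-Agent strings.
for agent in [
    "Scrapy/2.5.0 (+https://scrapy.org)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "",
    "Mozilla/5.0 (X11; Linux x86_64)",
]:
    print(repr(agent), "->", status_for(agent))
```

Keep in mind that User-Agent matching only stops crawlers that identify themselves; a client that sends a browser-like string passes these rules.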
Additional protection – Alibaba Cloud security group
The author also added IP‑range blocks in the cloud security group for extra assurance. Example IP ranges:
54.36.0.0
51.222.0.0
195.154.0.0
These measures together mitigate the memory spikes caused by malicious bot traffic.
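One way to decide which ranges are worth blocking in a security group is to group the offending client IPs by network. The sketch below does that with Python's standard `ipaddress` module. The source lists bare addresses without prefix lengths, so the /16 grouping is purely an assumption for illustration; the function name `group_by_network` is hypothetical and the sample addresses come from documentation-only ranges.

```python
import ipaddress
from collections import Counter


def group_by_network(ips, prefix=16):
    """Count client IPs per enclosing network of the given prefix length."""
    counts = Counter()
    for ip in ips:
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts


# Illustrative addresses, not taken from the author's logs.
sample_ips = ["198.51.100.7", "198.51.3.9", "203.0.113.5"]
for network, hits in group_by_network(sample_ips).most_common():
    print(f"{hits:4d}  {network}")
```

Blocking a whole network also blocks every legitimate client inside it, so the prefix length should be checked against the crawler operator's published ranges before a rule is added.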
