Blocking AhrefsBot and Scrapy Crawlers with Nginx in Docker

This article describes how to identify abnormal Nginx requests caused by bots like AhrefsBot and Scrapy, and provides a step-by-step Docker-based solution that includes a custom Nginx configuration, agent-deny rules, and optional Alibaba Cloud security-group IP blocking to prevent memory spikes.

php Courses

Preface: The author noticed a sudden memory surge on the server and, after checking the Nginx logs, discovered unusual requests from bots.

Investigation showed that these requests repeatedly carried the same few User-Agent strings, identifying clients such as Scrapy (a Python crawling framework) and AhrefsBot (the Ahrefs SEO crawler). The next step was to block them.
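To find which User-Agents dominate the traffic, the access log can be tallied per User-Agent. The following is a minimal sketch that assumes the log uses nginx's default `combined` format, or the `main` format from the official image's stock nginx.conf, which only adds one trailing field. The author's custom nginx.conf may define a different format, in which case the regex needs adjusting. The function name `top_user_agents` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumes nginx's "combined" log format:
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
# Extra trailing fields (e.g. the official image's "main" format) are ignored.
LOG_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_user_agents(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count requests per User-Agent and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match:  # lines in any other format are skipped
            counts[match.group("agent")] += 1
    return counts.most_common(n)
```

With the Compose setup below, the logs land in `./nginx/logs` on the host. Assuming the default file name `access.log`, `top_user_agents(open("nginx/logs/access.log"))` lists the heaviest User-Agents first.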

Solution: blocking at the Nginx level (Docker environment)

1. docker-compose.yml

version: '3'
services:
  d_nginx:
    container_name: c_nginx
    env_file:
      - ./env_files/nginx-web.env
    image: nginx:1.20.1-alpine
    ports:
      - '80:80'
      - '81:81'
      - '443:443'
    links:
      - d_php   # PHP service, defined elsewhere in this file (not shown)
    volumes:
      - ./nginx/conf:/etc/nginx/conf.d
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/agent-deny.conf:/etc/nginx/agent-deny.conf
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/logs:/var/log/nginx/
      - ./www:/var/www/html

2. Directory structure

nginx
├─ nginx.conf
├─ agent-deny.conf
└─ conf
    ├─ xxxx01_server.conf
    └─ xxxx02_server.conf

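3. agent-deny.conf (bot-blocking rules)

# "~*" is a case-insensitive regex match: any User-Agent containing "Scrapy"
# or "AhrefsBot" gets a 404. This already covers the full AhrefsBot/7.0 string,
# which as an unescaped regex would not match literally anyway ("(", "+", ".").
if ($http_user_agent ~* (Scrapy|AhrefsBot)) {
    return 404;
}
# Refuse requests whose User-Agent is empty.
if ($http_user_agent ~ ^$) {
    return 403;
}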
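In effect, the rules in agent-deny.conf map a User-Agent to a response: a case-insensitive match on Scrapy or AhrefsBot returns 404, an empty User-Agent returns 403, and everything else passes through. The sketch below mirrors that decision so candidate strings can be checked before reloading nginx. It uses Python's `re` as a stand-in for nginx's PCRE, and for these simple patterns the two agree. The function name `blocked_status` is hypothetical.

```python
import re
from typing import Optional

# "~*" in nginx is a case-insensitive regex match; an unanchored pattern
# matches anywhere in the User-Agent, just like re.search below.
BOT_RE = re.compile(r"Scrapy|AhrefsBot", re.IGNORECASE)


def blocked_status(user_agent: str) -> Optional[int]:
    """Return the status the agent-deny rules would send, or None if allowed."""
    if BOT_RE.search(user_agent):
        return 404  # first rule: known crawler names, any letter case
    if user_agent == "":
        return 403  # second rule: empty User-Agent
    return None  # request continues to the normal location blocks
```

For example, `blocked_status("scrapy/2.5.0")` gives 404 because of the case-insensitive match, while an ordinary browser string gives `None`.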

4. Include agent-deny.conf in every server block (each file under conf/):

server {
    include /etc/nginx/agent-deny.conf;
    listen 80;
    server_name localhost;
    client_max_body_size 100M;
    root /var/www/html/xxxxx/public;
    index index.php;
    # ... other proxy headers ...
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    # additional location blocks omitted for brevity
}
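Because the include has to be repeated in every server block, it is easy to miss one when a new site is added. The following is a rough text-based check, not an nginx parser. It flags any file under the conf directory that has fewer active include lines than `server {` openings. The names `lacks_include` and `files_missing_include` are hypothetical. For an authoritative view, `nginx -T` prints the full configuration nginx actually loaded.

```python
import re
from pathlib import Path
from typing import List

INCLUDE_LINE = "include /etc/nginx/agent-deny.conf;"
# Matches lines that open a server block, e.g. "server {" (not "server_name").
SERVER_RE = re.compile(r"^\s*server\s*\{", re.MULTILINE)


def lacks_include(conf_text: str) -> bool:
    """True if the text has fewer uncommented include lines than server blocks."""
    servers = len(SERVER_RE.findall(conf_text))
    includes = sum(
        1 for line in conf_text.splitlines() if line.strip() == INCLUDE_LINE
    )
    return includes < servers


def files_missing_include(conf_dir: str) -> List[str]:
    """Names of *.conf files in conf_dir that may be missing the include."""
    return [
        path.name
        for path in sorted(Path(conf_dir).glob("*.conf"))
        if lacks_include(path.read_text(encoding="utf-8"))
    ]
```

Run against the host directory, e.g. `files_missing_include("nginx/conf")`, an empty list means every file passed this heuristic.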

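With these rules in place, any request whose User-Agent contains Scrapy or AhrefsBot receives a 404, and any request with an empty User-Agent receives a 403. Nginx answers these requests itself at the server level, before location matching, so they never reach the PHP backend. A crawler that sends an ordinary browser User-Agent is not caught by these rules, which is one reason to block at the IP level as well.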

Additional protection – Alibaba Cloud security group

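The author also blocked the crawlers' IP ranges in the Alibaba Cloud security group as an extra layer of protection. Example ranges:

54.36.0.0
51.222.0.0
195.154.0.0

Only the network addresses are listed here. A bare address generally denotes a single host, so to cover a whole range each rule must be written in CIDR notation with a prefix length that matches the range you intend to block.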
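One way to choose the ranges is to group the client IPs of bot requests by network and see which networks carry the most traffic. The following is a minimal sketch that takes (client IP, User-Agent) pairs, such as those parsed from the access log. The default `/16` grouping is a hypothetical choice for illustration; the author does not state the prefix lengths used. A wide prefix can also sweep in legitimate users. The function name `bot_networks` is hypothetical, and IPv6 addresses are simply skipped.

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable, List, Tuple

BOT_RE = re.compile(r"Scrapy|AhrefsBot", re.IGNORECASE)


def bot_networks(
    entries: Iterable[Tuple[str, str]], prefix: int = 16
) -> List[Tuple[str, int]]:
    """Group bot requests by IPv4 network; entries are (client_ip, user_agent)."""
    counts: Counter = Counter()
    for ip, agent in entries:
        if not BOT_RE.search(agent):
            continue  # only count requests from the targeted crawlers
        try:
            # strict=False masks off host bits: 54.36.148.1/16 -> 54.36.0.0/16
            net = ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
        except ValueError:
            continue  # IPv6 or malformed address: skipped in this sketch
        counts[str(net)] += 1
    return counts.most_common()
```

Each result is already in CIDR form, so it can be entered in a security-group rule once you are satisfied the prefix is appropriate.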

Together, these measures mitigate the memory spikes caused by aggressive crawler traffic.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by php Courses, php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.
