Deploying Python Scrapy Crawlers with Scrapyd and Gerapy: A Step‑by‑Step Guide

This tutorial walks you through installing dependencies, running a Scrapy spider, configuring Scrapyd, packaging the project, and using Gerapy’s visual interface to manage and deploy a Python web crawler for Qiushibaike jokes.

Python Crawling & Data Mining

Preface

Hi everyone, I'm a Python enthusiast; let's dive straight into the tutorial.

Dependencies

File: requirements.txt
appdirs==1.4.4
APScheduler==3.5.1
attrs==20.1.0
Automat==20.2.0
beautifulsoup4==4.9.1
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
constantly==15.1.0
cryptography==3.0
cssselect==1.1.0
Django==1.11.29
django-apscheduler==0.3.0
django-cors-headers==3.2.0
djangorestframework==3.9.2
furl==2.1.0
gerapy==0.9.5
gevent==20.6.2
greenlet==0.4.16
hyperlink==20.0.1
idna==2.10
incremental==17.5.0
itemadapter==0.1.0
itemloaders==1.0.2
Jinja2==2.10.1
jmespath==0.10.0
lxml==4.5.2
MarkupSafe==1.1.1
orderedmultidict==1.0.1
parsel==1.6.0
Protego==0.1.16
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyDispatcher==2.0.5
pyee==7.0.2
PyHamcrest==2.0.2
pymongo==3.11.0
PyMySQL==0.10.0
pyOpenSSL==19.1.0
pyppeteer==0.2.2
pyquery==1.4.1
python-scrapyd-api==2.1.2
pytz==2020.1
pywin32==228
queuelib==1.5.0
redis==3.5.3
requests==2.24.0
Scrapy==1.8.0
scrapy-redis==0.6.8
scrapy-splash==0.7.2
scrapyd==1.2.1
scrapyd-client==1.1.0
service-identity==18.1.0
six==1.15.0
soupsieve==2.0.1
tqdm==4.48.2
Twisted==20.3.0
tzlocal==2.1
urllib3==1.25.10
w3lib==1.22.0
websocket==0.2.1
websockets==8.1
wincertstore==0.2
zope.event==4.4
zope.interface==5.1.0

Project Files

Project archive: qiushi.zip
Function: Qiushibaike jokes crawler built with Scrapy.

Running the Project

Unzip the project and install the dependencies:

pip install -r requirements.txt

Execute the spider from the project directory:

scrapy crawl duanzi --nolog
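If you prefer to launch the crawl from a script, the same command can be wrapped with the standard library. This is a minimal sketch: the helper names build_crawl_command and run_spider are hypothetical, and it assumes the scrapy executable is on PATH and that project_dir is the folder containing scrapy.cfg.

```python
import shutil
import subprocess
from typing import List


def build_crawl_command(spider: str, quiet: bool = True) -> List[str]:
    """Return the argv list for `scrapy crawl <spider>`, adding --nolog when quiet."""
    command = ["scrapy", "crawl", spider]
    if quiet:
        command.append("--nolog")
    return command


def run_spider(spider: str, project_dir: str = ".") -> int:
    """Run the spider inside project_dir and return the process exit code."""
    if shutil.which("scrapy") is None:
        raise RuntimeError("scrapy is not on PATH; install requirements.txt first")
    completed = subprocess.run(build_crawl_command(spider), cwd=project_dir)
    return completed.returncode
```

Calling run_spider("duanzi", "qiushi") would then be equivalent to running the command above inside the qiushi folder.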

Configure Scrapyd

Scrapyd is a service for running Scrapy projects. Once it is configured, you can deploy projects to it and schedule, cancel, and monitor spider jobs through its HTTP JSON API.

Start Scrapyd Service

Switch to the qiushi project directory.

Run scrapyd.

Open http://127.0.0.1:6800/ in a browser. If the Scrapyd page loads, the service started successfully.
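The same check can be scripted against Scrapyd's daemonstatus.json endpoint. The sketch below uses only the standard library; the function names are hypothetical, and the base URL is the default address used in this article.

```python
import json
from urllib.request import urlopen


def daemon_status_url(base_url: str = "http://127.0.0.1:6800/") -> str:
    """Build the URL of Scrapyd's daemon status endpoint."""
    return base_url.rstrip("/") + "/daemonstatus.json"


def is_scrapyd_ok(body: bytes) -> bool:
    """Return True when a daemonstatus.json response body reports status 'ok'."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_scrapyd(base_url: str = "http://127.0.0.1:6800/", timeout: float = 5.0) -> bool:
    """Query the running service; any connection error counts as 'not running'."""
    try:
        with urlopen(daemon_status_url(base_url), timeout=timeout) as response:
            return is_scrapyd_ok(response.read())
    except OSError:
        return False
```

check_scrapyd() returns False rather than raising when nothing is listening on the port, which makes it convenient in a startup script.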

Package Scrapy and Deploy to Scrapyd

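Modify scrapy.cfg so that it defines a deploy target whose url points at the running Scrapyd service. The deploy command in the next step expects a target named qb and a project named qiushi.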
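The original screenshot of scrapy.cfg is not available, so the following is only a sketch of what the file might contain. It renders a [settings] section and a [deploy:qb] section with configparser. The target name qb and project name qiushi come from the deploy command in this article; the settings module qiushi.settings and the helper name render_scrapy_cfg are assumptions.

```python
import configparser
import io


def render_scrapy_cfg(
    project: str = "qiushi",
    target: str = "qb",
    url: str = "http://127.0.0.1:6800/",
) -> str:
    """Render a scrapy.cfg with one named deploy target (assumed layout)."""
    config = configparser.ConfigParser()
    # Assumption: the settings module follows the usual <project>.settings naming.
    config["settings"] = {"default": f"{project}.settings"}
    # The section name after "deploy:" is the target passed to scrapyd-deploy.
    config[f"deploy:{target}"] = {"url": url, "project": project}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_scrapy_cfg())
```

Compare the printed text with your own scrapy.cfg rather than overwriting the file blindly, since your project may define additional options.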

Package and deploy command:

scrapyd-deploy qb -p qiushi

On success, the server response printed by scrapyd-deploy reports status ok.
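Once the project is deployed, a spider run can be requested through Scrapyd's schedule.json endpoint. The sketch below builds the POST request with the standard library; the helper names are hypothetical, and the project and spider names (qiushi, duanzi) are the ones used in this article.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(
    project: str, spider: str, base_url: str = "http://127.0.0.1:6800/"
) -> Request:
    """Build the form-encoded POST request that asks Scrapyd to run a spider."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(base_url.rstrip("/") + "/schedule.json", data=data, method="POST")


def schedule_spider(
    project: str, spider: str, base_url: str = "http://127.0.0.1:6800/"
) -> str:
    """Send the request and return the job id reported by Scrapyd."""
    request = build_schedule_request(project, spider, base_url)
    with urlopen(request, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if payload.get("status") != "ok":
        raise RuntimeError(f"Scrapyd did not accept the job: {payload}")
    return payload["jobid"]
```

schedule_spider("qiushi", "duanzi") needs a running Scrapyd instance; build_schedule_request can be inspected offline.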

Configure Gerapy

Gerapy provides a visual interface for managing Scrapyd.

Setup Steps

Initialize Gerapy: gerapy init (creates a gerapy folder).

Enter the generated gerapy directory.

Migrate the database: gerapy migrate.

Start the server (default 127.0.0.1:8000): gerapy runserver.

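Create a superuser: gerapy createsuperuser. Because gerapy runserver keeps its terminal occupied, run this in a second terminal or before starting the server.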
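The setup steps can also be collected in a small script. This is a sketch only: the names GERAPY_SETUP_STEPS and run_gerapy_setup are hypothetical, and createsuperuser is placed before runserver here because runserver blocks until it is stopped.

```python
import subprocess
from pathlib import Path

# Each entry is (command, working directory relative to where `gerapy init` runs).
GERAPY_SETUP_STEPS = [
    (["gerapy", "init"], "."),                  # creates the gerapy folder
    (["gerapy", "migrate"], "gerapy"),          # sets up the database
    (["gerapy", "createsuperuser"], "gerapy"),  # interactive: prompts for credentials
    (["gerapy", "runserver"], "gerapy"),        # blocks; serves on 127.0.0.1:8000
]


def run_gerapy_setup(base_dir: str = ".") -> None:
    """Run every setup step in order, stopping at the first failure."""
    for command, workdir in GERAPY_SETUP_STEPS:
        subprocess.run(command, cwd=Path(base_dir) / workdir, check=True)
```

run_gerapy_setup() is meant for a fresh directory; running gerapy init again where a gerapy folder already exists may fail.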

Add Crawler Project in Gerapy

After configuring the host (Scrapyd address, default 127.0.0.1:6800) and creating the project, you can run the spider from the Gerapy UI. Results are saved locally.
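Before running the spider from the Gerapy UI, it can help to confirm that Scrapyd actually lists it. The sketch below targets Scrapyd's listspiders.json endpoint; the helper names are hypothetical, and qiushi and duanzi are the project and spider names from this article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_spiders_url(project: str, base_url: str = "http://127.0.0.1:6800/") -> str:
    """Build the listspiders.json URL for one project."""
    return f"{base_url.rstrip('/')}/listspiders.json?{urlencode({'project': project})}"


def spider_is_listed(body: bytes, spider: str) -> bool:
    """Return True when the response body is 'ok' and names the spider."""
    payload = json.loads(body.decode("utf-8"))
    return payload.get("status") == "ok" and spider in payload.get("spiders", [])


def spider_is_deployed(
    project: str, spider: str, base_url: str = "http://127.0.0.1:6800/"
) -> bool:
    """Ask the running Scrapyd service whether the spider is available."""
    with urlopen(list_spiders_url(project, base_url), timeout=10) as response:
        return spider_is_listed(response.read(), spider)
```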

Package Spider for Deployment

Copy the Scrapy project into Gerapy’s projects folder.

Refresh the page, click “Deploy”, fill in a description, and package.
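The copy step can be scripted as well. This sketch assumes the folder created by gerapy init contains a projects subfolder and that the Scrapy project root is the directory holding scrapy.cfg; the function name copy_project_into_gerapy is hypothetical.

```python
import shutil
from pathlib import Path


def copy_project_into_gerapy(project_dir: str, gerapy_dir: str) -> Path:
    """Copy a Scrapy project into <gerapy_dir>/projects/<project name>."""
    source = Path(project_dir)
    if not (source / "scrapy.cfg").is_file():
        raise FileNotFoundError(f"{source} does not look like a Scrapy project root")
    destination = Path(gerapy_dir) / "projects" / source.name
    # copytree refuses to overwrite an existing destination, which guards
    # against clobbering a project that was copied earlier.
    shutil.copytree(source, destination)
    return destination
```

After the copy, refreshing the Gerapy project page should show the new project under the copied folder's name.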

Fix "scrapyd-deploy is not recognized" Error

Create scrapy.bat and scrapyd-deploy.bat in the Python Scripts directory with the following content (replace the paths with your own interpreter location).

scrapy.bat:

@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapy %*

scrapyd-deploy.bat:

@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapyd-deploy %*
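To avoid typing the paths by hand, the wrapper files can be generated from the interpreter path. This is a sketch under one assumption: the Scripts folder sits next to the Python executable, as in the conda environment shown above (a venv lays things out differently). The function name render_bat is hypothetical, and the paths are quoted so that folders containing spaces still work.

```python
from pathlib import PureWindowsPath


def render_bat(python_exe: str, script_name: str) -> str:
    """Return the text of a .bat wrapper that runs Scripts\\<script_name> with python_exe."""
    python_path = PureWindowsPath(python_exe)
    # Assumption: Scripts is a sibling of the interpreter (conda-style layout).
    script_path = python_path.parent / "Scripts" / script_name
    return f'@echo off\n"{python_path}" "{script_path}" %*\n'


if __name__ == "__main__":
    # Example path taken from this article; replace it with your own interpreter.
    exe = r"D:\programFiles\miniconda3\envs\hy_spider\python.exe"
    for name in ("scrapy", "scrapyd-deploy"):
        print(f"--- {name}.bat ---")
        print(render_bat(exe, name))
```

Save each printed block as the matching .bat file in the Scripts directory.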

Summary

This guide demonstrates how to combine Gerapy, Scrapyd, and Scrapy to visually deploy and manage a Python web crawler.
