Deploying Python Scrapy Crawlers with Scrapyd and Gerapy: A Step‑by‑Step Guide
This tutorial walks you through installing dependencies, running a Scrapy spider, configuring Scrapyd, packaging the project, and using Gerapy’s visual interface to manage and deploy a Python web crawler for Qiushibaike jokes.
Preface
Hi everyone, I'm a Python enthusiast; let's dive straight into the tutorial.
Dependencies
File: requirements.txt
appdirs==1.4.4
APScheduler==3.5.1
attrs==20.1.0
Automat==20.2.0
beautifulsoup4==4.9.1
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
constantly==15.1.0
cryptography==3.0
cssselect==1.1.0
Django==1.11.29
django-apscheduler==0.3.0
django-cors-headers==3.2.0
djangorestframework==3.9.2
furl==2.1.0
gerapy==0.9.5
gevent==20.6.2
greenlet==0.4.16
hyperlink==20.0.1
idna==2.10
incremental==17.5.0
itemadapter==0.1.0
itemloaders==1.0.2
Jinja2==2.10.1
jmespath==0.10.0
lxml==4.5.2
MarkupSafe==1.1.1
orderedmultidict==1.0.1
parsel==1.6.0
Protego==0.1.16
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyDispatcher==2.0.5
pyee==7.0.2
PyHamcrest==2.0.2
pymongo==3.11.0
PyMySQL==0.10.0
pyOpenSSL==19.1.0
pyppeteer==0.2.2
pyquery==1.4.1
python-scrapyd-api==2.1.2
pytz==2020.1
pywin32==228
queuelib==1.5.0
redis==3.5.3
requests==2.24.0
Scrapy==1.8.0
scrapy-redis==0.6.8
scrapy-splash==0.7.2
scrapyd==1.2.1
scrapyd-client==1.1.0
service-identity==18.1.0
six==1.15.0
soupsieve==2.0.1
tqdm==4.48.2
Twisted==20.3.0
tzlocal==2.1
urllib3==1.25.10
w3lib==1.22.0
websocket==0.2.1
websockets==8.1
wincertstore==0.2
zope.event==4.4
zope.interface==5.1.0
Project Files
Project archive: qiushi.zip
Function: Qiushibaike jokes crawler built with Scrapy.
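Because the dependency list pins exact versions, it can help to confirm that the active environment really matches it before running the spider. The following is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper names parse_pins and find_mismatches are illustrative and not part of the original project.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines from requirements.txt content into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for packages that are missing or differ."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


# Example usage (reads the file from the current directory):
# with open("requirements.txt", encoding="utf-8") as fh:
#     print(find_mismatches(parse_pins(fh.read())))
```

An empty result means every pinned package is installed at the listed version.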
Running the Project
Install dependencies and unzip the project:
pip install -r requirements.txt
Execute the spider from the project directory (--nolog suppresses Scrapy's log output):
scrapy crawl duanzi --nolog
Configure Scrapyd
Scrapyd is a service for deploying and running Scrapy projects; once it is configured, you can start, cancel, and monitor spider jobs through commands sent to its HTTP JSON API.
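To make the "control spiders via commands" idea concrete, here is a rough sketch of how a job could be started or cancelled through Scrapyd's schedule.json and cancel.json endpoints using only the standard library. The project name qiushi and spider name duanzi come from this guide; the helper function names are illustrative assumptions, and the address is Scrapyd's default.

```python
import json
from urllib import parse, request

SCRAPYD_URL = "http://127.0.0.1:6800"  # default Scrapyd address used in this guide


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Return (url, body) for a POST to schedule.json, which starts a spider job."""
    body = parse.urlencode({"project": project, "spider": spider}).encode()
    return f"{base_url}/schedule.json", body


def build_cancel_request(project, job_id, base_url=SCRAPYD_URL):
    """Return (url, body) for a POST to cancel.json, which stops a job by its id."""
    body = parse.urlencode({"project": project, "job": job_id}).encode()
    return f"{base_url}/cancel.json", body


def post(url, body, timeout=10):
    """Send the POST request and decode Scrapyd's JSON reply."""
    with request.urlopen(request.Request(url, data=body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires a running Scrapyd with the project deployed):
# url, body = build_schedule_request("qiushi", "duanzi")
# print(post(url, body))
```

Keeping request construction separate from the network call makes the helpers easy to inspect without a live server.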
Start Scrapyd Service
Switch to the qiushi project directory.
Run scrapyd.
Open http://127.0.0.1:6800/ in a browser; if the Scrapyd web page loads, the service started successfully.
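As an alternative to checking in a browser, the service can be probed from a script. This is a small sketch, assuming Scrapyd's daemonstatus.json endpoint at the default address; the function names are illustrative.

```python
import json
from urllib import request


def parse_daemon_status(raw):
    """Decode a daemonstatus.json body; return (is_ok, parsed_data)."""
    data = json.loads(raw.decode("utf-8"))
    return data.get("status") == "ok", data


def check_scrapyd(base_url="http://127.0.0.1:6800", timeout=5):
    """Query Scrapyd and report whether it answered with status 'ok'."""
    try:
        with request.urlopen(f"{base_url}/daemonstatus.json", timeout=timeout) as resp:
            return parse_daemon_status(resp.read())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return False, {"error": str(exc)}


# Example usage:
# ok, info = check_scrapyd()
# print("Scrapyd is up" if ok else f"Scrapyd not reachable: {info}")
```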
Package Scrapy and Deploy to Scrapyd
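Edit scrapy.cfg so that it defines a deploy target pointing at the running Scrapyd service (the original article illustrates this with an image).
Package and deploy with:
scrapyd-deploy qb -p qiushi
Here qb is the deploy target name defined in scrapy.cfg and qiushi is the project name. On success, scrapyd-deploy prints the server's JSON response with a status of ok (the original article shows this in a screenshot).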
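The scrapy.cfg change appears only as an image in the original, so its exact content is not recoverable here. The sketch below generates what such a file typically looks like for the command above: the target name qb and project name qiushi are taken from that command, while the Scrapyd URL (the default address) and the settings module qiushi.settings are assumptions.

```python
import configparser
import io


def build_scrapy_cfg(target="qb", project="qiushi",
                     url="http://127.0.0.1:6800/",
                     settings_module="qiushi.settings"):
    """Return scrapy.cfg text with a [deploy:<target>] section for scrapyd-deploy."""
    cfg = configparser.ConfigParser()
    cfg["settings"] = {"default": settings_module}  # assumed settings module name
    cfg[f"deploy:{target}"] = {"url": url, "project": project}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Example usage: print the generated configuration for review
# print(build_scrapy_cfg())
```

The name after "deploy:" is what scrapyd-deploy expects as its first argument, which is why the command uses qb.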
Configure Gerapy
Gerapy provides a visual interface for managing Scrapyd.
Setup Steps
Initialize Gerapy: gerapy init (creates a gerapy folder).
Enter the generated gerapy directory.
Migrate the database: gerapy migrate.
Start the server (default 127.0.0.1:8000): gerapy runserver.
Create a superuser: gerapy createsuperuser.
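The non-interactive part of these steps can be scripted. The sketch below only automates gerapy init and gerapy migrate, on the assumption that gerapy createsuperuser prompts for input and gerapy runserver blocks the terminal, so both are better run by hand; the helper names are illustrative.

```python
import subprocess
from pathlib import Path

# (command, working directory relative to the chosen base folder)
SETUP_COMMANDS = [
    (["gerapy", "init"], "."),          # creates the gerapy folder
    (["gerapy", "migrate"], "gerapy"),  # must run inside the generated folder
]


def plan_setup(workdir):
    """Return the list of (command, cwd) pairs that run_setup would execute."""
    base = Path(workdir)
    return [(cmd, base / rel) for cmd, rel in SETUP_COMMANDS]


def run_setup(workdir):
    """Run the setup commands in order, stopping at the first failure."""
    for cmd, cwd in plan_setup(workdir):
        subprocess.run(cmd, cwd=cwd, check=True)


# Example usage:
# run_setup(".")
# Then, inside the gerapy folder: gerapy createsuperuser, gerapy runserver
```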
Add Crawler Project in Gerapy
After configuring the host (Scrapyd address, default 127.0.0.1:6800) and creating the project, you can run the spider from the Gerapy UI. Results are saved locally.
Package Spider for Deployment
Copy the Scrapy project into Gerapy’s projects folder.
Refresh the page, click “Deploy”, fill in a description, and package.
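The copy step can also be done from a script. This is a minimal sketch, assuming the folder created by gerapy init contains a projects subfolder as described above; the function name copy_project and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_project(project_dir, gerapy_dir):
    """Copy a Scrapy project into <gerapy_dir>/projects and return the new path."""
    src = Path(project_dir)
    if not (src / "scrapy.cfg").is_file():
        raise FileNotFoundError(f"{src} does not look like a Scrapy project (no scrapy.cfg)")
    dest = Path(gerapy_dir) / "projects" / src.name
    # Skip bytecode caches; copytree raises FileExistsError if dest already exists.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
    return dest


# Example usage (paths are placeholders):
# copy_project("qiushi", "gerapy")
```

After the copy, refreshing the Gerapy page should list the project so it can be packaged and deployed as described.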
Fix "scrapyd-deploy is not recognized" Error
Create scrapy.bat and scrapyd-deploy.bat in the Python Scripts directory with the following content (replace the paths with the location of your own interpreter):
scrapy.bat:
@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapy %*
scrapyd-deploy.bat:
@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapyd-deploy %*
Summary
This guide demonstrates how to combine Gerapy, Scrapyd, and Scrapy to visually deploy and manage a Python web crawler.
Python Crawling & Data Mining
