Build a Scrapy Spider for Jobbole.com from Scratch in PyCharm

This step‑by‑step guide shows how to create a Scrapy spider project for the Jobbole website, configure the project structure, import it into PyCharm, set up the correct Python interpreter, and verify the generated spider code, preparing you for data extraction.

Python Crawling & Data Mining

In the previous article we created our first Scrapy crawler project. This follow‑up goes a step further and creates a Scrapy spider for the Jobbole site (blog.jobbole.com).

After generating the Scrapy project (named article here), change into the project directory and generate the spider:

cd article
scrapy genspider jobbole blog.jobbole.com

This uses Scrapy’s built‑in basic template to create the spider at article/spiders/jobbole.py. On Windows, verify the new file with:

tree /f

You will see jobbole.py added to the spiders folder.
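On systems without the Windows tree command, the same check can be done from Python. The following is a minimal sketch, assuming the project layout described above (an article package with a spiders folder inside the project directory); the helper name list_spider_files is hypothetical and not part of Scrapy.

```python
from pathlib import Path


def list_spider_files(project_root="."):
    """Return the sorted names of the .py files in <project_root>/article/spiders.

    The article/spiders path is an assumption based on the project name used
    in this guide; adjust it if your project is named differently.
    """
    spiders_dir = Path(project_root) / "article" / "spiders"
    return sorted(path.name for path in spiders_dir.glob("*.py"))


if __name__ == "__main__":
    # Run from the project directory (the one you changed into with cd article).
    for file_name in list_spider_files():
        print(file_name)
```

If the spider was generated as expected, jobbole.py should appear in the printed list next to the package's __init__.py.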

The default basic template is sufficient for most cases. To see the other built‑in templates, run scrapy genspider -l, and pick one with the -t option.

Next, import the whole Scrapy project into PyCharm: choose File → Open, locate the top‑level project folder (the one that contains scrapy.cfg), and confirm.

If jobbole.py is not visible in the spiders directory, right‑click the spiders folder and select Synchronize 'spiders' (Reload from Disk in newer PyCharm versions) to refresh the view.

Opening jobbole.py reveals the auto‑generated skeleton: a spider class with the name, allowed_domains, and start_urls fields, plus an empty parse method.
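For orientation, the sketch below embeds roughly what the basic template produces and reads the three fields back with the standard‑library ast module, so it runs even without Scrapy installed. The embedded source is an approximation: the exact output (quote style, URL scheme, class layout) depends on your Scrapy version, and the helper spider_fields is a hypothetical name used only for this illustration.

```python
import ast

# Approximation of the file generated by the basic template.
# Exact details vary by Scrapy version; compare with your own jobbole.py.
GENERATED_SOURCE = '''
import scrapy


class JobboleSpider(scrapy.Spider):
    name = "jobbole"
    allowed_domains = ["blog.jobbole.com"]
    start_urls = ["http://blog.jobbole.com/"]

    def parse(self, response):
        pass
'''


def spider_fields(source):
    """Collect the literal class-level assignments of the first class in source."""
    tree = ast.parse(source)
    spider_class = next(node for node in tree.body if isinstance(node, ast.ClassDef))
    fields = {}
    for statement in spider_class.body:
        # Only simple "name = literal" assignments are collected.
        if isinstance(statement, ast.Assign) and isinstance(statement.targets[0], ast.Name):
            fields[statement.targets[0].id] = ast.literal_eval(statement.value)
    return fields


if __name__ == "__main__":
    for key, value in spider_fields(GENERATED_SOURCE).items():
        print(key, "=", value)
```

The name field is what you later pass to scrapy crawl, allowed_domains restricts which hosts the spider follows, and start_urls lists where crawling begins.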

Finally, ensure the correct Python interpreter is selected in PyCharm: open Settings → Project → Python Interpreter, search for the virtual environment created for the Scrapy project, and add it if it is not already selected.
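To confirm that PyCharm is really using that environment, you can run a small check from inside the project. This is a minimal sketch using only the standard library; the function name interpreter_report is hypothetical, and the virtual‑environment test assumes an environment created with venv or a recent virtualenv release.

```python
import importlib.util
import sys


def interpreter_report():
    """Describe the interpreter that is running this script."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the Python installation it was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != base_prefix,
        # find_spec returns None when the package cannot be imported.
        "scrapy_available": importlib.util.find_spec("scrapy") is not None,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If scrapy_available is reported as False, the selected interpreter is probably not the environment where Scrapy was installed.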

Once the local interpreter is added, the Scrapy environment, the project files, and the interpreter configuration are all in place, and you can proceed to implement the crawling logic and data extraction.

For more examples, visit the author’s GitHub: https://github.com/cassieeric
