
Master Scrapy: Extract Likes, Comments, and Content with XPath

This article continues a Scrapy tutorial by showing how to extract like counts, comment counts, and full article content using XPath selectors, regular expressions, and debugging techniques, providing step‑by‑step code examples and screenshots to help Python developers automate web data collection.

Python Crawling & Data Mining

Oct 24, 2020


Introduction

In the previous article we showed how to use Scrapy's XPath selectors to collect target data from web pages. This continuation extracts the like count, collection count, comment count, and article content, using XPath, regular expressions, and Scrapy shell for debugging.

Implementation Details

9. Once the like count element has been located, the collection (bookmark) count is easy to find next to it. Its HTML structure is slightly different, but the analysis method is the same.

10. The debugging code is shown below.

11. The extracted text contains more than the number, so we apply a regular expression to keep only the numeric part; the code can be debugged in PyCharm.
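The numeric extraction can be sketched with the standard re module. The helper name extract_count and the sample strings are hypothetical and are not taken from the original code.

```python
import re


def extract_count(text, default=0):
    """Return the first run of digits in text as an int, or default if there is none."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default


print(extract_count(" 12 bookmarks"))  # 12
print(extract_count(" bookmarks"))     # 0, no digits present
print(extract_count(None))             # 0, the XPath matched nothing
```

Searching for \d+ directly avoids a common pitfall: a greedy pattern such as .*(\d+).* lets the leading .* consume all but the last digit, so it would capture 2 instead of 12.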

12. The comment count is simpler to locate because it sits in its own dedicated tag.

13. Note that the comment tag is identified by its href attribute rather than by a class, so the XPath must match on href; a selector that matches on the wrong attribute returns an empty result.

14. As with the collection count, use a regular expression to capture the numeric comment count. The collection code can be reused with only the variable name changed.

15. How the main article body is extracted varies from page to page; generally the content resides under an entry tag.

16. In Scrapy shell we can obtain the XPath expression for the article body.

17. Combining the analysis and XPath expressions yields the complete crawling code.

18. Debugging the spider shows the captured data clearly.

19. The console output confirms that the extracted values match the web page.

Conclusion

In summary: inspect the elements with the browser's developer tools (F12), analyze the page structure, write the XPath expressions, test them in Scrapy shell, embed them in the spider, and then run or debug the spider to obtain the final data.


