Master Scrapy: Extract Likes, Comments, and Content with XPath
This article continues a Scrapy tutorial. It shows how to extract like counts, comment counts, and the full article content using XPath selectors and regular expressions, and how to debug the expressions along the way, with step‑by‑step code examples and screenshots to help Python developers automate web data collection.
Introduction
The previous article introduced how to use Scrapy XPath selectors to collect target data from web pages. This continuation focuses on extracting like counts, comment counts, and article content using XPath and regular expressions, and on debugging the expressions in Scrapy shell.
Implementation Details
9. Once the like count element has been located, the collection (bookmark) count is quick to find. Its HTML structure is slightly different, but the analysis method is the same.
10. The debugging code is shown below.
11. To extract only the numeric part we apply a regular expression; the code can be debugged in PyCharm.
12. The comment count is simpler to locate because it has its own dedicated tag.
13. Note that the comment tag is identified by an href attribute rather than a class, so the XPath expression must match on href; a selector that matches on the wrong attribute returns an empty result.
14. As with the collection count, use a regular expression to capture the numeric part of the comment count. The collection code can be reused with the variable name changed.
15. How to extract the main article body varies from page to page; generally the content sits under an entry tag.
16. In Scrapy shell we can obtain the XPath expression for the article body.
17. Combining the analysis and XPath expressions yields the complete crawling code.
18. Debugging the spider shows the captured data clearly.
19. The console output confirms that the extracted values match the web page.
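The numeric extraction described in steps 11 and 14 can be sketched with the standard library alone. This is a minimal illustration, not the original tutorial code: the helper name `extract_number` and the sample strings are hypothetical, and it assumes the scraped text is a label that may or may not contain a number.

```python
import re


def extract_number(text, default=0):
    """Return the first run of digits in `text` as an int.

    Falls back to `default` when the text is empty or contains no digits,
    which is assumed to happen when the page shows only the label.
    """
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default


# Hypothetical strings standing in for what an XPath text() query might return.
print(extract_number(" 5 bookmarks"))  # 5
print(extract_number(" bookmarks"))    # 0 (no digits, so the default is used)
print(extract_number(" 3 comments"))   # 3
```

Because the same helper serves both the collection count and the comment count, only the input string and the variable receiving the result need to change between the two fields.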
Conclusion
In summary, the workflow is: inspect the elements with the browser's F12 developer tools, analyze the page structure, craft the XPath expressions, test them in Scrapy shell, embed them into the spider, and then run or debug the spider to obtain the final data.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
