Master CSS Selectors in Scrapy: Extract Likes, Comments, and Content Efficiently
This guide walks you through extracting likes, comments, and article content from web pages using Scrapy’s CSS selectors, showing how to locate elements like bookmark buttons, parse numeric data with regular expressions, and integrate the resulting code into your Python spider for reliable data collection.
This article continues a series on using Scrapy for web data extraction, building on the earlier tutorials about XPath and CSS selectors.
Reference articles: Using Xpath selectors in Scrapy – Part 1, Using Xpath selectors in Scrapy – Part 2, and Using CSS selectors in Scrapy – Part 1.
Practical Application
9. Following the same approach used for the like count, you can quickly locate the favorite count. The bookmark-btn element is unique on the page, so a selector on it returns the favorite count as a string, which must then be processed with a regular expression.
10. Based on the page structure, you can write a CSS selector to target the desired element.
11. To extract only the numeric part, apply a regular expression. The following simple code, debugged in PyCharm, demonstrates this:
# Example Python code to extract numbers using regex
import re
text = "15 收藏"
match = re.search(r"\d+", text)
collection_num = match.group() if match else None
12. Locate the href attribute of the a tag, then find the nested span to extract the comment count.
13. Debug the selector in the Scrapy shell to verify that it captures the correct element.
14. As with the favorite count, use a regular expression to extract the numeric comment count, reusing the same code and replacing collection_num with comment_num.
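Since both counts go through the same regular expression, one option is to wrap it in a small helper rather than copying the code. The function name extract_count is hypothetical, and returning an integer with a default of 0 is a design choice of this sketch (the original snippet keeps the matched string and falls back to None).

```python
import re


def extract_count(text):
    """Return the first run of digits in text as an int, or 0 if there is none."""
    if not text:
        # Covers None (the selector matched nothing) and empty strings.
        return 0
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


collection_num = extract_count(" 15 收藏")
comment_num = extract_count(" 评论")  # no digits, e.g. a post without comments
print(collection_num, comment_num)
```

Defaulting to 0 keeps the item fields numeric even when a post has no favorites or comments and the page shows only the label.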
15. The main article body resides under the entry tag, making it easy to extract.
16. Debugging in the Scrapy shell shows that CSS selectors are often more concise than XPath expressions.
17. Combining this analysis with the CSS selectors yields the complete extraction code.
18. Run the spider in PyCharm and debug to verify the extracted data.
19. The console output confirms that the variables match the webpage content.
Summary
Overall, using CSS selectors follows the same workflow as XPath: inspect elements with the browser developer tools (F12), analyze the page structure, craft a CSS expression, test it in the Scrapy shell, and embed the final selector in your Scrapy spider for execution or debugging. The syntax differs, and developers familiar with front‑end styling may find CSS selectors more intuitive, but the choice ultimately comes down to personal preference.
Python Crawling & Data Mining
