
Master CSS Selectors in Scrapy: Extract Likes, Comments, and Content Efficiently

This guide walks you through extracting likes, comments, and article content from web pages using Scrapy’s CSS selectors, showing how to locate elements like bookmark buttons, parse numeric data with regular expressions, and integrate the resulting code into your Python spider for reliable data collection.

Python Crawling & Data Mining

Oct 31, 2020


This article continues a series on extracting web data with Scrapy and builds on earlier tutorials about XPath and CSS selectors.

Reference articles: Using XPath selectors in Scrapy – Part 1, Using XPath selectors in Scrapy – Part 2, and Using CSS selectors in Scrapy – Part 1.

Practical Application

9. As with the like count, the favorite count is quick to locate: the bookmark-btn element is unique on the page, so it can be selected directly. Its text is a string such as "15 收藏" ("15 favorites"), so the number still has to be pulled out with a regular expression.

10. Based on the page structure, you can write a CSS selector to target the desired element.

11. To extract only the numeric part, apply a regular expression. The following simple code, debugged in PyCharm, demonstrates this:

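# Extract the favorite count from its text with a regular expression
import re

text = "15 收藏"  # text as shown on the page; 收藏 means "favorites"
match = re.search(r"\d+", text)  # find the first run of digits
collection_num = int(match.group()) if match else 0  # store a number; treat "no digits" as 0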

12. For the comment count, locate the a tag by its href attribute, then select the span nested inside it, which holds the count.

13. Debug the selector in scrapy shell to confirm that it captures the correct element.

14. As with the favorite count, use a regular expression to extract the numeric comment count, reusing the same code and replacing collection_num with comment_num.
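Because the same regex logic serves both counts, one option is to wrap it in a small function rather than copying it. This is a sketch: `extract_num` is a hypothetical helper name, not a Scrapy API, and returning 0 when no digits appear (or when the selector matched nothing and returned `None`) is an assumption about how an empty count should be recorded.

```python
import re


def extract_num(text):
    """Return the first integer found in text, or 0 if there is none."""
    match = re.search(r"\d+", text or "")  # "or" guards against None from .get()
    return int(match.group()) if match else 0


collection_num = extract_num(" 15 收藏")
comment_num = extract_num(" 3 评论")
no_digits = extract_num(" 评论")  # count not displayed
not_found = extract_num(None)  # selector matched nothing
print(collection_num, comment_num, no_digits, not_found)  # 15 3 0 0
```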

15. The main article body sits inside the entry element, which makes it easy to extract.

16. Debugging in scrapy shell shows that CSS selectors are often more concise than the equivalent XPath expressions.

17. Combining this analysis with the CSS selectors yields the complete extraction code.

18. Run the spider in PyCharm and debug to verify the extracted data.

19. The console output confirms that the variables match the webpage content.

Summary

Overall, CSS selectors follow the same workflow as XPath: inspect elements with F12, analyze the page structure, write a CSS expression, test it in scrapy shell, and embed the final selector in your Scrapy spider to run or debug it. The syntax differs, and developers familiar with front-end styling may find CSS selectors more intuitive, but the choice ultimately comes down to personal preference.




