Master Python Web Scraping: From Requests to Selenium and Scrapy
Learn how to efficiently scrape web pages using Python by exploring multiple approaches—including simple requests with BeautifulSoup, fast parsing with lxml, dynamic content extraction with Selenium, and large‑scale crawling with Scrapy—complete with installation steps, code snippets, and detailed explanations.
Web Scraping with Python
Web scraping is a core technique in data science and automated data collection. Python is a popular choice because its third‑party libraries simplify HTML parsing and data extraction.
1. Using requests and BeautifulSoup
1.1 Install dependencies
<code>pip install requests beautifulsoup4</code>
1.2 Basic usage
<code>import requests
from bs4 import BeautifulSoup
# Send HTTP GET request
url = "https://www.example.com"
response = requests.get(url)
# Parse HTML content
soup = BeautifulSoup(response.text, 'html.parser')
# Extract title
title = soup.title.text
print("Page title:", title)
# Extract all links
links = soup.find_all('a')
for link in links:
    href = link.get('href')
    print("Link:", href)</code>
requests.get(url) sends a GET request and returns a response object. BeautifulSoup(response.text, 'html.parser') parses the HTML. soup.title.text gets the page title. soup.find_all('a') finds all anchor tags.
2. Using requests and lxml
2.1 Install dependencies
<code>pip install requests lxml</code>
2.2 Basic usage
<code>import requests
from lxml import html
url = "https://quotes.toscrape.com/"
response = requests.get(url)
tree = html.fromstring(response.text)
quotes = tree.xpath('//div[@class="quote"]')
for quote in quotes:
    text = quote.xpath('.//span[@class="text"]/text()')[0]
    author = quote.xpath('.//small[@class="author"]/text()')[0]
    print(f"Quote: {text}, Author: {author}")</code>
lxml provides XPath support, allowing flexible element selection, which is especially useful for complex page structures.
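The standard library's xml.etree.ElementTree supports a useful subset of this XPath syntax (including the `.//tag[@attr="value"]` form used above), so the same selection pattern can be tried on well-formed markup without installing lxml. The fragment below is a made-up stand-in for the quotes page:

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for one quote block on the page (illustrative only)
doc = """<div>
  <div class="quote">
    <span class="text">To be or not to be</span>
    <small class="author">Shakespeare</small>
  </div>
</div>"""

tree = ET.fromstring(doc)
# Same predicates as the lxml version: select by tag and class attribute
for quote in tree.findall('.//div[@class="quote"]'):
    text = quote.find('.//span[@class="text"]').text
    author = quote.find('.//small[@class="author"]').text
    print(f"Quote: {text}, Author: {author}")
```

Note that ElementTree requires well-formed XML and supports only limited XPath, so for real, messy HTML lxml's `html.fromstring` and full XPath remain the right tool.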
3. Using Selenium for dynamic pages
3.1 Install dependencies
<code>pip install selenium</code>
3.2 Basic usage
<code>from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(executable_path='/path/to/chromedriver')
driver.get("https://quotes.toscrape.com/js/")
time.sleep(2)
quotes = driver.find_elements(By.CLASS_NAME, 'quote')
for quote in quotes:
    text = quote.find_element(By.CLASS_NAME, 'text').text
    author = quote.find_element(By.CLASS_NAME, 'author').text
    print(f"Quote: {text}, Author: {author}")
driver.quit()</code>
Selenium can capture content rendered by JavaScript, simulating real browser actions such as clicking, scrolling, and form filling.
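A fixed time.sleep(2) is fragile: it wastes time on fast pages and breaks on slow ones. Selenium's own answer is an explicit wait, e.g. WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'quote'))). The polling idea behind it can be sketched with the stdlib alone (the function and its toy condition below are illustrative, not Selenium's API):

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` expires.
    Mirrors the idea behind Selenium's WebDriverWait.until()."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result          # condition satisfied: hand back its value
        time.sleep(poll)           # not yet: sleep briefly and retry
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

# Toy condition standing in for "the quote elements are present on the page"
def fake_render():
    return ["quote1", "quote2"]

print(wait_until(fake_render))  # ['quote1', 'quote2']
```

With a real driver, the condition would be a lambda calling driver.find_elements, and a TimeoutError-style exception signals that the page never produced the expected elements.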
4. Using Scrapy framework
4.1 Install Scrapy
<code>pip install scrapy</code>
4.2 Create a Scrapy project
<code>scrapy startproject myspider</code>
4.3 Write a Scrapy Spider
<code>import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://quotes.toscrape.com/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }
        next_page = response.css('li.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)</code>
4.4 Run the spider
<code>scrapy crawl quotes</code>
Scrapy is a powerful framework for large‑scale crawling, supporting concurrency, automatic cookie handling, retries, pagination, and data export to JSON, CSV, or databases.
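The heart of the spider above is that parse() both yields items and follows the li.next link, so pagination is handled by recursion on responses. That control flow can be sketched as a plain generator over stubbed pages, with no network or Scrapy installation (the page data below is made up):

```python
# Stubbed "site": each page yields some quotes and may link to a next page.
PAGES = {
    "/page/1/": {"quotes": ["q1", "q2"], "next": "/page/2/"},
    "/page/2/": {"quotes": ["q3"], "next": None},
}

def crawl(start_url):
    """Yield one item per quote and follow 'next' links until exhausted,
    mimicking Spider.parse() plus response.follow()."""
    url = start_url
    while url:
        page = PAGES[url]          # stands in for downloading and parsing a page
        for q in page["quotes"]:   # yield one item dict per quote
            yield {"text": q}
        url = page["next"]         # follow pagination, if any

items = list(crawl("/page/1/"))
print(items)  # [{'text': 'q1'}, {'text': 'q2'}, {'text': 'q3'}]
```

Scrapy adds what this sketch lacks: concurrent downloads, retries, deduplication of already-seen URLs, and export (e.g. `scrapy crawl quotes -O quotes.json`).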
5. Other scraping methods
5.1 pyquery
<code>pip install pyquery</code>
<code>from pyquery import PyQuery as pq
url = "https://quotes.toscrape.com/"
doc = pq(url=url)
for quote in doc('.quote').items():
    text = quote('.text').text()
    author = quote('.author').text()
    print(f"Quote: {text}, Author: {author}")</code>
5.2 requests-html
<code>pip install requests-html</code>
<code>from requests_html import HTMLSession
session = HTMLSession()
url = "https://quotes.toscrape.com/js/"
response = session.get(url)
response.html.render()  # downloads Chromium on first run, then executes the page's JavaScript
quotes = response.html.find('.quote')
for quote in quotes:
    text = quote.find('.text', first=True).text
    author = quote.find('.author', first=True).text
    print(f"Quote: {text}, Author: {author}")</code>
Python offers multiple powerful web‑scraping techniques suitable for different types of sites: requests + BeautifulSoup for static pages, Selenium for JavaScript‑driven content, and Scrapy for large‑scale projects. Choosing the right tool enables efficient data collection for analysis, content aggregation, and many other applications.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.