Why DrissionPage Is the Game‑Changer for Python Web Scraping and Automation

This article introduces DrissionPage, a Python library that combines the capabilities of Selenium and Requests. It explains the library's three page objects, highlights seamless mode switching, built-in utilities, and the API changes in version 4.0, and provides practical code examples for web automation, data crawling, and testing.


Introduction

Previously I did full-site crawling of eBay, Amazon, Taobao, TikTok, and similar sites, but I later realized that small-scale data collection is in higher demand, is simpler, and does not require reverse engineering or large-scale infrastructure. DrissionPage is a tool I use often, usually combined with a fingerprint browser, and I have since abandoned Selenium.

DrissionPage Overview

DrissionPage is a Python web-automation library that combines browser automation (in the style of Selenium) with packet sending (in the style of Requests), offering a unified, simple interface. Developers can switch freely between browser mode and session mode. It handles both dynamic pages that require JavaScript rendering and static pages.

The tool provides three main page objects, each suited to different scenarios:

ChromiumPage: directly controls a browser, suited to interactions such as clicking buttons, entering text, or executing JavaScript. It is slower and more memory-intensive.

WebPage: a hybrid object that can both control the browser and send/receive packets. It has two modes: d (browser) and s (packet). The d mode is powerful but slower; the s mode is fast for simple requests.

SessionPage: a lightweight object for sending and receiving packets only, ideal for large-scale crawling.

Key Features

Seamless mode switching

Developers can switch between Selenium‑driven browser and Requests‑driven session at will. For pages with both dynamic and static content, one can first fetch static data with SessionPage and then handle dynamic parts with ChromiumPage or WebPage in d mode.

Simplified API

DrissionPage offers a unified API that reduces the learning curve of Selenium and Requests. Element location uses ele() and eles() with CSS selectors, XPath, etc.
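The locator strings passed to ele() and eles() follow a small prefix language. The sketch below lists the common forms; the selector values are illustrative placeholders, not taken from the article:

```python
# DrissionPage locator mini-language: each string below could be passed
# to page.ele() or page.eles(). All values are illustrative placeholders.
locators = {
    'id':        '#user_login',          # element with id="user_login"
    'class':     '.project-title',       # element with class "project-title"
    'tag':       'tag:a',                # first <a> element
    'attribute': '@name=q',              # element whose name attribute equals "q"
    'text':      'text:Sign in',         # element containing the text "Sign in"
    'xpath':     'x://div[@id="main"]',  # raw XPath
    'css':       'c:div > p',            # raw CSS selector
}

for kind, loc in locators.items():
    print(f'{kind:>9}: {loc}')
```

ele() returns the first match and eles() returns a list of all matches; both accept the same locator strings.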

Flexible customization

Custom request headers, proxies, and timeouts can be set to bypass anti‑scraping mechanisms.

Built‑in utilities

Features such as element‑load waiting and automatic retries are included.

Multi‑tab handling

Multiple browser tabs can be operated simultaneously without switching focus.

Listen (packet capture) upgrade

In version 4.0 each page object has a listen attribute for packet monitoring. The old API ( FlowViewer, wait.data_packets(), etc.) has been removed and replaced with methods like listen.start(), listen.wait(), listen.steps(), and listen.wait_silent(). The result structure separates request and response data.

Old API changes

Removed FlowViewer, wait.set_targets(), wait.stop_listening(), wait.data_packets(), and related paths.

New API

Each page object now has a listen property.

Use listen.start() and listen.stop() to control listening.

Use listen.wait() to wait for packets, listen.steps() to retrieve results, and listen.wait_silent() to wait for all requests.

Example

The following script demonstrates the Listen feature and records execution time:

from DrissionPage import ChromiumPage
from TimePinner import Pinner
from pprint import pprint

page = ChromiumPage()
page.listen.start('api/getkeydata')  # target to listen
pinner = Pinner(True, False)
page.get('http://www.hao123.com/')  # open page
packet = page.listen.wait()          # wait for packet
pprint(packet.response.body)        # print packet body
pinner.pin('Time', True)

Page access logic optimization

In version 3.x the get() method’s timeout only affected the loading stage and the none load strategy was ineffective. Version 4.0 resolves these issues, allowing users to control when to terminate connections and improving stability.

API changes

page_load_strategy

Renamed to load_mode; set.load_strategy renamed to set.load_mode.

Behavior changes

timeout

Now applies to the entire process, including navigation. SessionPage, and WebPage in s mode, automatically retry when empty data is returned. SessionPage can now open local files with get().

New load mode “none”

The previous “none” mode stopped loading as soon as the connection succeeded. The new behavior keeps loading until the page finishes or the user stops it manually, giving finer control.

Use Cases

Web automation testing

Using Selenium‑like capabilities to simulate user actions such as login, registration, and form submission.

Data crawling

Fetch static pages with lightweight Requests-style sessions; switch to browser mode for complex, JavaScript-rendered pages.

Crawler development

Combine mode switching and powerful element location for efficient and stable crawlers.

Examples

Browser control

from DrissionPage import ChromiumPage

page = ChromiumPage()
page.get('https://gitee.com/login')
user_login = page.ele('#user_login')        # locate the username field by id
user_login.input('your_account')
user_password = page.ele('#user_password')  # locate the password field by id
user_password.input('your_password')
login_button = page.ele('@value=登录')       # button whose value attribute is 登录 ("log in")
login_button.click()

Data extraction

from DrissionPage import SessionPage

page = SessionPage()
for i in range(1, 4):                       # first three pages of the listing
    page.get(f'https://gitee.com/explore/all?page={i}')
    links = page.eles('.title.project-namespace-path')  # all project links
    for link in links:
        print(link.text, link.link)         # project name and URL

Page analysis

from DrissionPage import WebPage

page = WebPage()
page.get('https://gitee.com/explore/all')
page.change_mode()  # switch from d (browser) mode to s (session) mode
items = page.ele('.ui.relaxed.divided.items.explore-repo__list').eles('.item')
for item in items:
    print(item('t:h3').text)                # repository title
    print(item('.project-desc.mb-1').text)  # repository description

Conclusion

DrissionPage is a powerful and easy‑to‑use open‑source Python package that provides an efficient and flexible solution for web automation and data extraction. By merging Selenium and Requests, it offers seamless mode switching and a simple API, allowing developers of any skill level to focus on business logic rather than low‑level details.

When using it, choose the appropriate page object and mode for your task, leverage its rich features, and respect website crawling policies.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: data extraction, web scraping, browser automation, selenium, requests, web automation, drissionpage
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
