Why DrissionPage Is the Game‑Changer for Python Web Scraping and Automation
This article introduces DrissionPage, a Python library that combines the strengths of Selenium and Requests. It explains the library's three page objects, its seamless mode switching and built-in utilities, and the API changes in version 4.0, and provides practical code examples for web automation, data crawling, and testing.
Introduction
Previously I did full-site crawling of eBay, Amazon, Taobao, TikTok, and similar platforms, but I later realized that small-scale data collection is in higher demand, is simpler, and does not require reverse engineering or large-scale engineering. DrissionPage, combined with a fingerprint browser, is a tool I now use regularly, and I have abandoned Selenium.
DrissionPage Overview
DrissionPage is a Python web-automation library that integrates Selenium-style browser control and Requests-style HTTP sessions behind a unified, simple interface. Developers can switch freely between browser mode (like Selenium) and session mode (like Requests), so the library handles both dynamic pages that need JavaScript rendering and static pages.
The tool provides three main page objects, each suited to different scenarios:
ChromiumPage : Directly controls a browser, suitable for interactions such as clicking buttons, entering text, or executing JavaScript. It is slower and more memory‑intensive.
WebPage : A hybrid object that can control the browser and send/receive packets. It has two modes: d (browser) and s (packet). The d mode is powerful but slower; the s mode is fast for simple packets.
SessionPage : A lightweight object for sending and receiving packets only, ideal for large‑scale crawling.
Key Features
Seamless mode switching
Developers can switch between Selenium‑driven browser and Requests‑driven session at will. For pages with both dynamic and static content, one can first fetch static data with SessionPage and then handle dynamic parts with ChromiumPage or WebPage in d mode.
Simplified API
DrissionPage offers a unified API that reduces the learning curve of Selenium and Requests. Element location uses ele() and eles() with CSS selectors, XPath, etc.
Flexible customization
Custom request headers, proxies, and timeouts can be set to bypass anti‑scraping mechanisms.
Built‑in utilities
Features such as element‑load waiting and automatic retries are included.
Multi‑tab handling
Multiple browser tabs can be operated simultaneously without switching focus.
Listen (packet capture) upgrade
In version 4.0 each page object has a listen attribute for packet monitoring. The old API ( FlowViewer, wait.data_packets(), etc.) has been removed and replaced with methods like listen.start(), listen.wait(), listen.steps(), and listen.wait_silent(). The result structure separates request and response data.
Old API changes
Removed FlowViewer, wait.set_targets(), wait.stop_listening(), wait.data_packets(), and related paths.
New API
Each page object now has a listen property.
Use listen.start() and listen.stop() to control listening.
Use listen.wait() to wait for packets, listen.steps() to retrieve results, and listen.wait_silent() to wait for all requests.
Example
The following script demonstrates the Listen feature and records execution time:
from DrissionPage import ChromiumPage
from TimePinner import Pinner
from pprint import pprint
page = ChromiumPage()
page.listen.start('api/getkeydata') # target to listen
pinner = Pinner(True, False)
page.get('http://www.hao123.com/') # open page
packet = page.listen.wait() # wait for packet
pprint(packet.response.body) # print packet body
pinner.pin('Time', True)

Page access logic optimization
In version 3.x the get() method’s timeout only affected the loading stage and the none load strategy was ineffective. Version 4.0 resolves these issues, allowing users to control when to terminate connections and improving stability.
API changes
page_load_strategy renamed to load_mode; set.load_strategy renamed to set.load_mode.
Behavior changes
timeout now applies to the entire process, including navigation. SessionPage and WebPage in s mode automatically retry on empty data. SessionPage can now open local files with get().
New load mode “none”
The previous “none” mode stopped loading as soon as the connection succeeded. The new behavior keeps loading until the page finishes or the user stops it manually, giving finer control.
Use Cases
Web automation testing
Using Selenium‑like capabilities to simulate user actions such as login, registration, and form submission.
Data crawling
Requests fetch static pages; switch to browser mode for complex pages.
Crawler development
Combine mode switching and powerful element location for efficient and stable crawlers.
Examples
Browser control
from DrissionPage import ChromiumPage
page = ChromiumPage()
page.get('https://gitee.com/login')
user_login = page.ele('#user_login')
user_login.input('your_account')
user_password = page.ele('#user_password')
user_password.input('your_password')
login_button = page.ele('@value=登录')
login_button.click()

Data extraction
from DrissionPage import SessionPage
page = SessionPage()
for i in range(1, 4):
    page.get(f'https://gitee.com/explore/all?page={i}')
    links = page.eles('.title.project-namespace-path')
    for link in links:
        print(link.text, link.link)

Page analysis
from DrissionPage import WebPage
page = WebPage()
page.get('https://gitee.com/explore/all')
page.change_mode() # switch mode
items = page.ele('.ui.relaxed.divided.items.explore-repo__list').eles('.item')
for item in items:
    print(item('t:h3').text)  # title
    print(item('.project-desc.mb-1').text)  # description

Conclusion
DrissionPage is a powerful and easy‑to‑use open‑source Python package that provides an efficient and flexible solution for web automation and data extraction. By merging Selenium and Requests, it offers seamless mode switching and a simple API, allowing developers of any skill level to focus on business logic rather than low‑level details.
When using it, choose the appropriate page object and mode for your task, leverage its rich features, and respect website crawling policies.