How to Master Web Scraping with Playwright: A Step‑by‑Step Python Guide
This article walks you through using Playwright in Python to scrape the Xinfadi market website, showing how to capture URLs, handle request payloads, extract JSON data, and process results with a complete, runnable code example.
1. Introduction
In this tutorial I share a practical Playwright‑based web crawler for Python. After a brief introduction I demonstrate the tool on the Xinfadi market website, showing that the approach is easy to pick up and can greatly improve scraping efficiency.
2. Implementation Process
We start with the Xinfadi homepage, which originally used a GET request and later switched to POST. By inspecting the network traffic we locate the data endpoint http://www.xinfadi.com.cn/getPriceData.html. The request URL stays constant while the payload changes from page to page, in particular the "current" field, which controls pagination.
By changing only the target URL and the response.url filter in the Playwright script, we can capture each page's JSON payload as it arrives.
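To see the pagination behavior outside the browser, the POST call can also be sketched directly. The "limit" and "current" field names below are assumptions read off the network panel as described above; check them against what your browser actually sends before relying on them.

```python
import json
import urllib.parse
import urllib.request

URL = 'http://www.xinfadi.com.cn/getPriceData.html'

def build_payload(current, limit=20):
    # Form fields for one page of results; 'current' selects the page,
    # 'limit' the page size. Field names assumed from the network panel.
    return {'limit': limit, 'current': current}

def fetch_page(current, limit=20):
    # POST the form-encoded payload and decode the JSON response.
    data = urllib.parse.urlencode(build_payload(current, limit)).encode()
    req = urllib.request.Request(URL, data=data)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())
```

Iterating `current` from 1 upward then walks through the pages without driving a browser at all.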
Below is the complete Playwright script. It launches a Chromium browser, intercepts requests and responses, filters the target URL, parses the JSON, and prints selected fields (id, product name, category, place).
```python
from playwright.sync_api import Playwright, sync_playwright
import logging

# pip install playwright && playwright install

"""
First write a normal navigation script, then register the handlers:
page.on("request", lambda request: handle(request=request, response=None))
page.on("response", lambda response: handle(response=response, request=None))
"""

logging.basicConfig(format='%(asctime)s | %(levelname)s : %(message)s',
                    level=logging.INFO)


def handle_json(payload):
    # Process the JSON payload: print selected fields for each record.
    for item in payload['list']:
        print(item['id'], item['prodName'], item['prodCat'], item['place'])


def handle(request, response):
    # Only responses from the data endpoint carry the price records.
    if response is not None and \
            response.url == 'http://www.xinfadi.com.cn/getPriceData.html':
        handle_json(response.json())


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(ignore_https_errors=True)
    page = context.new_page()
    page.on("request", lambda request: handle(request=request, response=None))
    page.on("response", lambda response: handle(response=response, request=None))
    page.goto('http://www.xinfadi.com.cn/index.html')
    # Scroll to trigger additional data requests, then leave time
    # for the responses to arrive before shutting down.
    page.mouse.wheel(0, 300)
    page.wait_for_timeout(50000)
    page.close()
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
```

Running the script prints the extracted records, confirming that the data has been successfully retrieved.
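Printing from inside the handler works, but collecting the records into a list makes later processing easier. A minimal sketch of that extraction step, using the field names shown above ('list', 'id', 'prodName', 'prodCat', 'place'); the sample payload is an illustrative stand-in, not real endpoint output:

```python
def extract_records(payload):
    # Pull the printed fields out of the endpoint's JSON payload.
    # .get() guards against short or missing 'list' entries, unlike
    # iterating a fixed range(20).
    return [
        (item.get('id'), item.get('prodName'),
         item.get('prodCat'), item.get('place'))
        for item in payload.get('list', [])
    ]

# Tiny sample shaped like the endpoint's response (illustrative only):
sample = {'list': [{'id': 1, 'prodName': 'cabbage',
                    'prodCat': 'vegetable', 'place': 'Hebei'}]}
print(extract_records(sample))
```

Swapping this in for handle_json lets the script accumulate rows for export instead of only printing them.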
3. Conclusion
The tutorial demonstrates a complete Playwright web-scraping case, covering request inspection, payload handling, and data extraction. Readers can adapt the script by changing the target URL and the JSON-processing logic to suit other sites.
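As a sketch of that adaptation, the response filter can be factored into a small factory so the same skeleton serves other sites; the endpoint URL and processing function become parameters. The names and URLs below are illustrative, not part of the original script:

```python
def make_handler(endpoint, process):
    # Build a response callback bound to one endpoint and one
    # JSON-processing function, suitable for page.on("response", ...).
    def on_response(response):
        if response.url == endpoint:
            process(response.json())
    return on_response

# Quick check with a stand-in response object (no browser needed):
class FakeResponse:
    def __init__(self, url, data):
        self.url, self._data = url, data
    def json(self):
        return self._data

seen = []
handler = make_handler('http://example.com/data', seen.append)
handler(FakeResponse('http://example.com/data', {'ok': 1}))
handler(FakeResponse('http://example.com/other', {'ok': 2}))
print(seen)  # only the matching endpoint's payload is collected
```

In the real script this would be registered with page.on("response", make_handler(url, handle_json)), keeping site-specific details in one place.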
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
