Backend Development 6 min read

How to Scrape Xinfadi Market Data with Playwright and Python

Learn how to use Python's Playwright library to scrape dynamic data from the Xinfadi market website, covering request analysis, URL handling, payload adjustments, and a complete code example that extracts product information, demonstrating a practical approach to web crawling and data extraction.

Python Crawling & Data Mining

Jul 25, 2022

How to Scrape Xinfadi Market Data with Playwright and Python

1. Introduction

I'm a Python enthusiast sharing a practical Playwright web‑scraping tutorial. The example targets the Xinfadi market site, which originally used GET requests and later switched to POST, making conventional crawling harder.

2. Implementation Process

The tutorial walks through inspecting the site, locating the data URL, and observing that the request URL remains constant while the payload's current parameter changes across pages. By capturing these requests with Playwright, we can retrieve the JSON data.

The following code launches a Chromium browser, intercepts both request and response events, filters the response URL http://www.xinfadi.com.cn/getPriceData.html, and processes the JSON payload to extract fields such as id, prodName, prodCat, and place.

from playwright.sync_api import Playwright, sync_playwright
import datetime
from pprint import pprint
import traceback
import logging
from tqdm import tqdm
import json

# pip install playwright && playwright install
logging.basicConfig(format='%(asctime)s | %(levelname)s : %(message)s', level=logging.INFO)

def handle_json(json_data):
    # Process the JSON data
    for i in range(20):
        item = json_data['list'][i]
        id = item['id']
        prodName = item['prodName']
        prodCat = item['prodCat']
        place = item['place']
        print(id, prodName, prodCat, place)

def handle(request, response):
    if response is not None:
        # response.url is the data request URL
        if response.url == 'http://www.xinfadi.com.cn/getPriceData.html':
            handle_json(response.json())

def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(ignore_https_errors=True)
    page = context.new_page()
    page.on("request", lambda request: handle(request=request, response=None))
    page.on("response", lambda response: handle(response=response, request=None))
    url = 'http://www.xinfadi.com.cn/index.html'
    page.goto(url)
    page.wait_for_timeout(50000)
    context.close()
    page.close()
    browser.close()

with sync_playwright() as playwright:
    run(playwright)

Running the script prints the extracted product information. The handle_json function can be extended to store or further process the data as needed.

3. Summary

This article demonstrates a complete Playwright‑based web‑scraping workflow for the Xinfadi market, showing how to analyze request patterns, capture dynamic JSON responses, and extract useful fields with Python.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

automation Playwright

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.