How to Scrape Xinfadi Market Data with Playwright and Python
Learn how to use Playwright for Python to scrape dynamically loaded data from the Xinfadi market website. The tutorial covers request analysis, URL handling, and payload changes between pages, and ends with a complete code example that extracts product information.
1. Introduction
I'm a Python enthusiast sharing a practical Playwright web‑scraping tutorial. The example targets the Xinfadi market site, which originally used GET requests and later switched to POST, making conventional crawling harder.
2. Implementation Process
The tutorial walks through inspecting the site, locating the data URL, and observing that the request URL remains constant while the payload's current parameter changes across pages. By capturing these requests with Playwright, we can retrieve the JSON data.
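To make the pagination pattern concrete, here is a minimal sketch of what the per-page payloads might look like. Only the current parameter comes from the observation above. The limit field, its default of 20, and the helper name build_payloads are assumptions for illustration, so check the real form data in the browser's network panel before relying on them.

```python
from typing import Dict, List

# Data endpoint named in this article; the URL stays the same for every page.
DATA_URL = "http://www.xinfadi.com.cn/getPriceData.html"


def build_payloads(first_page: int, last_page: int, limit: int = 20) -> List[Dict[str, int]]:
    """Return one form payload per page; only `current` varies between pages.

    `limit` is a hypothetical page-size field, not confirmed by the article.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [{"limit": limit, "current": page} for page in range(first_page, last_page + 1)]


if __name__ == "__main__":
    for payload in build_payloads(1, 3):
        print(DATA_URL, payload)
```

Because the URL never changes, the page number in the payload is the only thing that tells one request from another, which is why a plain GET-style crawl over URLs no longer works.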
The following code launches a Chromium browser, intercepts both request and response events, filters the response URL http://www.xinfadi.com.cn/getPriceData.html, and processes the JSON payload to extract fields such as id, prodName, prodCat, and place.
# pip install playwright && playwright install
import logging

from playwright.sync_api import Playwright, sync_playwright

logging.basicConfig(format='%(asctime)s | %(levelname)s : %(message)s', level=logging.INFO)

# URL of the data request; it stays the same for every page.
DATA_URL = 'http://www.xinfadi.com.cn/getPriceData.html'


def handle_json(json_data):
    # Process the JSON data.
    # Iterating over the list itself avoids an IndexError when a page
    # holds fewer than 20 records.
    for item in json_data['list']:
        item_id = item['id']  # renamed so the built-in id() is not shadowed
        prodName = item['prodName']
        prodCat = item['prodCat']
        place = item['place']
        print(item_id, prodName, prodCat, place)


def handle(request, response):
    # Request events arrive with response=None and are ignored here.
    if response is not None:
        # response.url is the data request URL
        if response.url == DATA_URL:
            handle_json(response.json())


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(ignore_https_errors=True)
    page = context.new_page()
    page.on("request", lambda request: handle(request=request, response=None))
    page.on("response", lambda response: handle(response=response, request=None))
    url = 'http://www.xinfadi.com.cn/index.html'
    page.goto(url)
    # Keep the page open for 50 seconds so the data responses can be captured.
    page.wait_for_timeout(50000)
    # Close the page before the context that owns it.
    page.close()
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

Running the script prints the extracted product information. The handle_json function can be extended to store or further process the data as needed.
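As one possible extension, the sketch below writes the same four fields to CSV instead of printing them. The helper names extract_rows and write_rows and the sample record are made up for illustration. The only assumption about the response shape is the one the script already makes: a top-level list of records carrying id, prodName, prodCat, and place.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

# Fields extracted in this article's handle_json function.
FIELDS = ["id", "prodName", "prodCat", "place"]


def extract_rows(json_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pick the fields of interest from each record, tolerating missing keys."""
    rows = []
    for item in json_data.get("list", []):
        rows.append({field: item.get(field, "") for field in FIELDS})
    return rows


def write_rows(rows: List[Dict[str, Any]], out: TextIO) -> None:
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample record, not real market data.
    sample = {"list": [{"id": 1, "prodName": "cabbage", "prodCat": "vegetables",
                        "place": "Hebei", "extra": "ignored"}]}
    buffer = io.StringIO()
    write_rows(extract_rows(sample), buffer)
    print(buffer.getvalue())
```

To use it from the scraper, handle_json could call extract_rows(json_data) and pass the result to write_rows with a file opened in append mode. Write the header only once in that case.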
3. Summary
This article demonstrates a complete Playwright‑based web‑scraping workflow for the Xinfadi market, showing how to analyze request patterns, capture dynamic JSON responses, and extract useful fields with Python.
Python Crawling & Data Mining
