
Scrape Shanghai Gold Exchange Data with Python: Step‑by‑Step Guide

This article walks through a step‑by‑step Python solution for scraping Shanghai Gold Exchange transaction data: requests_html collects the URLs, pandas parses the HTML tables, and the aggregated results are exported to Excel. It closes with practical tips on sharing code when asking for debugging help.

Python Crawling & Data Mining

Preface

Hello everyone, I'm Pi Pi. Continuing from the previous article, we explore the solutions shared by experts.

Implementation

Zheng Yuzhe and Yu Liang first provided a hint (originally shared as a screenshot).

Later, user "隔壁😼山楂" contributed the following Python code, which uses requests_html, fake_useragent, and pandas to crawl the Shanghai Gold Exchange website, collect transaction URLs, extract tables with pd.read_html, add timestamps, and save the combined data to an Excel file.

from requests_html import HTMLSession
from fake_useragent import UserAgent
import pandas as pd

session = HTMLSession()
ua = UserAgent().random

day_url_all = []  # URLs of all daily quotation pages found on the listing page

# Scrape only the first listing page for now; widen the range once the data checks out
for page in range(1, 2):
    r = session.get(f"https://www.sge.com.cn/sjzx/mrhqsj?p={page}", headers={'user-agent': ua})
    # print(r.html)
    xpath_url = r.html.xpath("//ul/li/a[@class]")
    for node in xpath_url:
        # Build the absolute URL for each daily quotation page
        url = "https://www.sge.com.cn" + node.find('a[href]')[0].attrs.get('href').lstrip('.')
        day_url_all.append(url)

df_all = []
for day_url in day_url_all:
    r = session.get(day_url, headers={'user-agent': ua})
    # The first <table> on the page holds that day's transaction data
    data = pd.read_html(r.html.raw_html, header=0)[0]
    # Tag each table with the date shown in the page title ('时间' = 'date/time')
    data['时间'] = r.html.xpath('//div[@class="title"]/p/span/text()')[0]
    df_all.append(data)

df_all = pd.concat(df_all)
df_all.to_excel("最终数据.xlsx")  # "最终数据" = "final data"
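One fragile spot in the script above is the URL assembly: `lstrip('.')` strips every leading dot, which would mangle an href that happens to begin with `..`. A more robust alternative is `urllib.parse.urljoin`. The sketch below uses hypothetical href values, since the listing page's exact markup isn't shown here:

```python
from urllib.parse import urljoin

base = "https://www.sge.com.cn/"

# Hypothetical relative hrefs, in the shapes a listing page might return
hrefs = ["./sjzx/quotation/123.html", "/sjzx/quotation/456.html"]

# urljoin resolves each href against the site root correctly,
# whether it starts with "./" or "/"
urls = [urljoin(base, h) for h in hrefs]
print(urls)
# → ['https://www.sge.com.cn/sjzx/quotation/123.html',
#    'https://www.sge.com.cn/sjzx/quotation/456.html']
```

Because `urljoin` follows standard URL-resolution rules, it also handles edge cases like `../` paths that ad‑hoc string stripping would get wrong.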

The script successfully solved the follower's problem.

A reader asked whether pd.read_html can retrieve any table‑formatted data on a webpage. It can, as long as the data is marked up in actual `<table>` elements: pd.read_html parses each `<table>` it finds into a DataFrame, but it cannot extract tabular‑looking layouts built from `<div>`s or other tags.
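To make that concrete, here is a minimal, self‑contained sketch (with made‑up table data) showing pd.read_html pulling a DataFrame out of an HTML string:

```python
from io import StringIO
import pandas as pd

# A toy HTML document containing one real <table> element
html = """
<table>
  <tr><th>Contract</th><th>Close</th></tr>
  <tr><td>Au99.99</td><td>450.0</td></tr>
  <tr><td>Au99.95</td><td>449.5</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found;
# the <th> row is automatically used as the header
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df.shape)          # → (2, 2)
print(list(df.columns))  # → ['Contract', 'Close']
```

Note that pd.read_html needs an HTML parser backend (lxml or html5lib) installed, and recent pandas versions expect literal HTML to be wrapped in `StringIO` rather than passed as a bare string.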

Conclusion

This article demonstrates a complete Python web‑scraping workflow, from collecting URLs to parsing HTML tables and exporting the results to Excel, to help readers solve similar tasks.

Tip: when asking questions, include a small demo dataset, copy‑pasteable code, and error screenshots; if the code exceeds 50 lines, attach it as a .py file.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Python · pandas · requests-html · web-scraping · data-mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
