Automate Shanghai Stock Exchange Report Downloads with a Python Web Scraper
This tutorial shows how to use Python's requests library and built-in json module to crawl the Shanghai Stock Exchange (SSE) website, extract periodic-report metadata, construct PDF URLs, and download the files automatically. It is a practical example of backend web-scraping automation, with a warning against putting excessive load on the server.
Introduction
This tutorial demonstrates how to use Python to crawl the Shanghai Stock Exchange website and automatically download periodic report PDF files.
Background
The SSE site provides downloadable announcement PDFs for listed companies. Manually checking each announcement is tedious, so a Python web scraper can automate the process.
Goal
Given a start date, retrieve and download the SSE periodic reports published in that period.
Implementation
The script uses requests to send a GET request, strips the JSONP wrapper from the response to obtain the JSON payload, parses it with the json module, and iterates over each report entry to construct the PDF URL and download the file.
# coding: utf-8
import json

import requests


def get_and_download_pdf_file():
    # Query endpoint for periodic-report metadata (shortened for brevity).
    url = 'http://query.sse.com.cn/commonQuery.do?...'
    # The SSE API rejects requests that lack a Referer header.
    headers = {'Referer': 'http://www.sse.com.cn/'}
    response = requests.get(url=url, headers=headers)
    # The endpoint returns JSONP; strip the callback wrapper to get plain JSON.
    json_data = response.text.split('(')[-1].replace(')', '')
    format_data = json.loads(json_data)
    for every_report in format_data['result']:
        pdf_url = 'http://static.sse.com.cn' + every_report['URL'].split('<br>')[0]
        file_name = every_report['TITLE'].split('<br>')[0] + '.pdf'
        # Stream the PDF to disk in 1 KB chunks instead of loading it into memory.
        pdf_file = requests.get(pdf_url, stream=True)
        with open(file_name, 'wb') as f:
            for chunk in pdf_file.iter_content(1024):
                f.write(chunk)
        print(f'Listed-company report {file_name} has finished downloading')
Result
Running the script downloads the matching PDFs and prints a confirmation line for each completed file.
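One fragile spot in the script is the JSONP-unwrapping step: splitting on '(' and deleting every ')' breaks if the callback name or a report title contains a parenthesis. A more robust sketch of that step, assuming the response wraps a single JSON object in one callback call (the function name and sample payload here are illustrative, not from the original):

```python
import json
import re


def unwrap_jsonp(text: str) -> dict:
    """Extract the JSON object from a JSONP response like callback({...})."""
    # Capture the outermost {...} between the callback's parentheses.
    match = re.search(r'\((\{.*\})\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('response is not JSONP-wrapped')
    return json.loads(match.group(1))


# Example with a mock payload; real SSE responses carry a 'result' list.
payload = 'jsonpCallback123({"result": [{"TITLE": "demo (annual)", "URL": "/x.pdf"}]})'
data = unwrap_jsonp(payload)
print(data['result'][0]['TITLE'])  # → demo (annual)
```

Unlike the split-based version, this keeps parentheses inside string values intact because json.loads parses the captured object as real JSON.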
Conclusion
The Python scraper efficiently fetches the latest SSE periodic reports, but users should avoid excessive requests to prevent server overload.
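One simple way to act on that warning is to pause between consecutive downloads. A minimal throttling sketch (the helper name and the one-second default are assumptions, not part of the original script):

```python
import time


def polite_download(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, sleeping between requests to limit server load."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        results.append(fetch(url))
    return results


# Usage with the scraper, e.g.:
# polite_download(pdf_urls, lambda u: requests.get(u, stream=True))
```

A fixed delay is the simplest policy; respecting the site's terms and backing off on errors are equally important in practice.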
