Automate Shanghai Stock Exchange Report Downloads with a Python Web Scraper

This tutorial shows how to use Python's requests and JSON handling to crawl the Shanghai Stock Exchange website, extract periodic report metadata, construct PDF URLs, and automatically download the files, providing a practical example of backend web‑scraping automation while warning against excessive server load.


Introduction

In this tutorial, the author demonstrates how to use Python to crawl the Shanghai Stock Exchange website and automatically download periodic report PDF files.

Background

The SSE site provides downloadable announcement PDFs for listed companies. Manually checking each announcement is tedious, so a Python web scraper can automate the process.

Goal

Enter a start date and retrieve the SSE periodic reports for that period.
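
The commonQuery.do endpoint filters its result set through query-string parameters. As a minimal sketch, assuming hypothetical parameter names BEGIN_DATE and END_DATE (the real names should be read from the query string the browser sends on the SSE report page), the date filter could be passed like this:

import requests

# Hypothetical date parameters; inspect the actual SSE query string
# in the browser's network panel for the real names.
params = {
    'BEGIN_DATE': '2019-01-01',
    'END_DATE': '2019-03-31',
}
response = requests.get(
    'http://query.sse.com.cn/commonQuery.do',
    params=params,
    headers={'Referer': 'http://www.sse.com.cn/'},
)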

Implementation

The script uses requests to send a GET request, strips the JSONP callback wrapper from the response to recover the JSON payload, parses it with json, and iterates over each report record to build the PDF URL and download the file.

# coding: utf-8
import requests
import json

def get_and_download_pdf_file():
    # Query endpoint (query string shortened for brevity); the SSE API
    # rejects requests that lack a Referer header.
    url = 'http://query.sse.com.cn/commonQuery.do?...'  # shortened for brevity
    referer = 'http://www.sse.com.cn/'
    response = requests.get(url=url, headers={'Referer': referer})
    # The endpoint returns JSONP, so strip the callback wrapper to get plain JSON.
    json_data = response.text.split('(')[-1].replace(')', '')
    format_data = json.loads(json_data)
    for every_report in format_data['result']:
        # Each record carries a relative PDF path and a title, sometimes
        # padded with <br> markup that must be trimmed off.
        pdf_url = 'http://static.sse.com.cn' + every_report['URL'].split('<br>')[0]
        file_name = every_report['TITLE'].split('<br>')[0] + '.pdf'
        # Stream the response so large PDFs are written to disk in 1 KB chunks.
        pdf_file = requests.get(pdf_url, stream=True)
        with open(file_name, 'wb') as f:
            for chunk in pdf_file.iter_content(1024):
                f.write(chunk)
        print(f'Listed company report {file_name} downloaded.')

if __name__ == '__main__':
    get_and_download_pdf_file()
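
One caveat about the JSONP handling: splitting on '(' and deleting every ')' corrupts the payload if the JSON itself contains parentheses, for example in a report title. A slightly more robust sketch slices between the first '(' and the last ')':

import json

def strip_jsonp(text):
    # Slice between the first '(' and the last ')' so parentheses
    # inside the JSON payload survive intact.
    start = text.find('(') + 1
    end = text.rfind(')')
    return json.loads(text[start:end])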

Result

Running the script with the desired page count downloads the matching PDFs and prints a confirmation line for each file.
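
Paging is controlled by another query-string parameter. As a sketch, assuming a hypothetical pageHelp.pageNo parameter (again, confirm the real name in the browser's network panel), fetching several result pages could look like this:

import requests

# Hypothetical paging parameter; the real name may differ.
for page_no in range(1, 4):  # fetch the first three pages of results
    response = requests.get(
        'http://query.sse.com.cn/commonQuery.do',
        params={'pageHelp.pageNo': page_no},
        headers={'Referer': 'http://www.sse.com.cn/'},
    )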

Conclusion

The Python scraper efficiently fetches the latest SSE periodic reports, but users should avoid excessive requests to prevent server overload.
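
A simple way to stay polite, as a sketch, is to pause between downloads so requests never arrive in a tight loop:

import time
import requests

def polite_get(url, delay=1.0, **kwargs):
    # Sleep before every request so the crawler spaces out its hits
    # instead of hammering the server.
    time.sleep(delay)
    return requests.get(url, **kwargs)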

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Web Scraping · SSE · PDF download
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
