How to Scrape and Process Chinese Stock Flow Data with Python

This guide walks you through using Python to locate the API endpoint for Eastmoney sector capital flow, send HTTP requests, clean the returned string into proper JSON, convert it to a Pandas DataFrame, and finally save the data locally for further analysis.

Python Crawling & Data Mining

Sep 1, 2022

1. Locate the API endpoint

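Open the target page http://data.eastmoney.com/bkzj/hy.html in a browser, press F12 to open the developer tools, and watch the Network tab while the page loads. The sector data does not come from the HTML itself. It arrives through a separate JSONP request, which the browser loads as a script, so look for it among the JS requests. The relevant request URL looks like: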

http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112309073354919152763_1617455258434&pn=1&pz=500&po=1&np=1&fields=f12%2Cf13%2Cf14%2Cf62&fid=f62&fs=m%3A90%2Bt%3A2&ut=b2884a393a59ad64002292a3e90d46a5&_=1617455258435
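
The query string is easier to read once it is decoded. The sketch below uses only the standard library to split the URL above into its parameters. The meanings given in the comments are inferred from the values and from the rest of this article (for example, cb is the callback name stripped in step 3); they are assumptions, not taken from any published Eastmoney documentation.

```python
from urllib.parse import urlsplit, parse_qsl

url = (
    "http://push2.eastmoney.com/api/qt/clist/get"
    "?cb=jQuery112309073354919152763_1617455258434&pn=1&pz=500&po=1&np=1"
    "&fields=f12%2Cf13%2Cf14%2Cf62&fid=f62&fs=m%3A90%2Bt%3A2"
    "&ut=b2884a393a59ad64002292a3e90d46a5&_=1617455258435"
)

# parse_qsl percent-decodes each key/value pair (%2C -> ",", %3A -> ":")
params = dict(parse_qsl(urlsplit(url).query))

# Assumed meanings: cb = JSONP callback name, pn/pz = page number and page size,
# fields = columns to return, fid = field used for sorting.
for key, value in params.items():
    print(f"{key:>6} = {value}")
```

Seeing fields decoded as a comma-separated list makes it clear which column codes the response will contain.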

2. Send the request and check the response

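Use the requests library to fetch the data. A status code of 200 means the request succeeded. Note that the URL below asks for more fields than the one captured in step 1, so that the response includes the price, the change, and the inflow breakdown used later.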

# coding=utf-8
import requests

url = "http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112309073354919152763_1617455258436&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A90+t%3A2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124"
r = requests.get(url, timeout=10)  # give up after 10 seconds instead of hanging
print(r.status_code)  # 200 on success
print(r.text)         # raw JSONP text
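
A long hand-pasted URL is hard to edit. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library, which makes it simple to change one value such as the page size. The names BASE_URL, CALLBACK, and build_url are hypothetical helpers introduced here, and page_size assumes that pz controls the number of rows returned. The fs value is written with a space because the "+" in the URL above decodes to a space.

```python
from urllib.parse import urlencode

BASE_URL = "http://push2.eastmoney.com/api/qt/clist/get"
CALLBACK = "jQuery112309073354919152763_1617455258436"

def build_url(page_size=50):
    """Rebuild the request URL from step 2 (hypothetical helper)."""
    params = {
        "cb": CALLBACK,
        "fid": "f62",
        "po": 1,
        "pz": page_size,  # assumed to be the page size
        "pn": 1,
        "np": 1,
        "fltt": 2,
        "invt": 2,
        "ut": "b2884a393a59ad64002292a3e90d46a5",
        "fs": "m:90 t:2",
        "fields": "f12,f14,f2,f3,f62,f184,f66,f69,f72,f75,f78,f81,f84,f87,f204,f205,f124",
    }
    # urlencode percent-encodes the values and turns the space into "+"
    return BASE_URL + "?" + urlencode(params)

url = build_url()
print(url)
# The result can be passed to requests.get(url, timeout=10) as in the code above.
```

With the default page size, this produces the same URL string as the hard-coded one.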

3. Clean the string and convert to JSON

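The returned text is JSONP: a JSON payload wrapped in a call to the callback function named in the cb parameter and followed by a semicolon. Strip the callback name, the trailing semicolon, and the enclosing parentheses, then parse what remains with json.loads. The rows are under data.diff. Finally, rename the columns for readability, convert the amounts to units of 100 million, and keep only the sectors with a positive net inflow.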

import json
import pandas as pd

# Drop the callback name, then the trailing semicolon and the enclosing parentheses
r_text = r.text.split("jQuery112309073354919152763_1617455258436")[1]
r_text_qu = r_text.strip().rstrip(';')
r_json = json.loads(r_text_qu[1:-1])['data']['diff']

# Map the field codes to readable column names
col_map = {"f12":"code","f2":"price","f3":"change","f14":"name","f62":"net_inflow","f66":"excess_inflow","f69":"excess_ratio","f72":"large_inflow","f75":"large_ratio","f78":"mid_inflow","f81":"mid_ratio","f84":"small_inflow","f87":"small_ratio","f124":"unknown","f184":"main_ratio"}
result = pd.DataFrame(r_json).rename(columns=col_map)

# Convert amounts to units of 100 million (any non-numeric value becomes NaN)
# and keep only rows with a positive net inflow
for col in ["net_inflow","excess_inflow","large_inflow","mid_inflow","small_inflow"]:
    result[col] = (pd.to_numeric(result[col], errors="coerce") / 100000000).round(2)
result = result[result["net_inflow"] > 0]
print(result)
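
Splitting on the full callback name works only while the cb value in the URL stays exactly the same. As an alternative sketch, a regular expression can unwrap any callback(...) response without knowing the name in advance. The helper strip_jsonp and the sample string are made up for illustration; the sample only imitates the data.diff layout used above and is not real market data.

```python
import json
import re

# Matches name( ... ) with an optional trailing semicolon and captures the payload
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)

def strip_jsonp(text):
    """Return the parsed JSON payload of a JSONP response (hypothetical helper)."""
    match = JSONP_RE.match(text)
    if match is None:
        raise ValueError("response is not in callback(...) form")
    return json.loads(match.group(1))

# Made-up sample that imitates the response layout
sample = (
    'jQuery1123_1617({"data": {"total": 1, "diff": '
    '[{"f12": "BK0001", "f14": "Demo", "f62": 250000000.0}]}});'
)

rows = strip_jsonp(sample)["data"]["diff"]
print(rows[0]["f14"], rows[0]["f62"] / 100000000)
```

With a real response, strip_jsonp(r.text)['data']['diff'] would stand in for the split, rstrip, and slice steps.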

4. Save the data

Write the cleaned DataFrame to a CSV file with to_csv(). Passing index=False leaves the row index out of the file.

result.to_csv('sector_capital_flow.csv', index=False)

Summary

JSON is a lightweight data-exchange format widely used by websites, and this endpoint serves it wrapped in a JSONP callback. By stripping the callback wrapper, parsing the remaining string as JSON, and loading the rows into Pandas, you can obtain, clean, and store sector capital-flow data for further quantitative analysis.
