How to Scrape and Process Chinese Stock Flow Data with Python
This guide walks you through using Python to locate the API endpoint for Eastmoney sector capital flow, send HTTP requests, clean the returned string into proper JSON, convert it to a Pandas DataFrame, and finally save the data locally for further analysis.
1. Locate the API endpoint
Open the target page http://data.eastmoney.com/bkzj/hy.html in a browser, press F12 to open the developer tools, and inspect the Network tab to find the request that loads the data. The relevant request URL looks like:
http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112309073354919152763_1617455258434&pn=1&pz=500&po=1&np=1&fields=f12%2Cf13%2Cf14%2Cf62&fid=f62&fs=m%3A90%2Bt%3A2&ut=b2884a393a59ad64002292a3e90d46a5&_=1617455258435
2. Send the request and check the response
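Before sending anything, it can help to unpack the query string of the URL found in step 1. The sketch below only decodes the parameters with the standard library and makes no network call. Reading pn and pz as page number and page size, and fields as the list of requested columns, is an assumption based on the parameter names, not something this guide documents.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from step 1 of this guide.
url = (
    "http://push2.eastmoney.com/api/qt/clist/get"
    "?cb=jQuery112309073354919152763_1617455258434&pn=1&pz=500&po=1&np=1"
    "&fields=f12%2Cf13%2Cf14%2Cf62&fid=f62&fs=m%3A90%2Bt%3A2"
    "&ut=b2884a393a59ad64002292a3e90d46a5&_=1617455258435"
)

# parse_qs percent-decodes each value and returns a list per key;
# every key appears once here, so keep only the first element.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

# The comma-separated field codes become a plain Python list.
fields = params["fields"].split(",")

for key, value in params.items():
    print(f"{key} = {value}")
```

The cb value is the callback name that wraps the response, which is why the cleaning step later has to strip it.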
Use the requests library to fetch the data. The response status code should be 200, indicating a successful request.
# coding=utf-8
import requests
url = "http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112309073354919152763_1617455258436&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A90+t%3A2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124"
r = requests.get(url)
print(r.status_code) # 200
print(r.text) # raw data
3. Clean the string and convert to JSON
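The returned text is JSONP: a JSON document wrapped in a call to the callback function named in the cb parameter and followed by a semicolon. Strip the callback name, the trailing semicolon, and the enclosing parentheses, then parse what remains with json.loads. Finally, rename the columns for readability.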
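The stripping step can be tried offline first. The sketch below uses a regular expression so it does not depend on the exact callback name. The helper strip_jsonp and the sample string are hypothetical, written for illustration only, and the sample values are invented rather than taken from the real response.

```python
import json
import re


def strip_jsonp(text):
    """Return the parsed JSON payload from a wrapper like `name({...});`."""
    # Match: callback name, "(", payload, ")", optional ";" and whitespace.
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("text does not look like a callback-wrapped response")
    return json.loads(match.group(1))


# Hand-made sample that mimics the shape of the response; values are invented.
sample = 'jQuery123_456({"data": {"diff": [{"f12": "BK0001", "f62": 150000000.0}]}});'

payload = strip_jsonp(sample)
rows = payload["data"]["diff"]
print(rows)
```

The code that follows does the same job on the real response by splitting on the known callback name.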
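import json
import pandas as pd
# Drop everything up to and including the callback name
r_text = r.text.split("jQuery112309073354919152763_1617455258436")[1]
# Drop the trailing semicolon
r_text_qu = r_text.rstrip(';')
# [1:-1] removes the parentheses around the JSON payload; the rows are under data.diff
r_json = json.loads(r_text_qu[1:-1])['data']['diff']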
col_map = {"f12":"code","f2":"price","f3":"change","f14":"name","f62":"net_inflow","f66":"excess_inflow","f69":"excess_ratio","f72":"large_inflow","f75":"large_ratio","f78":"mid_inflow","f81":"mid_ratio","f84":"small_inflow","f87":"small_ratio","f124":"unknown","f184":"main_ratio"}
result = pd.DataFrame(r_json).rename(columns=col_map)
# Convert units to hundred-million and keep only positive net inflow rows
for col in ["net_inflow","excess_inflow","large_inflow","mid_inflow","small_inflow"]:
    result[col] = round(result[col] / 100000000, 2)
result = result[result["net_inflow"] > 0]
print(result)
4. Save the data
Write the cleaned DataFrame to a CSV file with to_csv(). Passing index=False leaves the DataFrame's row index out of the file.
result.to_csv('sector_capital_flow.csv', index=False)
Summary
JSON is a lightweight data‑exchange format widely used by websites. By stripping the callback prefix, converting the string to a proper JSON object, and loading it into Pandas, you can efficiently obtain, clean, and store financial sector capital‑flow data for further quantitative analysis.
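The DataFrame part of this workflow can also be rehearsed without touching the network. The sketch below runs the rename, unit conversion, filter, and CSV steps on two hand-made rows. The rows and sector names are invented for illustration and are not real market data. The pd.to_numeric call is an added safeguard in case a column arrives as strings or with non-numeric placeholders, which is an assumption and not something this guide states.

```python
import io

import pandas as pd

# Invented rows in the shape of the parsed 'diff' list; not real market data.
rows = [
    {"f12": "BK0001", "f14": "Sector A", "f62": 250000000.0},
    {"f12": "BK0002", "f14": "Sector B", "f62": -120000000.0},
]
col_map = {"f12": "code", "f14": "name", "f62": "net_inflow"}

frame = pd.DataFrame(rows).rename(columns=col_map)

# Coerce to numbers (non-numeric values become NaN), then scale to units of 1e8.
frame["net_inflow"] = (pd.to_numeric(frame["net_inflow"], errors="coerce") / 1e8).round(2)

# Keep only the rows with positive net inflow.
positive = frame[frame["net_inflow"] > 0]

# Write to an in-memory buffer instead of a file so nothing is left on disk.
buffer = io.StringIO()
positive.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Swapping the buffer for a file path gives the same output as the to_csv call in step 4.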