How to Scrape ETF Data with Python: Step-by-Step Code and Tips
This article walks through retrieving ETF fund codes and names from Eastmoney with Python's requests and pandas. It explains how to construct the correct URLs, handle pagination, and clean the JSON response, provides complete sample scripts, highlights a simpler solution, and recommends a data-collection platform.
Introduction
A user asked how to construct the 13-digit numeric part of an Eastmoney ETF URL and how to retrieve the fund code and name columns. The ETF list is displayed at http://quote.eastmoney.com/center/gridlist.html#fund_etf, and its rows are loaded from a JSON API endpoint that takes pagination parameters.
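The 13-digit number appears to be a Unix timestamp in milliseconds (1703061927055 falls on 20 December 2023). jQuery attaches such a value to the JSONP callback name `cb` and to the `_` parameter, apparently as a cache-buster, and the server does not seem to validate it: the simplified request later in this article works with `_=` left empty. A minimal sketch of building the URL for any page, assuming the server only needs the parameters copied from the article's own URLs; `build_clist_url` and `CALLBACK_PREFIX` are hypothetical names, and the comments on `pn` and `pz` are inferred from how the article uses them.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Host, callback prefix and query values copied from the URLs in this article
BASE = "http://55.push2.eastmoney.com/api/qt/clist/get"
CALLBACK_PREFIX = "jQuery112402201018241113597"
FIELDS = ("f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,"
          "f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152")


def build_clist_url(page: int, page_size: int = 20, ts_ms: Optional[int] = None) -> str:
    """Build the ETF list URL for one page, stamped with a millisecond timestamp."""
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)  # current time in ms: a 13-digit number
    params = {
        "cb": f"{CALLBACK_PREFIX}_{ts_ms}",  # JSONP callback name: <prefix>_<timestamp>
        "pn": page,        # page number (inferred)
        "pz": page_size,   # rows per page (inferred)
        "po": 1,
        "np": 1,
        "ut": "bd1d9ddb04089700cf9c27f6f7426281",
        "fltt": 2,
        "invt": 2,
        "wbp2u": "|0|0|0|web",
        "fid": "f3",
        "fs": "b:MK0021,b:MK0022,b:MK0023,b:MK0024",
        "fields": FIELDS,
        "_": ts_ms,        # the 13-digit value the question asks about
    }
    # Leave "|", ":" and "," unescaped so the result matches the URLs shown here
    return f"{BASE}?{urlencode(params, safe='|:,')}"


print(build_clist_url(1, ts_ms=1703065790029))
```

Passing `ts_ms` explicitly makes the URL reproducible; omitting it stamps the current time, which is what the browser does on each page load.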
Implementation
Wu Chaojian provided a basic script that fetches data from a single page and extracts the desired fields with requests and pandas:
```python
import json
import pandas as pd
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0'}
# Page 3 of the ETF list (pn=3); the response body is JSONP, not plain JSON
url = 'http://89.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112406545446716331029_1703061927055&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_='
resp = requests.get(url, headers=headers, timeout=10).text
```

The response is wrapped in a JSONP callback; the script strips the callback, parses the JSON, normalizes the diff list into a DataFrame, and selects the f12 (code) and f14 (name) columns.
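The stripping and column-selection steps can be sketched as follows. Rather than matching a hard-coded callback name, which changes with every page load, this keeps the text between the first `(` and the last `)`, so parentheses inside fund names are left intact. `parse_jsonp` and `codes_and_names` are hypothetical helper names, the dict-shaped `diff` branch is a defensive assumption, and `sample` is a made-up response, not real Eastmoney data.

```python
import json

import pandas as pd


def parse_jsonp(text: str) -> dict:
    """Return the JSON object wrapped in a JSONP call such as 'callback({...});'."""
    start = text.index("(") + 1  # first "(" opens the callback's argument list
    end = text.rindex(")")       # last ")" closes it, so ")" inside names survive
    return json.loads(text[start:end])


def codes_and_names(payload: dict) -> pd.DataFrame:
    """Keep only the fund code (f12) and name (f14) columns of the 'diff' rows."""
    diff = payload["data"]["diff"]
    if isinstance(diff, dict):  # assumption: tolerate rows keyed by index instead of a list
        diff = list(diff.values())
    df = pd.json_normalize(diff)
    return df[["f12", "f14"]].rename(columns={"f12": "code", "f14": "name"})


# Made-up response for illustration only (not real Eastmoney data)
sample = (
    'jQuery112406545446716331029_1703061927055('
    '{"rc":0,"data":{"total":2,"diff":['
    '{"f12":"000001","f14":"Example ETF (A)"},'
    '{"f12":"000002","f14":"Example ETF B"}]}});'
)
print(codes_and_names(parse_jsonp(sample)))
```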
An attempt to scrape multiple pages was made with a loop that generated the changing timestamp parameter and page numbers:
```python
# Continues the first script: relies on its imports (json, pandas as pd, requests) and `headers`.
# 44 values for the `_` parameter, each 5 ms after the previous one
number = [1703054636319 + 5 * k for k in range(1, 45)]

df_all = []
for i, j in zip(range(1, 45), number):  # i = page number (pn), j = timestamp (_)
    url = f'http://85.push2.eastmoney.com/api/qt/clist/get?cb=jQuery1124030358799609457776_1703062450956&pn={i}&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_={j}'
    resp = requests.get(url, headers=headers, timeout=10).text
    # Keep only the JSON between the first "(" and the last ")" of the JSONP wrapper;
    # this works whatever the callback name is and keeps ")" inside fund names
    table = resp[resp.index('(') + 1:resp.rindex(')')]
    dict_data = json.loads(table)
    df = pd.json_normalize(data=dict_data['data']['diff'])
    df_1 = df[['f12', 'f14']]  # f12 = fund code, f14 = fund name
    df_all.append(df_1)
all_table = pd.concat(df_all, ignore_index=True)
```

Later, Kelly pointed out that all the required data are actually present on the first page, which makes the pagination logic unnecessary.
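If the list ever does span several pages, the page count can be derived from the first response instead of hard-coding 44 pages and 44 pre-computed timestamps. This is a sketch under two assumptions: the payload carries a `total` row count next to `diff`, and a null or empty `diff` means there is nothing more to fetch. `fetch_all_pages` and `fake_fetch` are hypothetical; in real use, `fetch_page` would send the request for page `pn` and return the parsed JSON, while the fake version serves made-up rows so the example runs offline.

```python
import math
from typing import Callable, Dict, List

import pandas as pd


def fetch_all_pages(fetch_page: Callable[[int], Dict], page_size: int = 20) -> pd.DataFrame:
    """Collect f12 (code) and f14 (name) from every page the first response implies."""
    first = fetch_page(1)
    data = first.get("data") or {}
    # Assumption: 'total' is the number of rows across all pages
    pages = math.ceil(data.get("total", 0) / page_size)
    frames: List[pd.DataFrame] = []
    for pn in range(1, pages + 1):
        payload = first if pn == 1 else fetch_page(pn)  # page 1 is already in hand
        diff = (payload.get("data") or {}).get("diff")
        if not diff:  # stop early if a page comes back empty or null
            break
        frames.append(pd.json_normalize(diff)[["f12", "f14"]])
    if not frames:
        return pd.DataFrame(columns=["f12", "f14"])
    return pd.concat(frames, ignore_index=True)


def fake_fetch(pn: int, rows: int = 45, size: int = 20) -> Dict:
    """Hypothetical stand-in for the HTTP call: serves 45 made-up rows, 20 per page."""
    start = (pn - 1) * size
    diff = [{"f12": f"{k:06d}", "f14": f"Fund {k}"} for k in range(start, min(start + size, rows))]
    return {"data": {"total": rows, "diff": diff}}


demo = fetch_all_pages(fake_fetch)
print(len(demo))  # 45
```

Deriving the page count from the response means the loop adapts when the number of listed ETFs changes, and no timestamp arithmetic is needed.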
A simplified request that works for the first page is:
```python
import requests

url = "http://55.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112402201018241113597_1703065790029&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_="
headers = {
    'Referer': 'http://quote.eastmoney.com/center/gridlist.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers, timeout=10)
```

According to Kelly, this request returns the full list of ETF codes and names without needing additional pages. Because the URL still carries a cb parameter, res.text is JSONP and has to be stripped and parsed as in the first script before the f12 and f14 columns can be selected.
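Whether a single response really holds every ETF depends on how many rows the server sends back. A minimal check, assuming `pz` sets the page size (the loop above walks 20-row pages) and that the parsed payload reports an overall `total` next to `diff` (both assumptions, not confirmed by the article), compares the two before pagination is dropped. `coverage` and `is_complete` are hypothetical helper names, and `partial` is a made-up payload.

```python
from typing import Tuple


def coverage(payload: dict) -> Tuple[int, int]:
    """Return (rows received, total rows reported) for a parsed clist payload."""
    data = payload.get("data") or {}  # treat a null 'data' as no rows
    rows = len(data.get("diff") or [])
    total = data.get("total", rows)   # assumption: 'total' counts rows across all pages
    return rows, total


def is_complete(payload: dict) -> bool:
    """True when the response already contains every row the server reports."""
    rows, total = coverage(payload)
    return rows >= total


# Made-up payload: 20 rows came back, but the server reports 45 in total
partial = {"data": {"total": 45,
                    "diff": [{"f12": f"{k:06d}", "f14": f"Fund {k}"} for k in range(20)]}}
print(coverage(partial), is_complete(partial))  # (20, 45) False
```

If the check fails, the options are to raise `pz` (assuming the server honors larger page sizes) or to keep a page loop.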
Conclusion
The article demonstrates how to use Python, requests, and pandas to fetch ETF data from Eastmoney, shows an initial multi‑page approach, and then simplifies it by recognizing that the first page already contains all required records.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
