
How to Scrape ETF Data with Python: Step-by-Step Code and Tips

This article walks through retrieving ETF fund codes and names from Eastmoney with Python's requests and pandas. It explains how to construct the correct URLs, handle pagination, and clean the JSONP response, provides complete sample scripts, highlights a simpler solution, and recommends a data-collection platform.

Python Crawling & Data Mining

Dec 27, 2023


Introduction

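A user asked how to construct the 13-digit number in an Eastmoney ETF data URL and how to retrieve the fund code and name columns. The visible page is http://quote.eastmoney.com/center/gridlist.html#fund_etf, but its table is filled from a JSON API endpoint that takes pagination parameters (pn is the page number, pz the page size). The 13-digit numbers in that endpoint's URL (the _ parameter and the suffix of the jQuery callback name in cb) are Unix timestamps in milliseconds, which jQuery adds to avoid cached responses, so they can usually be generated from the current clock rather than copied.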
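To make the 13-digit value concrete, here is a minimal sketch that builds a page URL with the standard library. It assumes the fixed parameters copied from the article's URLs stay valid, fills in pn, pz and _ for each request, and uses int(time.time() * 1000) for the millisecond timestamp. The names BASE_URL, FIXED_PARAMS, millis_timestamp and build_clist_url are hypothetical, and cb is deliberately left out; if the server still wraps its reply in a callback, add cb back and strip the wrapper as the implementation section describes.

```python
import time
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint and fixed query parameters copied from the article's URLs (cb deliberately omitted)
BASE_URL = "http://89.push2.eastmoney.com/api/qt/clist/get"
FIXED_PARAMS = {
    "po": 1,
    "np": 1,
    "ut": "bd1d9ddb04089700cf9c27f6f7426281",
    "fltt": 2,
    "invt": 2,
    "wbp2u": "|0|0|0|web",
    "fid": "f3",
    "fs": "b:MK0021,b:MK0022,b:MK0023,b:MK0024",
    "fields": "f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,"
              "f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152",
}


def millis_timestamp() -> int:
    """Current Unix time in milliseconds: a 13-digit number until the year 2286."""
    return int(time.time() * 1000)


def build_clist_url(page: int, page_size: int = 20) -> str:
    """Hypothetical helper: URL for one page, with a fresh cache-busting _ value."""
    params = {"pn": page, "pz": page_size, **FIXED_PARAMS, "_": millis_timestamp()}
    # urlencode percent-encodes characters such as | , and : in the values
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_clist_url(3))
    # The article's own value 1703054636319 decodes to a moment on 2023-12-20 (UTC)
    print(datetime.fromtimestamp(1703054636319 / 1000, tz=timezone.utc))
```

Decoding the article's value shows it was simply the time the page was loaded, about a week before publication, which is why the loop further down can generate its own values.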

Implementation

Wu Chaojian suggested a basic script that fetches a single page of data and extracts the desired fields with requests and pandas:

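import json
import requests
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0'}
url = 'http://89.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112406545446716331029_1703061927055&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_='
resp = requests.get(url, headers=headers, timeout=10).text
# Keep only the JSON between the callback's "(" and its final ")"
data = json.loads(resp[resp.index('(') + 1 : resp.rindex(')')])
df = pd.json_normalize(data['data']['diff'])[['f12', 'f14']]  # f12 = fund code, f14 = fund name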

The response is wrapped in a JSONP callback (the jQuery function named in the cb parameter), so the script strips the callback wrapper and parses the JSON, then normalizes the data.diff list into a DataFrame and keeps the f12 (fund code) and f14 (fund name) columns.
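Stripping the wrapper with chained str.replace() calls is fragile: the callback name changes whenever cb changes, and deleting every ")" or ";" would also damage fund names that contain those characters. A more tolerant sketch, assuming the body has the shape callbackName({...}) with an optional trailing semicolon, captures everything between the opening "(" and the final ")" with a regular expression and falls back to plain JSON otherwise. The helpers parse_jsonp and codes_and_names are hypothetical.

```python
import json
import re

# callbackName( ... ) with an optional trailing semicolon, e.g. jQuery1124..._1703...({...});
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text: str) -> dict:
    """Parse a JSONP body; plain JSON (no callback wrapper) is parsed as-is."""
    match = JSONP_RE.match(text)
    # The greedy (.*) runs to the LAST ")", so parentheses inside strings survive
    return json.loads(match.group(1) if match else text)


def codes_and_names(payload: dict) -> list[tuple[str, str]]:
    """Return (fund code, fund name) pairs taken from the f12 and f14 fields of data.diff."""
    rows = (payload.get("data") or {}).get("diff") or []
    return [(row["f12"], row["f14"]) for row in rows]


if __name__ == "__main__":
    # Made-up body in the same shape as the API's reply, for illustration only
    body = "jQuery123_1703(" + json.dumps(
        {"data": {"diff": [{"f12": "000000", "f14": "Demo ETF (A)"}]}}
    ) + ");"
    print(codes_and_names(parse_jsonp(body)))  # [('000000', 'Demo ETF (A)')]
```

Because parse_jsonp also accepts unwrapped JSON, the same function keeps working if the cb parameter is dropped from the request.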

A first attempt at scraping multiple pages used a loop that generated the page numbers along with a changing timestamp for each request:

import json
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0'}

# 44 pages, each with its own 13-digit millisecond timestamp spaced 5 ms apart
start = 1703054636319
number = [start + 5 * k for k in range(1, 45)]

df_all = []
for i, j in zip(range(1, 45), number):
    url = f'http://85.push2.eastmoney.com/api/qt/clist/get?cb=jQuery1124030358799609457776_1703062450956&pn={i}&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_={j}'
    resp = requests.get(url, headers=headers, timeout=10).text
    # Slice out the JSON between the callback's parentheses; works whatever name cb= uses
    dict_data = json.loads(resp[resp.index('(') + 1 : resp.rindex(')')])
    if not dict_data.get('data'):
        break  # stop once a page comes back without data
    df = pd.json_normalize(data=dict_data['data']['diff'])
    df_all.append(df[['f12', 'f14']])  # f12 = fund code, f14 = fund name
all_table = pd.concat(df_all, ignore_index=True)
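The hard-coded range(1, 45) assumes exactly 44 pages, a count that goes stale as funds are listed or delisted. A data-driven alternative, sketched here under the assumption that the JSON carries a data.total record count next to data.diff (check this against a real response) and that pz is the page size, reads page 1 and then requests only as many further pages as the total implies. The fetcher is passed in so the paging logic can be tested offline; fetch_all_pages and its parameters are hypothetical.

```python
import math
from typing import Callable


def fetch_all_pages(fetch_page: Callable[[int], dict], page_size: int = 20) -> list[dict]:
    """Hypothetical helper: fetch page 1, read data.total, then only the pages still needed.

    fetch_page(pn) must return the parsed JSON for page number pn, and page_size
    must equal the pz value it sends. Assumes data.total exists in the reply.
    """
    data = fetch_page(1).get("data") or {}
    rows = list(data.get("diff") or [])
    total = data.get("total", len(rows))  # without a total, treat page 1 as everything
    for pn in range(2, math.ceil(total / page_size) + 1):
        page = fetch_page(pn).get("data") or {}
        rows.extend(page.get("diff") or [])
    return rows


if __name__ == "__main__":
    # Offline stand-in for the API: 45 fake records served 20 per page
    fake = [{"f12": f"{k:06d}", "f14": f"Fund {k}"} for k in range(45)]

    def fake_fetch(pn: int) -> dict:
        chunk = fake[(pn - 1) * 20 : pn * 20]
        return {"data": {"total": len(fake), "diff": chunk} if chunk else None}

    print(len(fetch_all_pages(fake_fetch)))  # 45
```

In real use, fetch_page would wrap a requests.get call for page pn plus the callback stripping described earlier; the loop then adapts to however many funds the endpoint reports.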

Later, Kelly pointed out that all required data are actually present on the first page, making the pagination logic unnecessary.

A simplified request that works for the first page is:

import json
import requests

url = "http://55.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112402201018241113597_1703065790029&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_="
headers = {
    'Referer': 'http://quote.eastmoney.com/center/gridlist.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers, timeout=10)
# Strip the jQuery callback wrapper, then parse the JSON payload
data = json.loads(res.text[res.text.index('(') + 1 : res.text.rindex(')')])

According to Kelly, this request returns the full list of ETF codes and names, so no additional pages are needed.
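Whether one request really covers every fund is easy to check rather than assume. This minimal sketch, assuming the response includes a data.total field, maps codes to names and raises if data.diff holds fewer rows than the total (for instance, if pz=20 caps the page at 20 rows). The function name single_page_codes is hypothetical.

```python
def single_page_codes(payload: dict) -> dict[str, str]:
    """Hypothetical helper: map fund code (f12) to fund name (f14) from one page.

    Assumes data.total is the number of matching funds and raises ValueError when
    data.diff holds fewer rows, instead of silently returning a partial list.
    """
    data = payload.get("data") or {}
    rows = data.get("diff") or []
    total = data.get("total", len(rows))
    if len(rows) < total:
        raise ValueError(f"page holds {len(rows)} of {total} records; paginate or request a larger pz")
    return {row["f12"]: row["f14"] for row in rows}


if __name__ == "__main__":
    # Made-up payload for illustration only
    complete = {"data": {"total": 2, "diff": [{"f12": "000001", "f14": "Fund A"},
                                              {"f12": "000002", "f14": "Fund B"}]}}
    print(single_page_codes(complete))  # {'000001': 'Fund A', '000002': 'Fund B'}
```

Failing loudly here turns the single-page shortcut into something that can be verified on each run instead of trusted.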

Conclusion

The article demonstrates how to use Python, requests, and pandas to fetch ETF data from Eastmoney, shows an initial multi‑page approach, and then simplifies it by recognizing that the first page already contains all required records.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


