
Python Web Scraping Tutorial: Fetching ETF Data from Eastmoney

This article walks through a Python web‑scraping tutorial that extracts ETF codes and names from Eastmoney. It covers the original problem, step‑by‑step code built on requests and pandas, and the handling of pagination, and concludes with a brief promotion of a data‑collection service.

Python Programming Learning Circle

Jan 10, 2024


The author presents a question from a member of a Python community: how to construct the 13‑digit number needed to retrieve ETF fund data (code and name) from Eastmoney. The data is served from a URL that carries a page number (pn) and this dynamic numeric value (the _ parameter at the end of the query string).
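The 13‑digit value in the "_" parameter looks like a Unix timestamp in milliseconds, the cache‑busting value jQuery appends to JSONP requests; a number of the same kind also ends the cb callback name. That is our reading of the URLs, not something the source states. A minimal sketch, with hypothetical helper names, that decodes the value from the first URL in this article and produces a fresh one:

```python
from datetime import datetime, timezone
import time


def decode_ms_timestamp(value: int) -> datetime:
    """Interpret a 13-digit number as milliseconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


def fresh_ms_timestamp() -> int:
    """Current time in milliseconds: a plausible value for the '_' parameter."""
    return int(time.time() * 1000)


# Value taken from the '_' parameter of the first URL in the article
print(decode_ms_timestamp(1703061927065))  # 2023-12-20 08:45:27.065000+00:00
print(fresh_ms_timestamp())
```

If this reading is right, the exact number matters little: the server presumably only uses it to defeat caching, so any current millisecond timestamp should work.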

Initial guidance comes as a simple script that fetches a single page (pn=3 in this URL) with requests, strips the JSONP callback wrapper from the response, parses the JSON, and loads the records into pandas. The code is:

import json
import requests
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0'}
url = 'http://89.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112406545446716331029_1703061927055&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1703061927065'
resp = requests.get(url, headers=headers, timeout=10).text
# The body is JSONP: callbackName(<json>); keep only the text between the outer parentheses,
# so parentheses or semicolons inside fund names are left untouched
table = resp[resp.index('(') + 1 : resp.rindex(')')]
dict_data = json.loads(table)
# 'diff' holds one record per fund: f12 is the fund code, f14 the fund name
df = pd.json_normalize(data=dict_data['data']['diff'])
print(df[['f12', 'f14']])
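The response arrives as JSONP, that is, JSON wrapped in a call to the function named by the cb parameter. A small helper can strip that wrapper whatever the callback name is, without touching parentheses inside the data. A minimal sketch, assuming the body has the form name(...) optionally followed by a semicolon; the helper name unwrap_jsonp is hypothetical:

```python
import json
import re

# callback name, "(", payload, ")", then an optional ";" and trailing whitespace
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text: str):
    """Parse the JSON inside a JSONP response; plain JSON is parsed as-is."""
    match = _JSONP_RE.match(text)
    # Greedy (.*) runs to the LAST ")" before the optional ";", so inner
    # parentheses in the payload survive
    payload = match.group(1) if match else text
    return json.loads(payload)
```

With this helper, the parsing step of the script above reduces to dict_data = unwrap_jsonp(resp), and it keeps working when the cb value in the URL changes.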

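To scrape multiple pages, the author then tries a loop that generates one 13‑digit number per page and builds a URL for each of the 44 pages. As first written, that loop had three bugs: the query string ended in &_{j} instead of &_={j}; the callback prefix being stripped (jQuery112404551488490763843_1703043849281) did not match the cb value in the URL; and the stripped text was never parsed, so df was built from a stale dict_data left over from the first script (or raised a NameError in a fresh session). With those fixed, the code is:

import json
import requests
import pandas as pd

# Same User-Agent header as in the first script
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0'}

# One 13-digit value per page for the '_' parameter (44 pages, 5 apart)
number = [1703054636319 + 5 * k for k in range(1, 45)]
df_all = []
for i, j in zip(range(1, 45), number):
    url = f'http://85.push2.eastmoney.com/api/qt/clist/get?cb=jQuery1124030358799609457776_1703062450956&pn={i}&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_={j}'
    resp = requests.get(url, headers=headers, timeout=10).text
    # Strip the JSONP wrapper regardless of the callback name
    table = resp[resp.index('(') + 1 : resp.rindex(')')]
    dict_data = json.loads(table)  # parse this page's response
    if not dict_data.get('data'):  # stop if a page comes back without data
        break
    df = pd.json_normalize(data=dict_data['data']['diff'])
    df_all.append(df[['f12', 'f14']])  # fund code and fund name
all_table = pd.concat(df_all, ignore_index=True)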
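Editing the long query string by hand makes slips such as a dropped "=" or a stale callback name easy to miss. One alternative is to keep the parameters in a dict and let the standard library encode them. A minimal sketch: the host and fixed values are copied from the article's URLs, while leaving out cb (assumed to make the server return plain JSON) and shortening fields to f12,f14 (assumed to be honoured) are our assumptions; build_clist_url is a hypothetical name.

```python
from __future__ import annotations

import time
from urllib.parse import urlencode

# Host and fixed parameters copied from the article's URLs
BASE_URL = "http://85.push2.eastmoney.com/api/qt/clist/get"
FIXED_PARAMS = {
    "po": 1,
    "np": 1,
    "ut": "bd1d9ddb04089700cf9c27f6f7426281",
    "fltt": 2,
    "invt": 2,
    "wbp2u": "|0|0|0|web",
    "fid": "f3",
    "fs": "b:MK0021,b:MK0022,b:MK0023,b:MK0024",
    "fields": "f12,f14",  # assumption: the server honours a shortened field list
}


def build_clist_url(page: int, page_size: int = 20, ts_ms: int | None = None) -> str:
    """Build the list URL for one page; 'cb' is omitted on purpose (assumption)."""
    params = {"pn": page, "pz": page_size, **FIXED_PARAMS}
    # '_' is the millisecond cache-buster; default to the current time
    params["_"] = ts_ms if ts_ms is not None else int(time.time() * 1000)
    # Keep ',', ':' and '|' readable instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe=',:|')}"
```

Each page could then be fetched with requests.get(build_clist_url(i), headers=headers, timeout=10).json(). If the server turns out to require the callback, adding a cb entry back to the dict and unwrapping the JSONP as before should work.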

Another participant points out that the needed data is already available from the first‑page request, which simplifies the task. A final, working example makes a single request with a Referer header, a browser User‑Agent, and an explicit proxies setting:

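import json
import requests
import pandas as pd

url = "http://55.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112402201018241113597_1703065790029&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=b:MK0021,b:MK0022,b:MK0023,b:MK0024&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1703065790075"
headers = {
    'Referer': 'http://quote.eastmoney.com/center/gridlist.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}

# Empty entries are intended to bypass any system-wide proxy from the environment
proxies = {'http': '', 'https': ''}

res = requests.get(url, headers=headers, proxies=proxies, timeout=10)
# Strip the JSONP wrapper, then keep the fund code (f12) and name (f14)
text = res.text
dict_data = json.loads(text[text.index('(') + 1 : text.rindex(')')])
df = pd.json_normalize(data=dict_data['data']['diff'])
print(df[['f12', 'f14']])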
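Whether one request really returns the full list can be checked against the record count the API reports. The sketch below assumes the parsed response has the shape {"data": {"total": N, "diff": [...]}}, with diff a list as it is when np=1; the total field is an assumption not shown in the source. fetch_page is a hypothetical callable that returns the parsed JSON for page number pn, which keeps the paging logic testable without network access.

```python
from __future__ import annotations

import math
from typing import Callable


def fetch_all_records(fetch_page: Callable[[int], dict], page_size: int = 20) -> list[dict]:
    """Fetch page 1, read data.total, then request only the pages still missing.

    page_size must match the pz value that fetch_page actually sends.
    """
    data = fetch_page(1).get("data") or {}
    records = list(data.get("diff") or [])
    # Assumed field: overall record count; fall back to page 1's length if absent
    total = int(data.get("total") or len(records))
    pages = math.ceil(total / page_size)
    for pn in range(2, pages + 1):
        diff = (fetch_page(pn).get("data") or {}).get("diff") or []
        if not diff:  # stop early if the server returns an empty page
            break
        records.extend(diff)
    return records
```

If total does not exceed the page size, only one request is made, which matches the observation that the first page suffices; otherwise the remaining pages are fetched in order. The result can be loaded with pd.DataFrame(records)[['f12', 'f14']].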

After running the corrected script, the questioner's problem was solved and the desired ETF list was obtained.

The article then shifts to promoting the BrightData (formerly Luminati) data‑collection platform, with a registration link, screenshots of its web IDE, and instructions for obtaining free Python learning resources via QR codes.

In summary, the piece demonstrates how to retrieve ETF information from Eastmoney using Python, highlights common pitfalls with pagination parameters, and offers a ready‑to‑use solution while also advertising a commercial web‑scraping service.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
