
Python-Based Fund Market Data Extraction and Analysis

This article demonstrates how to use Python to scrape fund data from Eastmoney, store it in Excel, visualize fund type distribution, and perform detailed time‑series analysis on a specific fund's net‑value, growth rate, and yearly performance.

Python Programming Learning Circle

The article begins with a background explaining that the goal is to extract and analyze fund data from the Chinese fund market to help beginners assess whether a fund is worth buying.

Data acquisition: The script imports pandas, re, requests, BeautifulSoup, numpy, and matplotlib, then fetches an Eastmoney JavaScript file that lists every fund's code, name, and type. The parsed list is saved to an Excel file.

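import re  # regular expressions for parsing the fetched text

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Jupyter/IPython magic: render figures inline (remove when running as a plain script)
%matplotlib inline
# Use a font with Chinese glyphs so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
# Keep the minus sign readable when a non-default font is active
plt.rcParams['axes.unicode_minus'] = False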
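
Parsing the fund list described above could look roughly like the sketch below. It is not the article's code. It assumes the JavaScript payload is a single array literal assigned to a variable, with the fund code in the first position of each entry, the name in the third, and the type in the fourth. The function name `parse_fund_list` and the sample payload are placeholders for illustration.

```python
import json
import re


def parse_fund_list(js_text):
    """Extract code/name/type records from a JS array-literal payload.

    Assumed payload shape (an assumption, not confirmed by the article):
        var r = [["<code>", "<abbr>", "<name>", "<type>", "<pinyin>"], ...];
    """
    # Grab everything from the first "[[" to the last "]]".
    match = re.search(r"\[\s*\[.*\]\s*\]", js_text, flags=re.S)
    if match is None:
        raise ValueError("no array literal found in payload")
    entries = json.loads(match.group(0))
    return [{"code": e[0], "name": e[2], "type": e[3]} for e in entries]


# Placeholder payload in the assumed shape (not real fund data).
sample = (
    'var r = [["000001","SLA","示例基金A","混合型","SHILIJIJINA"],'
    '["000002","SLB","示例基金B","债券型","SHILIJIJINB"]];'
)
funds = parse_fund_list(sample)
print(funds)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(funds)` and written out with `DataFrame.to_excel`, which needs an Excel writer such as openpyxl installed.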

After loading the data into a DataFrame, a bar chart visualizes the distribution of fund types, showing that mixed, bond, open‑ended bond, money‑market, and stock‑index funds dominate the market.
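
The counting step behind such a bar chart can be sketched as follows, assuming the fund list sits in a DataFrame with one row per fund and a column holding the fund type. The column name `type`, both function names, and the toy rows are assumptions for illustration, not values from the article.

```python
import pandas as pd


def count_fund_types(funds, type_col="type"):
    """Return the number of funds per type, most common first."""
    return funds[type_col].value_counts()


def plot_fund_types(counts):
    """Draw the counts as a bar chart and return the figure."""
    # Imported here so the counting step also works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(counts.index.astype(str), counts.to_numpy())
    ax.set_xlabel("Fund type")
    ax.set_ylabel("Number of funds")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig


# Toy data (made up) standing in for the full fund list.
toy_funds = pd.DataFrame(
    {"type": ["混合型", "债券型", "混合型", "货币型", "混合型", "债券型"]}
)
type_counts = count_fund_types(toy_funds)
print(type_counts)
```

Calling `plot_fund_types(type_counts)` then produces the chart. Splitting counting from plotting keeps the numbers easy to inspect on their own.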

Fund‑specific analysis: A function, get_fund_data, requests the historical net‑value data for a given fund code and date range. It handles pagination and missing values and returns a cleaned DataFrame.

def get_fund_data(code, per=10, sdate='', edate='', proxies=None):
    url = 'http://fund.eastmoney.com/f10/F10DataApi.aspx'
    params = {'type':'lsjz','code':code,'page':1,'per':per,'sdate':sdate,'edate':edate}
    # ... pagination loop ...
    return data
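
The elided pagination loop is not shown in the article. One way such a loop could work is sketched below, under two loud assumptions: each response body embeds an HTML table whose data rows use `<td>` cells, and it carries a `pages:<n>` field giving the total page count. The names `parse_page` and `fetch_all_pages` and the fake responses are hypothetical; the HTTP call is injected as a function so the logic can be tried without network access.

```python
import re

ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.S)
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.S)
PAGES_RE = re.compile(r"pages:(\d+)")


def parse_page(text):
    """Return (rows, total_pages) parsed from one response body."""
    rows = []
    for row_html in ROW_RE.findall(text):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # header rows use <th>, so they yield no <td> cells
            rows.append(cells)
    match = PAGES_RE.search(text)
    total_pages = int(match.group(1)) if match else 1
    return rows, total_pages


def fetch_all_pages(fetch_page):
    """Call fetch_page(page_number) -> str until every page is collected."""
    rows, total_pages = parse_page(fetch_page(1))
    for page in range(2, total_pages + 1):
        more_rows, _ = parse_page(fetch_page(page))
        rows.extend(more_rows)
    return rows


# Fake two-page response in the assumed shape (values are made up).
fake_pages = {
    1: 'var apidata={ content:"<table><thead><tr><th>净值日期</th><th>单位净值</th>'
       "</tr></thead><tbody><tr><td>2020/12/22</td><td class='tor bold'>1.2345</td>"
       '</tr></tbody></table>",records:2,pages:2,curpage:1};',
    2: 'var apidata={ content:"<table><thead><tr><th>净值日期</th><th>单位净值</th>'
       "</tr></thead><tbody><tr><td>2020/12/21</td><td class='tor bold'>1.2300</td>"
       '</tr></tbody></table>",records:2,pages:2,curpage:2};',
}
rows = fetch_all_pages(lambda page: fake_pages[page])
print(rows)
```

In real use, `fetch_page` might wrap `requests.get(url, params={**params, 'page': page}, proxies=proxies).text`, reusing the `url` and `params` from get_fund_data.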

The script fetches data for the "招商中证白酒" fund (a China Merchants CSI baijiu index fund, code 161725) from 2015‑01‑01 to 2020‑12‑22. It then converts the columns to appropriate types, removes the percentage sign from the daily growth rate, and adds a benchmark column.

data['净值日期'] = pd.to_datetime(data['净值日期'], format='%Y/%m/%d')
data['单位净值'] = data['单位净值'].astype(float)
# ... other conversions ...
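
The remaining conversions are elided in the article. A minimal sketch of how they might look follows. The column names 累计净值 (cumulative net value) and 日增长率 (daily growth rate), the `benchmark` column name, and its default of 0.0 are assumptions, as is the function name `clean_nav_frame`. The sample rows are made up.

```python
import pandas as pd


def clean_nav_frame(raw, benchmark=0.0):
    """Convert the string columns of a net-value table to proper dtypes."""
    data = raw.copy()
    data["净值日期"] = pd.to_datetime(data["净值日期"], format="%Y/%m/%d")
    data["单位净值"] = data["单位净值"].astype(float)
    data["累计净值"] = data["累计净值"].astype(float)
    # Strip the trailing "%"; blank entries become NaN instead of raising.
    growth = data["日增长率"].str.rstrip("%")
    data["日增长率"] = pd.to_numeric(growth, errors="coerce")
    # Constant reference level for the growth-rate plot (assumed value).
    data["benchmark"] = benchmark
    return data.sort_values("净值日期").reset_index(drop=True)


# Made-up rows in the assumed layout, newest first.
raw = pd.DataFrame(
    {
        "净值日期": ["2020/12/22", "2020/12/21", "2020/12/18"],
        "单位净值": ["1.2345", "1.2495", "1.2433"],
        "累计净值": ["2.3456", "2.3606", "2.3544"],
        "日增长率": ["-1.20%", "0.50%", ""],
    }
)
cleaned = clean_nav_frame(raw)
print(cleaned.dtypes)
```

Sorting by date at the end puts the rows in chronological order, which is what the time-series plots expect.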

Two sub‑plots are created: one showing unit and cumulative net values over time, and another showing daily growth rate together with a benchmark line.

fig = plt.figure(figsize=(16,10), dpi=240)
ax1 = fig.add_subplot(211)
ax1.plot(net_value_date, net_asset_value, label='基金净值')
# ... other plotting commands ...
plt.show()
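
For reference, a complete two-panel figure of the kind described above could be put together like this. The function name, the argument names, the English labels, and the benchmark level of 0.0 are assumptions; the article's own plotting commands are elided and may differ.

```python
from datetime import date

import matplotlib.pyplot as plt


def plot_fund_history(dates, unit_nav, cumulative_nav, daily_growth, benchmark=0.0):
    """Plot net values (top) and daily growth with a benchmark line (bottom)."""
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

    # Top panel: unit and cumulative net value over time.
    ax1.plot(dates, unit_nav, label="Unit net value")
    ax1.plot(dates, cumulative_nav, label="Cumulative net value")
    ax1.set_ylabel("Net value")
    ax1.legend()

    # Bottom panel: daily growth rate plus a horizontal benchmark line.
    ax2.plot(dates, daily_growth, label="Daily growth (%)")
    ax2.axhline(benchmark, color="red", linestyle="--", label="Benchmark")
    ax2.set_ylabel("Daily growth (%)")
    ax2.legend()

    fig.tight_layout()
    return fig


# Made-up values, only to exercise the function.
sample_dates = [date(2020, 12, 18), date(2020, 12, 21), date(2020, 12, 22)]
fig = plot_fund_history(
    sample_dates,
    unit_nav=[1.2433, 1.2495, 1.2345],
    cumulative_nav=[2.3544, 2.3606, 2.3456],
    daily_growth=[0.3, 0.5, -1.2],
)
```

Sharing the x-axis keeps the two panels aligned on the same dates, so a spike in daily growth can be read against the net-value curve directly above it.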

Further analysis adds a "year" column and, for each year, counts the days with positive and negative daily growth, calculates the proportion of positive days, and computes the average daily growth rate. In most years positive days outnumber negative days, and the fund generally trends upward.

data1['年'] = data1['净值日期'].dt.year
# ... groupby and aggregation ...
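
The elided groupby step might be sketched as below. The output column names, the 日增长率 column name, and the choice to divide positive days by the sum of positive and negative days (leaving out zero-growth days) are assumptions, since the article does not show its exact aggregation. The sample values are made up.

```python
import pandas as pd


def yearly_growth_summary(data, date_col="净值日期", growth_col="日增长率"):
    """Summarize daily growth per calendar year."""
    df = data.copy()
    df["年"] = df[date_col].dt.year
    # Boolean helper columns so that a plain sum counts the days.
    df["is_up"] = df[growth_col] > 0
    df["is_down"] = df[growth_col] < 0
    summary = df.groupby("年").agg(
        positive_days=("is_up", "sum"),
        negative_days=("is_down", "sum"),
        mean_growth=(growth_col, "mean"),
    )
    counted = summary["positive_days"] + summary["negative_days"]
    summary["positive_share"] = summary["positive_days"] / counted
    return summary


# Made-up daily growth rates spanning two years.
data1 = pd.DataFrame(
    {
        "净值日期": pd.to_datetime(
            ["2019-01-02", "2019-01-03", "2019-01-04", "2020-01-02", "2020-01-03"]
        ),
        "日增长率": [1.0, -0.5, 0.5, -1.0, 2.0],
    }
)
summary = yearly_growth_summary(data1)
print(summary)
```

A `positive_share` above 0.5 for a year means up days outnumbered down days in that year.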

The conclusion notes that while the fund shows a long‑term upward trend and relatively stable daily growth, investors should still consider other factors and risk management before making a purchase.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
