How to Scrape NBA Player Stats from Hupu and Auto‑Generate Excel Charts with Python
This guide builds a Python web scraper that extracts NBA player information from the Hupu website, cleans and visualizes the data, and automatically creates Excel files with embedded line charts. It covers URL navigation, data parsing with requests and BeautifulSoup, and chart generation with xlsxwriter.
It continues the previous article on the NBA scraper GUI.
The tutorial is divided into two parts:
Scrape player data from the Hupu NBA site.
Clean and visualize the scraped data.
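Python modules used: requests, bs4 (BeautifulSoup, with the lxml parser), xlsxwriter, and the standard-library os module. pandas is not needed for the code shown here.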
Scraping Part
Steps: inspect the page source of URL1 to find the team names and their URLs (URL2), open a team page to find the player URLs (URL3), and finally extract the player's basic info and game statistics from the player page.
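The link-collection step can be tried offline on a small HTML fragment before touching the live site. This is a minimal sketch, not the article's code: the markup is a hypothetical stand-in for the team list (the real Hupu page differs), `collect_links` is an illustrative helper name, and it uses Python's built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the team list; the real Hupu page differs.
SAMPLE_HTML = """
<ul>
  <li><span><a href="https://nba.hupu.com/players/lakers">Lakers</a></span></li>
  <li><span><a href="https://nba.hupu.com/players/rockets">Rockets</a></span></li>
</ul>
"""

def collect_links(html, selector):
    """Return {link text: href} for every <a> element matched by the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return {a.get_text(strip=True): a.get("href") for a in soup.select(selector)}

team_urls = collect_links(SAMPLE_HTML, "ul li span a")
print(team_urls["Lakers"])
```

Returning a dict keyed by name is one possible design: it avoids keeping two parallel lists and looking up a position with index().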
Target URLs:
URL1: http://nba.hupu.com/players/
URL2 (example, Lakers): https://nba.hupu.com/players/lakers
URL3 (example, LeBron James): https://nba.hupu.com/players/lebronjames-650.html

from bs4 import BeautifulSoup
import requests
import xlsxwriter
import os

The functions below use requests and bs4 to locate the <span> and <a> tags, retrieve the team and player URLs, and then fetch the player details.

def Teamlists(url):
    # Fetch the team index page (url is URL1) and collect every team link.
    TeamName = []
    TeamURL = []
    GET = requests.get(url)
    soup = BeautifulSoup(GET.content, 'lxml')
    labels = soup.select('html body div div div ul li span a')
    for label in labels:
        ballname = label.get_text()
        TeamName.append(ballname)
        TeamURL.append(label.get('href'))
        print(ballname)
    # Type one of the printed names; index() raises ValueError if it is not in the list.
    teamname = input("Enter the team name to look up: ")
    c = TeamName.index(teamname)
    URL2 = TeamURL[c]
    return URL2

def playerlists(URL2):
    # Fetch the team page (URL2) and collect every player link.
    PlayerName = []
    PlayerURL = []
    GET2 = requests.get(URL2)
    soup2 = BeautifulSoup(GET2.content, 'lxml')
    labels2 = soup2.select('html body div div table tbody tr td b a')
    for label2 in labels2:
        playername = label2.get_text()
        PlayerName.append(playername)
        PlayerURL.append(label2.get('href'))
        print(playername)
    # Type one of the printed names; index() raises ValueError if it is not in the list.
    name = input("Enter the player name: ")
    d = PlayerName.index(name)
    URL3 = PlayerURL[d]
    return URL3, name

def Competition(URL3):
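    # Fetch the player page (URL3), print the basic info, and return the cells
    # of the career regular-season averages table as a flat list.
    data = []
    GET3 = requests.get(URL3)
    soup3 = BeautifulSoup(GET3.content, 'lxml')
    labels3 = soup3.select('html body div div div div div div div div p')
    labels4 = soup3.select('div div table tbody tr td')
    for label3 in labels3:
        introduction = label3.get_text()
        print(introduction)
    for label4 in labels4:
        competition = label4.get_text()
        data.append(competition)
    # '职业生涯常规赛平均数据' is the page's "career regular-season averages" cell.
    # The table's header row starts 31 cells after it (an offset specific to the
    # page layout), so everything before that point is dropped.
    regular = '职业生涯常规赛平均数据'
    if regular in data:
        data = data[data.index(regular) + 31:]
    # '职业生涯季后赛平均数据' is the "career playoff averages" cell;
    # it and everything after it are dropped.
    playoff = '职业生涯季后赛平均数据'
    if playoff in data:
        data = data[:data.index(playoff)]
    return data

Visualization Part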
This part creates a folder, writes the data to Excel, and generates a line chart.

def file_add(path):
    # Create a Basketball folder under path if it does not exist yet, and return its location.
    creatpath = os.path.join(path, 'Basketball')
    os.makedirs(creatpath, exist_ok=True)
    return creatpath

def player_chart(name, data, creatpath):
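    # Write the flat cell list to a worksheet, 18 columns per row, and chart column R.
    # The scraped cells are text; strings_to_numbers makes xlsxwriter store
    # numeric-looking cells as numbers so that the chart has values to plot.
    EXCEL = xlsxwriter.Workbook(os.path.join(creatpath, name + 'chart.xlsx'),
                                {'strings_to_numbers': True})
    worksheet = EXCEL.add_worksheet(name)
    bold = EXCEL.add_format({'bold': 1})
    headings = data[:18]
    worksheet.write_row('A1', headings, bold)
    num = len(data) // 18  # number of complete rows, header included
    for i in range(1, num):
        # Table row i goes to worksheet row i+1 (worksheet row 1 holds the headings).
        worksheet.write_row('A' + str(i + 1), data[i * 18:(i + 1) * 18])
    sheet = "'" + name + "'"  # quote the sheet name in case it contains spaces or hyphens
    chart_col = EXCEL.add_chart({'type': 'line'})
    chart_col.add_series({
        'name': '=' + sheet + '!$R$1',
        'categories': '=' + sheet + '!$A$2:$A$' + str(num),
        # Same last row as the categories, so every data row (2..num) is plotted.
        'values': '=' + sheet + '!$R$2:$R$' + str(num),
        'line': {'color': 'red'},
    })
    chart_col.set_title({'name': name + ' career regular-season points per game'})
    chart_col.set_x_axis({'name': 'Season'})
    chart_col.set_y_axis({'name': 'Points per game'})
    chart_col.set_style(1)
    worksheet.insert_chart('A14', chart_col, {'x_offset': 25, 'y_offset': 3})
    EXCEL.close()

Example output for LeBron James is shown below: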
Opening the generated Excel file displays the line chart automatically, without further processing.
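The row layout behind the chart can be checked without Excel. The sketch below is an illustration under stated assumptions, not the article's code: `split_rows`, `column_letter`, and `series_ranges` are hypothetical helper names, the sample cells are made up, and the table is 3 columns wide instead of 18 to keep it short. It shows how a flat list of cells maps to worksheet rows and to the A1-style ranges a chart series needs.

```python
def split_rows(cells, width):
    """Split a flat list of cells into complete rows of `width` cells each."""
    complete = len(cells) // width
    return [cells[i * width:(i + 1) * width] for i in range(complete)]

def column_letter(index):
    """Convert a 1-based column index to an Excel column letter (1 -> A, 18 -> R)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def series_ranges(sheet, n_rows, value_col):
    """Build series ranges for data in worksheet rows 2..n_rows (row 1 is the header)."""
    col = column_letter(value_col)
    quoted = "'" + sheet + "'"
    return {
        "name": f"={quoted}!${col}$1",
        "categories": f"={quoted}!$A$2:$A${n_rows}",
        "values": f"={quoted}!${col}$2:${col}${n_rows}",
    }

# Made-up sample: a header row plus two data rows, 3 columns wide.
cells = ["Season", "Team", "PTS", "S1", "T1", "10.0", "S2", "T1", "12.5"]
rows = split_rows(cells, 3)
ranges = series_ranges("demo", len(rows), 3)
print(rows[1])
print(ranges["values"])  # ='demo'!$C$2:$C$3
```

Because the header occupies worksheet row 1, the last data row equals the total number of rows, and the categories and values ranges should end on that same row.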
Combining the scraping and visualization steps provides real‑time regular‑season and playoff statistics with auto‑generated charts, ready to be bound to a GUI button.
The scraping side covers three tasks:
Obtain the standard names of all NBA teams.
Retrieve the standard player names for a selected team.
Fetch the basic information and the regular‑season and playoff data for a selected player.
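One way to chain these steps into a single call, for example as the body of a button callback, is sketched below. `run_pipeline` and the `steps` dictionary are hypothetical names introduced for illustration; the stub functions only mimic the call signatures of Teamlists, playerlists, Competition, file_add, and player_chart so that the sketch runs without network access or console input.

```python
def run_pipeline(steps, url1, base_dir):
    """Run the scrape-then-chart sequence; `steps` maps step names to callables."""
    url2 = steps["teams"](url1)           # team index page -> team page URL
    url3, name = steps["players"](url2)   # team page -> player page URL and name
    data = steps["stats"](url3)           # player page -> flat list of table cells
    folder = steps["folder"](base_dir)    # make sure the output folder exists
    steps["chart"](name, data, folder)    # write the workbook and chart
    return name, folder

# Stubs standing in for the article's functions (assumed signatures only).
calls = []
stub_steps = {
    "teams": lambda url1: url1 + "lakers",
    "players": lambda url2: (url2 + "/player.html", "demo-player"),
    "stats": lambda url3: ["Season", "PTS"],
    "folder": lambda base: base + "/Basketball",
    "chart": lambda name, data, folder: calls.append((name, len(data), folder)),
}
result = run_pipeline(stub_steps, "http://nba.hupu.com/players/", "/tmp")
print(result)

# With the article's functions, the mapping would instead be:
# {"teams": Teamlists, "players": playerlists, "stats": Competition,
#  "folder": file_add, "chart": player_chart}
```

Passing the steps in as callables keeps the sequence testable. The real functions read the team and player names with input(), so a GUI version would likely need to take those names from widgets instead.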