
How to Scrape NBA Player Stats from Hupu and Auto‑Generate Excel Charts with Python

This guide walks through building a Python web scraper that extracts NBA player information from the Hupu website, cleans the data, and automatically creates Excel files with embedded line charts. It covers URL navigation, data parsing with requests and BeautifulSoup, and chart generation with xlsxwriter.

Python Crawling & Data Mining

Jun 25, 2021


Continuing from the previous article on the NBA scraper GUI, this article explains how to crawl player data from the Hupu NBA website, clean and visualize it, and automatically generate Excel files with line charts.

The tutorial is divided into two parts:

Scrape player data from the Hupu NBA site.

Clean and visualize the scraped data.

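Python modules used: requests, bs4 (BeautifulSoup), xlsxwriter, and the standard-library os module (pandas is not needed for the code shown here). The code also passes 'lxml' to BeautifulSoup, which requires the lxml package to be installed.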

Scraping Part

Steps: inspect the page source of URL1 to find the team names and their URLs (URL2), open a team's URL2 to find its players' URLs (URL3), and finally extract the player's basic information and game statistics from URL3.

Target URLs:

URL1: http://nba.hupu.com/players/

URL2 (example: Lakers): https://nba.hupu.com/players/lakers

URL3 (example: LeBron James): https://nba.hupu.com/players/lebronjames-650.html
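
The href values collected at each level are passed straight to the next request. If any of them turn out to be relative (an assumption; the article does not say which form Hupu uses), they can be resolved against the page they came from with the standard library's urljoin. A minimal sketch, where resolve_link is a hypothetical helper and not part of the original script:

```python
from urllib.parse import urljoin

# Hypothetical helper (not in the original script): turn an href scraped
# from a page into an absolute URL that requests.get() can fetch.
def resolve_link(page_url, href):
    # urljoin leaves absolute hrefs untouched and resolves relative ones
    # against the URL of the page that contained them.
    return urljoin(page_url, href)

URL1 = "http://nba.hupu.com/players/"
print(resolve_link(URL1, "lakers"))
print(resolve_link(URL1, "https://nba.hupu.com/players/lebronjames-650.html"))
```

Because absolute links pass through unchanged, the helper is safe to apply to every href regardless of which form the site returns.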

from bs4 import BeautifulSoup
import requests
import xlsxwriter
import os

The functions below use requests and bs4 to locate the relevant <span> and <a> tags, collect the team and player URLs, and then fetch the selected player's details.

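def Teamlists(url):
    # List every team on the page at `url` (URL1) and return the URL (URL2)
    # of the team the user picks.
    TeamName=[]
    TeamURL=[]
    GET=requests.get(url)
    soup=BeautifulSoup(GET.content,'lxml')
    # Each team link is an <a> inside a <span> inside a list item.
    lables=soup.select('html body div div div ul li span a')
    for lable in lables:
        ballname=lable.get_text()
        TeamName.append(ballname)
        TeamURL.append(lable.get('href'))
        print(ballname)
    # The name must be typed exactly as printed above; otherwise index() raises ValueError.
    teamname=input("Enter the team name to look up: ")
    c=TeamName.index(teamname)
    URL2=TeamURL[c]
    return URL2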

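def playerlists(URL2):
    # List every player on the team page (URL2) and return the URL (URL3)
    # and name of the player the user picks.
    PlayerName=[]
    PlayerURL=[]
    GET2=requests.get(URL2)
    soup2=BeautifulSoup(GET2.content,'lxml')
    # Each player link is an <a> inside a <b> inside a table cell.
    lables2=soup2.select('html body div div table tbody tr td b a')
    for lable2 in lables2:
        playername=lable2.get_text()
        PlayerName.append(playername)
        PlayerURL.append(lable2.get('href'))
        print(playername)
    # The name must be typed exactly as printed above; otherwise index() raises ValueError.
    name=input("Enter the player name: ")
    d=PlayerName.index(name)
    URL3=PlayerURL[d]
    return URL3,name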
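
Both functions look a typed name up with list.index, which raises ValueError when the input does not match a scraped name exactly. A minimal sketch of a more forgiving lookup that pairs names with URLs in a dict; build_lookup and pick_url are hypothetical names, and the sample data (including the placeholder second entry) only stands in for scraped results:

```python
def build_lookup(names, urls):
    """Pair each scraped name with its URL, stripping stray whitespace."""
    return {name.strip(): url for name, url in zip(names, urls)}

def pick_url(lookup, wanted):
    """Return the URL for `wanted`, or None when the name is unknown."""
    return lookup.get(wanted.strip())

# Sample data standing in for what the scraper collects; "Team B" and its
# URL are placeholders, not real entries.
teams = build_lookup(
    ["Lakers ", "Team B"],
    ["https://nba.hupu.com/players/lakers", "https://example.com/team-b"],
)
print(pick_url(teams, "Lakers"))  # the Lakers URL
print(pick_url(teams, "Nets"))    # None, so the caller can ask again
```

Returning None instead of raising lets the calling code re-prompt the user rather than crash on a typo.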

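def Competition(URL3):
    # Print the player's profile text and return the regular-season table
    # as a flat list of cell strings.
    data=[]
    GET3=requests.get(URL3)
    soup3=BeautifulSoup(GET3.content,'lxml')
    lables3=soup3.select('html body div div div div div div div div p')
    lables4=soup3.select('div div table tbody tr td')
    for lable3 in lables3:
        introduction=lable3.get_text()
        print(introduction)
    for lable4 in lables4:
        competition=lable4.get_text()
        data.append(competition)
    # '职业生涯常规赛平均数据' is the page's "career regular-season averages" title cell.
    # Drop everything before the cell 31 positions after it, which
    # player_chart later treats as the first header cell.
    a=None
    for i in range(len(data)):
        if data[i]=='职业生涯常规赛平均数据':
            a=data[i+31]
            a=data.index(a)
    if a is not None:
        del data[:a]
    # '职业生涯季后赛平均数据' is the "career playoff averages" title cell.
    # Drop it and everything after it so only regular-season cells remain.
    b=None
    for x in range(len(data)):
        if data[x]=='职业生涯季后赛平均数据':
            b=data[x]
            b=data.index(b)
    if b is not None:
        del data[b:]
    return data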
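
Competition returns one flat list of cell strings, and the charting code later treats it as rows of 18 cells with the first row as the header. A minimal sketch of that reshaping on made-up data; split_rows is a hypothetical helper, and the 3-column width and placeholder values are chosen only to keep the example short:

```python
def split_rows(cells, width):
    """Split a flat list of cells into a header row and data rows.

    Each row holds `width` cells. Trailing cells that do not fill a
    complete row are dropped.
    """
    full = len(cells) - len(cells) % width
    rows = [cells[i:i + width] for i in range(0, full, width)]
    if not rows:
        return [], []
    return rows[0], rows[1:]

# Placeholder cells, not real statistics.
flat = ["Season", "Team", "PTS",
        "season-1", "team-a", "20.9",
        "season-2", "team-a", "27.2",
        "extra"]
header, rows = split_rows(flat, 3)
print(header)
print(rows)
```

With the real data the width would be 18, matching the slice sizes used when the rows are written to Excel.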

Visualization Part

This part creates an output folder, writes the data to an Excel workbook, and generates a line chart.

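def file_add(path):
    # Create a "Basketball" folder under `path` if needed and return its path.
    creatpath=os.path.join(path,'Basketball')
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(creatpath,exist_ok=True)
    return creatpath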

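def player_chart(name,data,creatpath):
    # Scraped cells are text; strings_to_numbers stores numeric-looking
    # ones as numbers so that the chart has values to plot.
    EXCEL=xlsxwriter.Workbook(os.path.join(creatpath,name+'chart.xlsx'),{'strings_to_numbers':True})
    worksheet=EXCEL.add_worksheet(name)
    bold=EXCEL.add_format({'bold':1})
    # The first 18 cells are the table header; each following group of 18 is one row.
    headings=data[:18]
    worksheet.write_row('A1',headings,bold)
    num=(len(data))//18  # number of complete rows, header included
    for i in range(1,num):
        worksheet.write_row('A'+str(i+1),data[i*18:(i+1)*18])
    chart_col = EXCEL.add_chart({'type':'line'})
    # Data rows occupy sheet rows 2..num; column R (index 17) is the 18th column.
    # List-style ranges are [sheet, first_row, first_col, last_row, last_col],
    # zero-based, and avoid quoting problems with sheet names.
    chart_col.add_series({
        'name':[name,0,17],
        'categories':[name,1,0,num-1,0],
        'values':[name,1,17,num-1,17],
        'line':{'color':'red'},
    })
    chart_col.set_title({'name':name+' career regular-season average points'})
    chart_col.set_x_axis({'name':'Season'})
    chart_col.set_y_axis({'name':'Average points'})
    chart_col.set_style(1)
    worksheet.insert_chart('A14',chart_col,{'x_offset':25,'y_offset':3})
    EXCEL.close()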
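
Cells extracted with get_text() are strings, and Excel charts plot only numeric cells. One way to make sure the chart has numbers to plot is to convert numeric-looking cells explicitly before writing each row. A minimal sketch; to_number is a hypothetical helper, not part of the original script, and the row holds placeholder values:

```python
def to_number(cell):
    """Return `cell` as a float when it parses as a number, else unchanged."""
    try:
        return float(cell)
    except ValueError:
        return cell

# Placeholder row: two text cells followed by two numeric-looking cells.
row = ["season-1", "team-a", "79", "20.9"]
converted = [to_number(cell) for cell in row]
print(converted)
```

Each converted row can then be passed to worksheet.write_row in place of the raw slice, which keeps the conversion visible in the code.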

For example, running the script for LeBron James produces an Excel file that displays the line chart as soon as it is opened, without further processing.

Combining the scraping and visualization steps provides up‑to‑date regular‑season and playoff statistics with auto‑generated charts, and the whole flow is ready to be bound to a GUI button. It covers three tasks:

Obtain the standard names of all NBA teams.

Retrieve the standard player names for a selected team.

Fetch basic information and regular‑season and playoff data for a selected player.
