
Scraping and Documenting King of Glory Item Data with Python

This tutorial shows how to fetch, clean, and organize King of Glory item data from a public JSON endpoint using Python's requests and pandas, download the item images concurrently, and generate Markdown and Excel reference documents that can be adapted to other formats.

Python Programming Learning Circle

Apr 13, 2022


This article explains how to collect and process item data from the game King of Glory (王者荣耀) using Python.

Fetching Item Data

The item list is served as JSON at https://pvp.qq.com/web201605/js/item.json. It can be fetched with the requests library (sending a browser-like User-Agent header) and then loaded into a pandas DataFrame for sorting and cleaning.

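import requests
import pandas as pd

# A browser-like User-Agent; some servers reject the default requests UA
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}

target = 'https://pvp.qq.com/web201605/js/item.json'
res = requests.get(target, headers=headers, timeout=10)
res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
item_list = res.json()

item_df = pd.DataFrame(item_list)
# Group items by type, then order by price and ID within each type
item_df.sort_values(["item_type", "price", "item_id"], inplace=True)
item_df.fillna("", inplace=True)  # replace missing values (e.g. absent descriptions) with ""
# Strip the <p> and </p> wrappers from both description fields
item_df.des1 = item_df.des1.str.replace("</?p>", "", regex=True)
item_df.des2 = item_df.des2.str.replace("</?p>", "", regex=True)
item_df  # in Jupyter, this displays the table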
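
The pattern </?p> matches both the opening <p> and the closing </p> tag, because /? makes the slash optional. pandas applies a regex replacement element by element much like Python's re.sub, so the effect can be checked on a single string. The helper name strip_p_tags and the sample text below are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as in the DataFrame cleaning step: "/?" makes the slash optional,
# so both <p> and </p> are removed while other tags such as <br> are left alone.
P_TAG = re.compile(r"</?p>")

def strip_p_tags(text: str) -> str:
    """Remove <p> and </p> wrappers from one description string (hypothetical helper)."""
    return P_TAG.sub("", text)

# Synthetic sample, not taken from the real feed
print(strip_p_tags("<p>line one<br>line two</p>"))  # line one<br>line two
```

Note that <br> survives this step; it is kept for the Markdown output and only converted to newlines later for Excel.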

Item images follow a predictable URL pattern where the item_id corresponds to the image filename, e.g., https://game.gtimg.cn/images/yxzj/img201606/itemimg/1111.jpg.
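
Because the filename is just the item ID, each image URL can be built with a format string. The base path below is taken from the example URL above; the helper name item_image_url is hypothetical.

```python
# Base path from the example URL; item images are assumed to be named <item_id>.jpg
IMG_BASE = "https://game.gtimg.cn/images/yxzj/img201606/itemimg"

def item_image_url(item_id: int) -> str:
    """Build the image URL for one item (hypothetical helper)."""
    return f"{IMG_BASE}/{item_id}.jpg"

print(item_image_url(1111))
# https://game.gtimg.cn/images/yxzj/img201606/itemimg/1111.jpg
```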

Multithreaded Image Download

Using ThreadPoolExecutor, the script downloads all item images in parallel, storing them under an imgs directory.

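import os
from concurrent.futures import ThreadPoolExecutor

import requests

def download_img(item_id):
    path = f"imgs/{item_id}.jpg"
    if os.path.exists(path):  # skip images fetched on a previous run
        return
    imgurl = f"https://game.gtimg.cn/images/yxzj/img201606/itemimg/{item_id}.jpg"
    res = requests.get(imgurl, timeout=10)
    res.raise_for_status()  # don't save an error page as a .jpg
    with open(path, "wb") as f:
        f.write(res.content)

os.makedirs("imgs", exist_ok=True)
with ThreadPoolExecutor(max_workers=8) as executor:
    # Consuming the results with list() re-raises exceptions from the workers;
    # with an unconsumed executor.map(...) such errors would pass silently.
    list(executor.map(download_img, item_df.item_id))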

In the original run, all images finished downloading within a few seconds.
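
executor.map only surfaces a worker's exception when that result is consumed, and then stops at the first one, so failed downloads are easy to miss. One alternative, sketched below, submits each task individually and collects every failure with concurrent.futures.as_completed. The fetch argument stands in for download_img; run_all and the dummy fake_fetch are hypothetical, so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(fetch, ids, max_workers=8):
    """Run fetch(id) concurrently; return (succeeded_ids, {failed_id: exception})."""
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the ID it was submitted with
        futures = {executor.submit(fetch, i): i for i in ids}
        for fut in as_completed(futures):
            item_id = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                succeeded.append(item_id)
            except Exception as exc:
                failed[item_id] = exc
    return sorted(succeeded), failed

# Hypothetical stand-in for download_img that fails for one ID
def fake_fetch(item_id):
    if item_id == 1113:
        raise IOError("HTTP 404")
    return item_id

ok, bad = run_all(fake_fetch, [1111, 1112, 1113])
print(ok)         # [1111, 1112]
print(list(bad))  # [1113]
```

The IDs left in the failure dict can be passed back to run_all for a retry.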

Generating Markdown Documentation

The cleaned DataFrame is transformed so that each item’s image is embedded using Markdown syntax, and the item type codes are mapped to readable Chinese labels.

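# Item type codes -> labels: Attack, Magic, Defense, Movement, Jungling, Roaming
item_type_dict = {1: '攻击', 2: '法术', 3: '防御', 4: '移动', 5: '打野', 7: '游走'}
# Keep the raw IDs; the Excel step needs them after item_id is overwritten below
item_ids = item_df.item_id.values
# Replace each ID with a Markdown image reference to the downloaded file
item_df.item_id = item_df.item_id.apply(lambda item_id: f"![{item_id}](imgs/{item_id}.jpg)")
item_df.item_type = item_df.item_type.map(item_type_dict)
# Image, Item name, Type, Sell price, Total price, Basic description, Extended description
# (assumes the JSON yields exactly these seven columns, in this order)
item_df.columns = ["图片", "装备名称", "类型", "售价", "总价", "基础描述", "扩展描述"]
item_df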
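
Series.map with a dict returns NaN for any code missing from the dict (the dict above has no entry for 6), and groupby drops NaN keys by default, so such items would silently disappear from the Markdown output. If the feed ever contains an unlisted code, passing a function with a fallback keeps those items visible. The helper type_label and its placeholder format are hypothetical choices.

```python
# Same mapping as above
item_type_dict = {1: '攻击', 2: '法术', 3: '防御', 4: '移动', 5: '打野', 7: '游走'}

def type_label(code):
    """Return the readable label, or a placeholder for unknown codes (hypothetical format)."""
    return item_type_dict.get(code, f"类型{code}")

# With pandas, pass the function instead of the dict so unknown codes are not NaN:
#     item_df.item_type = item_df.item_type.map(type_label)
print([type_label(c) for c in [1, 6, 7]])  # ['攻击', '类型6', '游走']
```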

A single Markdown file is then written with one section per item type: a heading followed by a table produced by to_markdown, which requires the tabulate package.

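# utf-8 so the Chinese text is written correctly on every platform
with open("王者装备说明.md", "w", encoding="utf-8") as f:
    # sort=False keeps groups in order of first appearance (sorted by type code above)
    for item_type, item_split in item_df.groupby("类型", sort=False):
        f.write(f"# {item_type}\n")
        # The type column is redundant inside a per-type section
        item_split = item_split.drop(columns="类型")
        f.write(item_split.to_markdown(index=False))
        f.write("\n\n")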

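to_markdown delegates to the third-party tabulate package. If adding that dependency is undesirable, a pipe table can be assembled by hand. The sketch below is a minimal alternative, not what the original uses; it escapes | so description text cannot break the column layout. The function name rows_to_markdown and the sample rows are hypothetical.

```python
def rows_to_markdown(headers, rows):
    """Render a GitHub-style pipe table from a header list and row lists."""
    def cell(value):
        # A literal "|" would start a new column, and a newline would end the row
        return str(value).replace("|", "\\|").replace("\n", "<br>")

    lines = ["| " + " | ".join(cell(h) for h in headers) + " |",
             "|" + "|".join("---" for _ in headers) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)

# With a DataFrame: rows_to_markdown(item_split.columns, item_split.values.tolist())
table = rows_to_markdown(["装备名称", "售价"], [["A|B", 100], ["C", 200]])
print(table)
```
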
Generating Excel Documentation

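For Excel, the Markdown image references in the 图片 column are cleared, since the images are inserted as drawings instead, and the <br> tags in the descriptions are converted into real line breaks. The DataFrame is then exported to an Excel workbook through openpyxl, which also embeds each item's image in its row.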

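item_df["图片"] = ""  # images are added as drawings below, not as cell text
# Turn HTML line breaks into newlines that Excel can display
item_df["基础描述"] = item_df["基础描述"].str.replace("<br>", "\n", regex=False)
item_df["扩展描述"] = item_df["扩展描述"].str.replace("<br>", "\n", regex=False)
item_df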
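
The replacement above only matches the exact string <br>. If some descriptions used a self-closing form such as <br/> or <br /> (an assumption, not verified against the feed), a regular expression would catch all variants. The helper br_to_newline is hypothetical and works on a single string; the same pattern could be passed to str.replace with regex=True.

```python
import re

# Matches <br>, <br/>, <br /> and upper-case variants
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def br_to_newline(text: str) -> str:
    """Replace every HTML line-break tag with a real newline (hypothetical helper)."""
    return BR_TAG.sub("\n", text)

print(repr(br_to_newline("a<br>b<br/>c<BR />d")))  # 'a\nb\nc\nd'
```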

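from openpyxl.drawing.image import Image  # requires Pillow
from openpyxl.styles import Alignment

with pd.ExcelWriter("王者装备说明.xlsx", engine="openpyxl") as writer:
    item_df.to_excel(writer, sheet_name="装备说明", index=False)
    worksheet = writer.sheets["装备说明"]
    worksheet.column_dimensions["A"].width = 11
    # Column A from row 2 down (row 1 is the header); rows follow the item_ids order
    for item_id, (cell,) in zip(item_ids, worksheet.iter_rows(min_row=2, min_col=1, max_col=1)):
        worksheet.row_dimensions[cell.row].height = 67
        worksheet.add_image(Image(f"imgs/{item_id}.jpg"), f"A{cell.row}")
    # F and G hold the two description columns; wrap text so the newlines show
    worksheet.column_dimensions["F"].width = 15
    worksheet.column_dimensions["G"].width = 35
    for row in worksheet.iter_rows(min_row=2, min_col=6, max_col=7):
        for cell in row:
            cell.alignment = Alignment(wrap_text=True, vertical="center")
    # No writer.save(): the with-block saves on exit (save() was removed in pandas 2.0)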
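
The column widths above are set through hard-coded letters ("F" and "G" for the two description columns), which would silently point at the wrong column if the column order changed. One option is to look the letter up from the header name. openpyxl provides openpyxl.utils.get_column_letter for the conversion; the stdlib-only version below shows the same bijective base-26 scheme, and column_letter_for is a hypothetical helper.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def column_letter_for(headers, name):
    """Excel letter of the column whose header is `name` (headers in sheet order)."""
    return column_letter(list(headers).index(name) + 1)

# Column order used in this article
headers = ["图片", "装备名称", "类型", "售价", "总价", "基础描述", "扩展描述"]
print(column_letter_for(headers, "基础描述"), column_letter_for(headers, "扩展描述"))  # F G
# In the Excel step, e.g.:
#     worksheet.column_dimensions[column_letter_for(item_df.columns, "扩展描述")].width = 35
```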

The resulting Excel file contains item details with embedded images, and the workflow can be adapted to produce HTML, Word, or other formats.
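
As one example of another format, an HTML page can show the images directly with <img> tags. The sketch below uses only the standard library; the function name items_to_html, the (item_id, name, description) row shape, and the sample row are hypothetical. html.escape keeps item text from being interpreted as markup, and newlines in the description are turned back into <br> after escaping.

```python
import html

def items_to_html(rows):
    """Render (item_id, name, description) tuples as an HTML table with images."""
    out = ["<table>", "<tr><th>图片</th><th>装备名称</th><th>描述</th></tr>"]
    for item_id, name, desc in rows:
        # Escape text first, then restore line breaks as <br> tags
        safe_desc = html.escape(desc).replace("\n", "<br>")
        # item_id is numeric here, so it is safe inside the src/alt attributes
        out.append(
            f'<tr><td><img src="imgs/{item_id}.jpg" alt="{item_id}"></td>'
            f"<td>{html.escape(name)}</td><td>{safe_desc}</td></tr>"
        )
    out.append("</table>")
    return "\n".join(out)

page = items_to_html([(1111, "A & B", "line 1\nline 2")])
print(page)
```

With the DataFrame from the Excel step, the rows could come from zip(item_ids, item_df["装备名称"], item_df["基础描述"]).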

