Scraping and Documenting King of Glory Item Data with Python
This tutorial shows how to fetch King of Glory (王者荣耀) item data from a public JSON endpoint with Python's requests library, clean and sort it with pandas, download the item images concurrently, and generate Markdown and Excel reference documents from the result.
Fetching Item Data
The item list can be retrieved from the endpoint https://pvp.qq.com/web201605/js/item.json using the requests library with appropriate headers, then loaded into a pandas DataFrame for sorting and cleaning.
<code>import requests
import pandas as pd

# Send a browser-like User-Agent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}
target = 'https://pvp.qq.com/web201605/js/item.json'
item_list = requests.get(target, headers=headers).json()

# Load the records, sort them by type, price, and id, and blank out missing values.
item_df = pd.DataFrame(item_list)
item_df.sort_values(["item_type", "price", "item_id"], inplace=True)
item_df.fillna("", inplace=True)

# Strip the opening and closing paragraph tags from both description columns.
item_df.des1 = item_df.des1.str.replace("</?p>", "", regex=True)
item_df.des2 = item_df.des2.str.replace("</?p>", "", regex=True)
item_df</code>
Item images follow a predictable URL pattern in which the item_id is the image filename, e.g., https://game.gtimg.cn/images/yxzj/img201606/itemimg/1111.jpg.
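The URL pattern above can be captured in a small helper. This is a minimal sketch: the function name `item_image_url` and the constant `IMAGE_BASE` are illustrative and do not appear in the original script; only the URL pattern itself comes from the example above.

```python
# Base path taken from the example URL above; the constant name is illustrative.
IMAGE_BASE = "https://game.gtimg.cn/images/yxzj/img201606/itemimg"


def item_image_url(item_id):
    """Return the image URL for an item, assuming the filename is '<item_id>.jpg'."""
    return f"{IMAGE_BASE}/{item_id}.jpg"


print(item_image_url(1111))
```

Keeping the pattern in one place means the download step and any later document generation build the address the same way.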
Multithreaded Image Download
Using ThreadPoolExecutor, the script downloads all item images in parallel and stores them in an imgs directory.
<code>import os
from concurrent.futures import ThreadPoolExecutor

def download_img(item_id):
    # Skip images that are already on disk.
    if os.path.exists(f"imgs/{item_id}.jpg"):
        return
    imgurl = f"https://game.gtimg.cn/images/yxzj/img201606/itemimg/{item_id}.jpg"
    res = requests.get(imgurl)
    with open(f"imgs/{item_id}.jpg", "wb") as f:
        f.write(res.content)

os.makedirs("imgs", exist_ok=True)
with ThreadPoolExecutor(max_workers=8) as executor:
    # Consume the iterator so that any exception raised in a worker surfaces here.
    list(executor.map(download_img, item_df.item_id))
</code>
The images are downloaded within seconds.
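The same thread-pool pattern can also report which downloads failed. The following is a minimal sketch rather than part of the original script: the names `download_all`, `fetch`, and `fake_fetch` are hypothetical, and the fetch step is passed in as a parameter so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(item_ids, fetch, max_workers=8):
    """Run fetch(item_id) for every id in parallel and collect any failures.

    Returns a dict mapping item_id -> exception for the ids that failed.
    """
    def safe_fetch(item_id):
        try:
            fetch(item_id)
            return item_id, None
        except Exception as exc:  # report the error instead of losing it
            return item_id, exc

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(safe_fetch, item_ids))
    return {item_id: exc for item_id, exc in results if exc is not None}


# Stand-in for a real downloader, used here so the sketch runs offline.
def fake_fetch(item_id):
    if item_id == 1113:
        raise IOError("simulated network failure")


failures = download_all([1111, 1112, 1113], fake_fetch)
print(failures)
```

In the real script, `fetch` would be a function such as `download_img`, ideally calling `res.raise_for_status()` before writing the file so that an HTTP error page is never saved as an image.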
Generating Markdown Documentation
The cleaned DataFrame is transformed so that each item’s image is embedded using Markdown syntax, and the item type codes are mapped to readable Chinese labels.
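<code># Type codes mapped to their in-game Chinese labels:
# attack, magic, defense, movement, jungling, roaming.
item_type_dict = {1: '攻击', 2: '法术', 3: '防御', 4: '移动', 5: '打野', 7: '游走'}
# Keep the raw ids; the Excel export needs them to locate the image files.
item_ids = item_df.item_id.values
# Assumed form: a Markdown image reference to the locally downloaded file.
item_df.item_id = item_df.item_id.apply(lambda item_id: f"![](imgs/{item_id}.jpg)")
item_df.item_type = item_df.item_type.map(item_type_dict)
# Columns: image, item name, type, sale price, total price, basic description, extended description.
item_df.columns = ["图片", "装备名称", "类型", "售价", "总价", "基础描述", "扩展描述"]
item_df
</code>
A single Markdown file is then written, with one section per item type and each section's table rendered by to_markdown.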
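<code># The output file name means "King of Glory item guide".
with open("王者装备说明.md", "w", encoding="utf-8") as f:
    for item_type, item_split in item_df.groupby("类型", sort=False):
        f.write(f"# {item_type}\n")
        # Drop the type column, since the heading already names the type.
        item_split = item_split.drop(columns="类型")
        f.write(item_split.to_markdown(index=False))
        f.write("\n\n")
</code>
Generating Excel Documentation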
After further cleaning of the description fields, the DataFrame is exported to an Excel workbook, and openpyxl is used to embed the images.
<code># Clear the Markdown image references; the pictures are embedded directly below.
item_df.图片 = ""
# Convert HTML line breaks into newlines for the spreadsheet cells.
item_df.基础描述 = item_df.基础描述.str.replace("<br>", "\n", regex=False)
item_df.扩展描述 = item_df.扩展描述.str.replace("<br>", "\n", regex=False)
item_df
</code>
<code>from openpyxl.drawing.image import Image
from openpyxl.styles import Alignment

with pd.ExcelWriter("王者装备说明.xlsx", engine='openpyxl') as writer:
    item_df.to_excel(writer, sheet_name='装备说明', index=False)
    worksheet = writer.sheets['装备说明']
    worksheet.column_dimensions["A"].width = 11
    # Walk column A from row 2 down, pairing each data row with its item id.
    for item_id, (cell,) in zip(item_ids, worksheet.iter_rows(min_row=2, min_col=1, max_col=1)):
        worksheet.row_dimensions[cell.row].height = 67
        worksheet.add_image(Image(f"imgs/{item_id}.jpg"), f'A{cell.row}')
    worksheet.column_dimensions["F"].width = 15
    worksheet.column_dimensions["G"].width = 35
# The workbook is saved when the with block exits.
</code>
The resulting Excel file contains the item details with embedded images, and the workflow can be adapted to produce HTML, Word, or other formats.
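As one example of such an adaptation, the sketch below renders a few records as an HTML table using only the standard library. The sample records, the field names, and the function name `items_to_html` are assumptions made for illustration; with the DataFrame from this article, `item_df.to_html()` would likely be the more direct route.

```python
from html import escape


def items_to_html(items, columns):
    """Render a list of dicts as a simple HTML table, escaping all cell text."""
    header = "".join(f"<th>{escape(col)}</th>" for col in columns)
    rows = []
    for item in items:
        cells = "".join(f"<td>{escape(str(item.get(col, '')))}</td>" for col in columns)
        rows.append(f"<tr>{cells}</tr>")
    return f"<table>\n<tr>{header}</tr>\n" + "\n".join(rows) + "\n</table>"


# Made-up sample records; real data would come from the item DataFrame.
sample_items = [
    {"name": "Sample Sword", "price": 250},
    {"name": "Sample <Shield>", "price": 300},
]
html_table = items_to_html(sample_items, ["name", "price"])
print(html_table)
```

Escaping each cell matters here because the raw descriptions contain markup characters that would otherwise be interpreted by the browser.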
Disclaimer: This content is compiled from online sources; copyright belongs to the original author.