How to Scrape and Batch‑Download High‑Resolution 王者荣耀 Images with Python
This tutorial explains how to use Python's requests and lxml libraries to crawl a public image site, parse pagination URLs, retrieve high‑resolution 王者荣耀 pictures, and save them locally in batches, while highlighting key implementation steps and best‑practice cautions.
Project Background
王者荣耀 is a popular game, but its official site restricts direct download of high‑resolution character images due to copyright. This guide demonstrates how to scrape such images from the third‑party site www.netbian.com.
Project Goal
Automatically download the collected images in bulk.
Libraries and Target Site
Target URL: http://www.netbian.com/s/wangzherongyao/index.htm
Key Python libraries: requests and lxml.
Project Analysis
Pagination URLs follow the pattern http://www.netbian.com/s/wangzherongyao/index_{}.htm, where the page number replaces {}. By iterating over page numbers, multiple pages can be fetched.
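The URL pattern above can be sketched as a small helper. This is only an illustration: the function name build_page_urls is hypothetical and not part of the original script. It assumes, as the article's own code does, that page 1 lives at index.htm rather than index_1.htm.

```python
FIRST_URL = "http://www.netbian.com/s/wangzherongyao/index.htm"
PAGE_URL = "http://www.netbian.com/s/wangzherongyao/index_{}.htm"


def build_page_urls(start_page, end_page):
    """Return the listing-page URLs for start_page..end_page (inclusive)."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    urls = []
    for page in range(start_page, end_page + 1):
        # Page 1 has no numeric suffix; later pages use index_{page}.htm
        urls.append(FIRST_URL if page == 1 else PAGE_URL.format(page))
    return urls


if __name__ == "__main__":
    for url in build_page_urls(1, 3):
        print(url)
```

Building the list up front keeps the special case for page 1 in one place, so the crawl loop itself only has to iterate.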
Implementation
1. Class Definition
import requests
from lxml import etree
import time


class ImageSpider(object):
    def __init__(self):
        # The first listing page has no page number in its URL
        self.firsr_url = "http://www.netbian.com/s/wangzherongyao/index.htm"
        # Later pages follow the index_{page}.htm pattern
        self.url = "http://www.netbian.com/s/wangzherongyao/index_{}.htm"
        # Send a browser-like User-Agent with every request
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
        }

    def main(self):
        pass


if __name__ == '__main__':
    spider = ImageSpider()
    spider.main()

2. Sending Requests
    def get_page(self, url):
        '''Send request and get response'''
        res = requests.get(url=url, headers=self.headers)
        html = res.content.decode("gbk")  # website encoding
        return html

3. Parsing Page Data
    def parse_page(self, html):
        '''Parse data'''
        parse_html = etree.HTML(html)
        image_src_list = parse_html.xpath('//div[@class="list"]/ul/li/a//@href')
        for image_src in image_src_list:
            fa = "http://www.netbian.com" + image_src
            # process each thumbnail page

4. Accessing Detail Pages
Open the browser's developer tools (F12), locate the link on each thumbnail that leads to the second‑level (detail) page, and request that URL with get_page.
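The links found on the listing page are site‑relative paths, so they have to be turned into absolute URLs before they can be requested. The article's code does this by prepending "http://www.netbian.com" to each href. A slightly more defensive sketch uses urljoin from the standard library, which also copes with hrefs that are already absolute. The helper name to_absolute and the sample paths below are hypothetical, made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.netbian.com"


def to_absolute(hrefs, base_url=BASE_URL):
    """Resolve each href against base_url, skipping empty values and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Made-up example paths, not real pages on the site
    sample = ["/desk/0001.htm", "/desk/0001.htm", "http://www.netbian.com/desk/0002.htm"]
    print(to_absolute(sample))
```

Dropping duplicates here avoids requesting the same detail page twice when a thumbnail and its caption both link to it.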
5. Extracting High‑Resolution URLs
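            # parse_html1 is not defined in the original snippet; presumably it is
            # the parsed detail page requested in the previous step
            html1 = self.get_page(fa)
            parse_html1 = etree.HTML(html1)
            bimg_url = parse_html1.xpath('//div[@class="pic-down"]/a/@href')
            for i in bimg_url:
                diet = "http://www.netbian.com" + i
                html2 = self.get_page(diet)
                parse_html2 = etree.HTML(html2)
                url2 = parse_html2.xpath('//table[@id="endimg"]//tr//td//a/img/@src')

6. Saving Images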
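                filename = parse_html2.xpath('//table[@id="endimg"]//tr//td//a/@title')
                # The original snippet used an undefined name r for the image URL;
                # pairing each URL in url2 with its title via zip is an assumption.
                # The ./王者荣耀/ directory must already exist.
                for r, e in zip(url2, filename):
                    dirname = "./王者荣耀/" + e + '.jpg'
                    img_data = requests.get(url=r, headers=self.headers).content
                    with open(dirname, 'wb') as f:
                        f.write(img_data)
                    print("%s downloaded successfully" % e)

7. Main Loop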
    def main(self):
        startPage = int(input("Start page: "))
        endPage = int(input("End page: "))
        for page in range(startPage, endPage + 1):
            # Page 1 uses index.htm; later pages use index_{page}.htm
            if page == 1:
                url = self.firsr_url
            else:
                url = self.url.format(page)
            html = self.get_page(url)
            print("Page %s fetched successfully!" % page)
            self.parse_page(html)

Result Demonstration
Running the script prompts for the page range, then displays download success messages in the console. Downloaded images are saved locally, as shown in the screenshots.
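The save step writes into ./王者荣耀/ and uses each image's title as the filename, so it fails if that directory is missing or a title contains characters that are not allowed in filenames. The following is a minimal sketch of a more defensive save routine. safe_filename and save_image are hypothetical helpers, not part of the original script, and the set of replaced characters is an assumption based on common Windows restrictions.

```python
import os
import re
import tempfile


def safe_filename(title, default="untitled"):
    """Replace whitespace and characters that are invalid in Windows filenames."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or default


def save_image(data, title, out_dir="./王者荣耀"):
    """Write image bytes to out_dir/<title>.jpg and return the path."""
    os.makedirs(out_dir, exist_ok=True)  # create the folder on first use
    path = os.path.join(out_dir, safe_filename(title) + ".jpg")
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # Write placeholder bytes into a temporary folder instead of the real one
    with tempfile.TemporaryDirectory() as tmp:
        print(save_image(b"fake image bytes", "hero: skin/01", out_dir=tmp))
```

In the spider, save_image could take the bytes returned by requests.get(...).content together with the title extracted from the detail page.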
Conclusion
• Avoid excessive crawling to reduce server load.
• This Python web‑scraper provides a practical way to obtain high‑resolution 王者荣耀 images.
• Users can modify the hero selection to set their own desktop wallpapers.
• The full source code can be requested by replying with “王者荣耀”.
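To act on the first point, a crawler can pause between requests. The sketch below is one possible approach, not the article's implementation: crawl_politely is a hypothetical helper that accepts any fetch function (for example the spider's get_page) and waits a fixed delay between calls. The one‑second default is an arbitrary assumption; choose a value appropriate for the target site.

```python
import time


def crawl_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between requests.

    Failures are recorded instead of aborting the whole run.
    Returns (results, errors): two dicts keyed by URL.
    """
    results, errors = {}, {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        try:
            results[url] = fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            errors[url] = exc
    return results, errors


if __name__ == "__main__":
    # Stand-in fetch function so the example runs without network access
    def fake_fetch(url):
        return "<html>%s</html>" % url

    pages, failed = crawl_politely(["page-1", "page-2"], fake_fetch, delay_seconds=0.1)
    print(len(pages), "fetched,", len(failed), "failed")
```

Passing the sleep function as a parameter keeps the helper easy to test, since a test can substitute a recorder for time.sleep.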
Python Crawling & Data Mining
