
Python Web Scraping Tutorial: Downloading Images from 3gbizhi.com

This tutorial demonstrates how to use Python's requests and lxml libraries to scrape static web pages, extract image URLs via XPath, convert thumbnail links to full‑size URLs, and download the images to a local folder, providing a complete end‑to‑end example.

Python Programming Learning Circle
This article introduces a Python script for crawling and downloading high‑quality desktop wallpapers from the website 3gbizhi.com. It explains the required tools (PyCharm, Python 3.7 on Windows 10, and the third‑party packages requests and lxml) and the overall workflow.

Target URL: The crawler starts from the base page https://www.3gbizhi.com and iterates through paginated sections such as https://www.3gbizhi.com/meinv/xgmn_{i}.html.

Project Idea: First determine whether the data is static or dynamic (by searching the page source for the needed keywords). The site uses static HTML, so the script sends HTTP GET requests with appropriate headers, parses the response with lxml.etree, and uses XPath expressions to locate detail page links and image URLs.
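A quick way to answer the static‑vs‑dynamic question programmatically is to fetch the raw page source and search it for a keyword you can see in the browser. The sketch below is illustrative; the `looks_static` helper and the sample HTML are not part of the original script:

```python
def looks_static(page_source: str, keyword: str) -> bool:
    """Return True if the keyword appears in the raw HTML,
    suggesting the data is rendered server-side (static)."""
    return keyword in page_source

# Illustrative sample: if the gallery markup is present in the raw source,
# requests + lxml suffice; otherwise the page likely loads data via JavaScript.
sample_source = '<div class="picimglist pos"><ul><li><a><img src="a.jpg"></a></li></ul></div>'
print(looks_static(sample_source, 'picimglist'))   # True
print(looks_static(sample_source, 'ajax_loaded'))  # False
```

If the keyword only appears in the rendered DOM but not in the raw source, the data is loaded dynamically and a different approach (inspecting XHR requests or using a browser driver) would be needed.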

The process consists of three main steps:

1. Fetch the list page, then extract each detail page URL and its title.

2. Visit each detail page and extract the src attribute of every <img> element inside the image list container.

3. Convert thumbnail URLs (containing thumb_200_0_) to full‑size URLs, download the binary content, and save each image as {title}{index}.jpg under a local 妹子/ directory.
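The thumbnail‑to‑full‑size conversion in step 3 is a pure string operation: the site's thumbnail URLs differ from the originals only by a thumb_200_0_ marker in the filename. A small helper (the function name and example URL below are illustrative, not from the original script) makes the transformation explicit:

```python
def to_full_size(thumb_url: str) -> str:
    """Remove the 'thumb_200_0_' marker so the URL points at the original image."""
    return ''.join(thumb_url.split('thumb_200_0_'))

# Hypothetical thumbnail URL, shown only to demonstrate the transformation
print(to_full_size('https://pic.3gbizhi.com/uploads/thumb_200_0_wallpaper.jpg'))
# https://pic.3gbizhi.com/uploads/wallpaper.jpg
```

URLs without the marker pass through unchanged, so the helper is safe to apply to every extracted src value.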

The core code is shown below (restored from the original article, with the directory-creation step added so it runs on a clean machine and the progress message translated from Chinese):

```python
import os

import requests
from lxml import etree

headers = {
    # Session cookie captured from the browser; replace it with your own if it expires
    'Cookie': 'Hm_lvt_c8263f264e5db13b29b03baeb1840f60=1632291839,1632373348; Hm_lpvt_c8263f264e5db13b29b03baeb1840f60=1632373697',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'
}

# Make sure the output directory exists before writing files
os.makedirs('妹子', exist_ok=True)

for i in range(2, 3):
    # List page for the current pagination index
    url = f'https://www.3gbizhi.com/meinv/xgmn_{i}.html'
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)
    # Detail-page links and their titles from the gallery list
    href_list = html.xpath('//div[@class="contlistw mtw"]//ul[@class="cl"]/li/a/@href')
    title_list = html.xpath('//div[@class="contlistw mtw"]//ul[@class="cl"]/li/a/@title')
    for href, title in zip(href_list, title_list):
        res = requests.get(href, headers=headers)
        html_data = etree.HTML(res.text)
        # Thumbnail URLs inside the image list container
        img_url_list = html_data.xpath('//div[@class="picimglist pos"]/ul/li/a/img/@src')
        print(img_url_list)
        num = 0
        for img_url in img_url_list:
            # Drop the 'thumb_200_0_' marker to get the full-size image URL
            img_url = ''.join(img_url.split('thumb_200_0_'))
            result = requests.get(img_url, headers=headers).content
            with open('妹子/' + title + str(num) + '.jpg', 'wb') as f:
                f.write(result)
            num += 1
            print(f'Downloading image {num} of "{title}"!')
```

The script prints each list of image URLs, downloads them sequentially, and logs progress. By following this example, readers can adapt the approach to other static sites that host image galleries.

Tags: web scraping, Requests, image download, lxml
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
