
Python Web Scraping Tutorial: Downloading Images from 3gbizhi.com

This tutorial demonstrates how to use Python's requests and lxml libraries to scrape static web pages, extract image URLs via XPath, convert thumbnail links to full‑size URLs, and download the images to a local folder, providing a complete end‑to‑end example.

Python Programming Learning Circle
This article introduces a Python script for crawling and downloading high‑quality desktop wallpapers from the website 3gbizhi.com. It explains the required tools (PyCharm, Python 3.7 on Windows 10, and the third‑party packages requests and lxml) and the overall workflow.

Target URL: The crawler starts from the base page https://www.3gbizhi.com and iterates through paginated sections such as https://www.3gbizhi.com/meinv/xgmn_{i}.html.

Project Idea: First determine whether the data is static or dynamic (by searching the page source for the needed keywords). The site uses static HTML, so the script sends HTTP GET requests with appropriate headers, parses the response with lxml.etree, and uses XPath expressions to locate detail page links and image URLs.
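A quick way to answer the static‑vs‑dynamic question programmatically is to fetch the raw page source and search it for a keyword you can see in the browser. The sketch below is illustrative; the `looks_static` helper and the sample HTML are not part of the original script:

```python
def looks_static(page_source: str, keyword: str) -> bool:
    """Return True if the keyword appears in the raw HTML,
    suggesting the data is rendered server-side (static)."""
    return keyword in page_source

# Illustrative sample: if the gallery markup is present in the raw source,
# requests + lxml suffice; otherwise the page likely loads data via JavaScript.
sample_source = '<div class="picimglist pos"><ul><li><a><img src="a.jpg"></a></li></ul></div>'
print(looks_static(sample_source, 'picimglist'))   # True
print(looks_static(sample_source, 'ajax_loaded'))  # False
```

If the keyword only appears in the rendered DOM but not in the raw source, the data is loaded dynamically and a different approach (inspecting XHR requests or using a browser driver) would be needed.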

The process consists of three main steps:

1. Fetch the list page, then extract each detail page URL and its title.

2. Visit each detail page and extract the src attribute of every <img> element inside the image list container.

3. Convert thumbnail URLs (containing thumb_200_0_) to full‑size URLs, download the binary content, and save each image as {title}{index}.jpg under a local 妹子/ directory.
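The thumbnail‑to‑full‑size conversion in step 3 is a pure string operation: the site's thumbnail URLs differ from the originals only by a thumb_200_0_ marker in the filename. A small helper (the function name and example URL below are illustrative, not from the original script) makes the transformation explicit:

```python
def to_full_size(thumb_url: str) -> str:
    """Remove the 'thumb_200_0_' marker so the URL points at the original image."""
    return ''.join(thumb_url.split('thumb_200_0_'))

# Hypothetical thumbnail URL, shown only to demonstrate the transformation
print(to_full_size('https://pic.3gbizhi.com/uploads/thumb_200_0_wallpaper.jpg'))
# https://pic.3gbizhi.com/uploads/wallpaper.jpg
```

URLs without the marker pass through unchanged, so the helper is safe to apply to every extracted src value.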

The core code is shown below (restored from the original article, with the directory-creation step added so it runs on a clean machine and the progress message translated from Chinese):

```python
import os

import requests
from lxml import etree

headers = {
    # Session cookie captured from the browser; replace it with your own if it expires
    'Cookie': 'Hm_lvt_c8263f264e5db13b29b03baeb1840f60=1632291839,1632373348; Hm_lpvt_c8263f264e5db13b29b03baeb1840f60=1632373697',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'
}

# Make sure the output directory exists before writing files
os.makedirs('妹子', exist_ok=True)

for i in range(2, 3):
    # List page for the current pagination index
    url = f'https://www.3gbizhi.com/meinv/xgmn_{i}.html'
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)
    # Detail-page links and their titles from the gallery list
    href_list = html.xpath('//div[@class="contlistw mtw"]//ul[@class="cl"]/li/a/@href')
    title_list = html.xpath('//div[@class="contlistw mtw"]//ul[@class="cl"]/li/a/@title')
    for href, title in zip(href_list, title_list):
        res = requests.get(href, headers=headers)
        html_data = etree.HTML(res.text)
        # Thumbnail URLs inside the image list container
        img_url_list = html_data.xpath('//div[@class="picimglist pos"]/ul/li/a/img/@src')
        print(img_url_list)
        num = 0
        for img_url in img_url_list:
            # Drop the 'thumb_200_0_' marker to get the full-size image URL
            img_url = ''.join(img_url.split('thumb_200_0_'))
            result = requests.get(img_url, headers=headers).content
            with open('妹子/' + title + str(num) + '.jpg', 'wb') as f:
                f.write(result)
            num += 1
            print(f'Downloading image {num} of "{title}"!')
```

The script prints each list of image URLs, downloads them sequentially, and logs progress. By following this example, readers can adapt the approach to other static sites that host image galleries.

Tags: web scraping, Requests, image download, lxml
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
