How to Batch Download Images with Python: From XPath Extraction to Automated Saving
This tutorial walks you through extracting image URLs from a webpage using XPath, constructing full URLs, and automating batch downloads with Python's requests, lxml, and fake_useragent libraries, including code snippets and practical tips for handling files and headers.
1. Introduction
The previous article covered the theory of image crawling; this guide completes the process by showing how to batch‑download images from a website using Python.
2. Image URL Parsing
1. Open the target page, right‑click an image and inspect its element to locate the src attribute.
2. Isolate the src value and note its parent <ul> (or higher) tag.
3. Use an XPath selector to extract the src value, prepend the appropriate https:// prefix, and obtain the full image URL.
4. Run the XPath query in Python to retrieve the URLs, as illustrated in the screenshots.
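The four steps above can be sketched roughly as follows. The HTML fragment, the class name gallery, the XPath expression, and the helper name build_image_urls are all hypothetical stand-ins, since the real page structure is not reproduced here. The sketch also assumes the src values are protocol-relative (starting with //), which is one common reason a scheme prefix has to be added; urljoin from the standard library fills it in from the page URL.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical fragment standing in for the inspected page.
SAMPLE_HTML = """
<ul class="gallery">
  <li><a href="/a/1.html"><img src="//img.example.com/pic/1.jpg" alt="one"></a></li>
  <li><a href="/a/2.html"><img src="//img.example.com/pic/2.jpg" alt="two"></a></li>
</ul>
"""


def build_image_urls(html_text, page_url,
                     xpath='//ul[@class="gallery"]/li//img/@src'):
    """Return an absolute URL for every src value matched by the XPath."""
    tree = etree.HTML(html_text)   # parse the (possibly messy) HTML
    srcs = tree.xpath(xpath)       # list of src attribute values
    # urljoin resolves protocol-relative and relative paths against page_url.
    return [urljoin(page_url, src) for src in srcs]


urls = build_image_urls(SAMPLE_HTML, "https://www.example.com/list/")
for url in urls:
    print(url)
```

On a real site, html_text would be the text of the page response and the XPath would be adapted to whatever parent tag was noted in step 2.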
3. Downloading Images
1. Create a folder to hold the downloaded images, e.g., a directory named 天堂爬的图片.
2. Make sure the folder exists before the script runs (a relative path is resolved against the current working directory, which is usually the script's own directory); otherwise, open() raises FileNotFoundError.
3. Use a with open(filename, "wb") as f block to write the binary image data to disk. Example code:
with open(filename, "wb") as f:
    f.write(html)
4. In this snippet, the file mode "wb" means "write binary", f is the open file object, and html is expected to hold the image's raw bytes (for example, the content attribute of a requests response).
5. Install and import fake_useragent to generate random request headers, reducing the chance of being blocked:
from fake_useragent import UserAgent
ua = UserAgent()
print(ua.ie)       # random IE user-agent string
print(ua.firefox)  # random Firefox user-agent string
print(ua.chrome)   # random Chrome user-agent string
print(ua.random)   # random user-agent string from any browser
6. Initialize the UserAgent object and select a random header (e.g., ua.random) for each request, optionally drawing it from a pool of 50 user-agent strings.
7. Execute the download loop; the terminal output shows successful retrieval of each image.
8. After the script finishes, all images are saved locally in high resolution.
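Taken together, the download steps might look like the sketch below. It is an illustration under assumptions rather than the original script: the helper names filename_for, save_image, fetch_image, and download_all are hypothetical, the 10-second timeout is an arbitrary choice, and the folder is created automatically instead of by hand. The third-party imports sit inside fetch_image so that the file-handling helpers can be tried without network access.

```python
import os

SAVE_DIR = "天堂爬的图片"  # folder name used in the article


def filename_for(url, index):
    """Derive a local file name from the URL, falling back to an index."""
    name = os.path.basename(url.split("?", 1)[0])
    return name or f"image_{index}.jpg"


def save_image(data, folder, name):
    """Write raw image bytes to folder/name and return the path."""
    os.makedirs(folder, exist_ok=True)  # avoids the missing-folder error
    path = os.path.join(folder, name)
    with open(path, "wb") as f:         # "wb" = write, binary mode
        f.write(data)
    return path


def fetch_image(url, timeout=10):
    """Download one image using a random User-Agent header."""
    # Imported here so the helpers above work without these packages.
    import requests
    from fake_useragent import UserAgent

    headers = {"User-Agent": UserAgent().random}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()         # fail loudly on 4xx/5xx responses
    return response.content             # bytes, suitable for "wb"


def download_all(urls, folder=SAVE_DIR, fetch=fetch_image):
    """Fetch every URL, save it, and return the list of saved paths."""
    saved = []
    for index, url in enumerate(urls, start=1):
        data = fetch(url)
        path = save_image(data, folder, filename_for(url, index))
        print(f"saved {path}")
        saved.append(path)
    return saved
```

Calling download_all(urls) with the list of full image URLs from the parsing step would then save each file into the folder. For larger batches, creating the UserAgent object once and reusing it would avoid repeated setup work.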
4. Conclusion
The article demonstrates a practical workflow using Python's requests, lxml, and fake_useragent libraries to parse webpage structures, extract image URLs, and perform batch downloads efficiently. Readers are encouraged to try the method while respecting server load.
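One simple way to respect server load is to pause between requests. The generator below is a minimal sketch of that idea; the name polite_iter and the 1 to 3 second delay range are assumptions chosen for illustration, not values from the article.

```python
import random
import time


def polite_iter(items, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Yield items one by one, pausing a random interval between them."""
    for position, item in enumerate(items):
        if position > 0:
            # Random jitter makes the request pattern less bursty.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

A download loop could then iterate with for url in polite_iter(urls): instead of iterating over urls directly.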
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
