Master Python File Downloads: Requests, Wget, urllib, Async & More
This tutorial walks through multiple Python approaches for downloading files—including simple requests and wget calls, handling redirects, large and multi‑file downloads, proxy usage, urllib/urllib3 methods, and asynchronous techniques—providing complete code snippets and practical tips for each scenario.
Using Requests
Download a file by calling requests.get(url) and writing myfile.content to a local file.
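One caveat with the short form: requests.get does not raise on HTTP errors, so a 404 page would be saved as if it were the file. A minimal sketch of a more defensive variant, assuming a hypothetical helper name download_file and an arbitrary 10-second timeout (neither comes from the original example):

```python
import requests


def download_file(url, dest, timeout=10):
    """Hypothetical helper: save url to dest and return the number of bytes written."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of saving an error page.
    response.raise_for_status()
    # The with block closes the file even if the write fails.
    with open(dest, 'wb') as f:
        return f.write(response.content)
```

The shortest form, without those checks, is: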
import requests
url = 'https://www.python.org/static/img/[email protected]'
myfile = requests.get(url)
open('c:/users/21cto/downloads/PythonImage.png', 'wb').write(myfile.content)
Using wget
Install the wget module with pip install wget and download a file via wget.download(url, path).
import wget
url = "https://www.python.org/static/img/[email protected]"
wget.download(url, 'c:/users/LikeGeeks/downloads/pythonLogo.png')
Downloading Redirected Files
requests.get follows redirects by default; passing allow_redirects=True makes that behavior explicit. The body of the final response is then written to a file.
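To see which redirects were actually followed, the response object exposes the intermediate hops. A small sketch, assuming a hypothetical helper name redirect_chain and an illustrative timeout value:

```python
import requests


def redirect_chain(url, timeout=10):
    """Hypothetical helper: return the URLs visited, ending with the final one."""
    response = requests.get(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()
    # response.history holds the intermediate redirect responses, oldest first;
    # response.url is the address the body was finally served from.
    return [hop.url for hop in response.history] + [response.url]
```

The download itself only needs the final body: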
import requests
url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
myfile = requests.get(url, allow_redirects=True)
open('c:/users/21cto/documents/PythonBook.pdf', 'wb').write(myfile.content)
Downloading Large Files
Stream the response and write it in chunks to avoid loading the entire file into memory.
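A sketch of the streaming pattern wrapped in a function, with a status check and a larger chunk size. The helper name stream_to_file, the 64 KiB chunk size, and the timeout are assumptions chosen for illustration:

```python
import requests


def stream_to_file(url, dest, chunk_size=64 * 1024, timeout=10):
    """Hypothetical helper: stream url to dest and return the bytes written."""
    written = 0
    # Using the response as a context manager releases the connection afterwards.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(dest, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    written += f.write(chunk)
    return written
```

Only one chunk is held in memory at a time, so the file size does not affect memory use. The same idea in its shortest form: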
import requests
url = 'https://www.python.org/static/img/[email protected]'
myfile = requests.get(url, stream=True)
with open('c:/users/21cto/downloads/PythonImage.png', 'wb') as f:
    for chunk in myfile.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
Parallel / Batch Downloads
Use ThreadPool from multiprocessing.pool to download multiple URLs concurrently.
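In a batch, one unreachable URL should not hide the results of the others. The sketch below reports a result per item; the names fetch and download_all, the worker count, and the timeout are assumptions for illustration:

```python
from multiprocessing.pool import ThreadPool

import requests


def fetch(item):
    """Download one (path, url) pair; return (path, error), with error None on success."""
    path, url = item
    try:
        with requests.get(url, stream=True, timeout=10) as r:
            r.raise_for_status()
            with open(path, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        return path, None
    except (requests.RequestException, OSError) as exc:
        return path, exc


def download_all(items, workers=4):
    """Run fetch over items in a thread pool and collect every result."""
    with ThreadPool(workers) as pool:
        # list() consumes the lazy iterator, so this returns only when all downloads are done.
        return list(pool.imap_unordered(fetch, items))
```

Because imap_unordered yields results as they complete, the returned list is not in input order; each tuple carries its path so results can be matched up. A compact version of the same pattern: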
import requests, time
from multiprocessing.pool import ThreadPool

def url_response(item):
    # Each item is a (local path, URL) pair
    path, url = item
    r = requests.get(url, stream=True)
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)
urls = [
("c:/users/21cto/file1.pdf", "https://example.com/file1.pdf"),
("c:/users/21cto/file2.pdf", "https://example.com/file2.pdf"),
# ... more URLs ...
]
start = time.time()
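# imap_unordered is lazy: iterate over the results so the script waits for every download
with ThreadPool(9) as pool:
    for _ in pool.imap_unordered(url_response, urls):
        pass
print(f"Time to download: {time.time() - start:.2f} s")
Using urllib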
The standard library urllib.request.urlretrieve can download a URL directly to a local path.
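The Python documentation lists urlretrieve among the legacy interfaces that may be deprecated in the future. A sketch of an equivalent built on urlopen, assuming a hypothetical helper name fetch_to_file and an illustrative timeout:

```python
import shutil
import urllib.request


def fetch_to_file(url, dest, timeout=10):
    """Hypothetical helper: copy the body at url to dest in fixed-size blocks."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError for connection problems.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest
```

The one-line urlretrieve form: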
import urllib.request
urllib.request.urlretrieve('https://www.python.org/', 'c:/users/21cto/documents/PythonOrganization.html')
Downloading via Proxy
Create a ProxyHandler and build an opener to route requests through a proxy server.
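An opener only affects calls made through it; urlopen and urlretrieve keep using the default opener unless urllib.request.install_opener is called. A sketch that keeps the proxy scoped to a single opener, assuming a hypothetical helper name build_proxy_opener and a placeholder proxy address:

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Hypothetical helper: return an opener that sends HTTP and HTTPS traffic via proxy_url."""
    # ProxyHandler maps URL schemes to proxies, so HTTPS needs its own entry.
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


# Usage (the address is a placeholder; a proxy must actually be listening there):
# opener = build_proxy_opener('http://127.0.0.2:3001')
# with opener.open('https://www.python.org/', timeout=10) as response:
#     html = response.read()
```

The handler-based setup in full, followed by the requests equivalent: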
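import urllib.request
# Proxies are keyed by URL scheme, so an https URL needs an 'https' entry (addresses are placeholders)
myProxy = urllib.request.ProxyHandler({'http': '127.0.0.2', 'https': '127.0.0.2'})
openProxy = urllib.request.build_opener(myProxy)
# Install the opener globally; otherwise urlretrieve ignores the proxy
urllib.request.install_opener(openProxy)
urllib.request.urlretrieve('https://www.python.org/', 'c:/users/21cto/documents/PythonOrg.html')
# Using requests with proxies
import requests
myProxy = {'http': 'http://127.0.0.2:3001', 'https': 'http://127.0.0.2:3001'}
requests.get('https://www.python.org/', proxies=myProxy)
Using urllib3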
Install urllib3 and use its PoolManager to fetch content, then write it with shutil.copyfileobj.
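urllib3 returns error responses instead of raising on them, so the status is worth checking before the body is saved. A sketch under that assumption; the helper name fetch_with_urllib3 and the chunk size are illustrative:

```python
import urllib3


def fetch_with_urllib3(url, dest, chunk_size=64 * 1024):
    """Hypothetical helper: stream url to dest with urllib3 and return the HTTP status."""
    http = urllib3.PoolManager()
    # preload_content=False defers reading the body so it can be streamed.
    response = http.request('GET', url, preload_content=False)
    try:
        if response.status == 200:
            with open(dest, 'wb') as out:
                for chunk in response.stream(chunk_size):
                    out.write(chunk)
        return response.status
    finally:
        # Return the connection to the pool whether or not the body was saved.
        response.release_conn()
```

Installation and the compact form using shutil.copyfileobj: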
pip install urllib3
import urllib3, shutil
c = urllib3.PoolManager()
url = 'https://www.python.org/'
filename = 'mytest.txt'
with c.request('GET', url, preload_content=False) as res, open(filename, 'wb') as out_file:
    shutil.copyfileobj(res, out_file)
Asynchronous Downloads
Leverage asyncio to run multiple download coroutines concurrently.
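urllib.request.urlopen is a blocking call, so downloads only overlap when that work is moved off the event loop. For longer URL lists it can also help to cap how many run at once. A sketch under those assumptions; the names blocking_download and download_many and the limit of 3 are illustrative:

```python
import asyncio
import shutil
import urllib.request


def blocking_download(url, dest):
    """Plain blocking download; safe to run in a worker thread."""
    with urllib.request.urlopen(url, timeout=10) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest


async def download_many(jobs, limit=3):
    """jobs is a list of (url, dest) pairs; at most `limit` downloads run at once."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url, dest):
        async with semaphore:
            # asyncio.to_thread (Python 3.9+) keeps the event loop free while urlopen blocks.
            return await asyncio.to_thread(blocking_download, url, dest)

    # return_exceptions=True puts a failed URL's exception in the result list
    # instead of raising on the first failure.
    return await asyncio.gather(*(worker(u, d) for u, d in jobs), return_exceptions=True)
```

Results come back in the same order as jobs, so each entry is either the destination path or the exception for that URL. A shorter version with a fixed list of URLs: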
import asyncio, urllib.request

def download(url, filename):
    # urlopen blocks, so this plain function is run in a worker thread
    with urllib.request.urlopen(url) as r, open(filename, 'wb') as f:
        for chunk in r:
            f.write(chunk)
    return f'Download succeeded: {filename}'

async def main_func(urls):
    # asyncio.to_thread needs Python 3.9+; each URL gets its own output file
    tasks = [asyncio.to_thread(download, u, f"coroutine_download_{i}.txt")
             for i, u in enumerate(urls)]
    for result in await asyncio.gather(*tasks):
        print(result)
urls_to_download = [
"https://www.python.org/events/python-events/801/",
"https://www.python.org/events/python-events/790/",
"https://www.python.org/events/python-user-group/816/",
"https://www.python.org/events/python-events/757/"
]
asyncio.run(main_func(urls_to_download))
These examples demonstrate a range of Python techniques for downloading files, from simple synchronous calls to parallel and asynchronous solutions.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
