Using requests‑cache to Cache HTTP Requests in Python Web Scraping
This article introduces the requests‑cache library, explains how to install it, and demonstrates basic and advanced usage (session patching, backend selection, expiration policies, request/response filtering, and Cache‑Control header handling) so that a Python web scraper can avoid sending duplicate HTTP requests.
When building web scrapers, repeated requests and lack of state persistence can cause unnecessary network traffic and long runtimes; caching previously fetched responses is an effective solution.
The requests‑cache package extends the popular requests library, providing transparent HTTP response caching with minimal code changes.
Installation
Install via pip:
pip3 install requests-cache

Basic Usage
Without caching, ten requests to http://httpbin.org/delay/1 take about 13 seconds:
import requests
import time
start = time.time()
session = requests.Session()
for i in range(10):
    # The server waits about one second before answering each request
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

Using requests‑cache with a CachedSession reduces the total time to roughly 1.6 seconds, because only the first request reaches the network and the remaining nine are served from the cache:
import requests_cache
import time
start = time.time()
# Responses are stored in demo_cache.sqlite (SQLite is the default backend)
session = requests_cache.CachedSession('demo_cache')
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

Patch‑Style Configuration
Alternatively, call install_cache once and keep using the regular requests.Session:
import time
import requests
import requests_cache
# Patch requests globally so that sessions created afterwards use the cache
requests_cache.install_cache('demo_cache')
start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

Backend Configuration
The default backend is SQLite, but you can switch to filesystem, Redis, MongoDB, etc. Example using a filesystem backend:
import time
import requests
import requests_cache
# Store responses as individual files rather than in a SQLite database
requests_cache.install_cache('demo_cache', backend='filesystem')
start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

Other backends such as Redis can be configured as:
import requests_cache
# Requires the redis package and a Redis server listening on localhost:6379
backend = requests_cache.RedisCache(host='localhost', port=6379)
requests_cache.install_cache('demo_cache', backend=backend)

Filtering Requests
You can control which requests are cached. To cache only POST requests:
import requests_cache
requests_cache.install_cache('demo_cache2', allowable_methods=['POST'])
# GET requests will not be cached; POST requests will.

Similarly, the allowable_codes argument restricts caching to specific response status codes, and the urls_expire_after argument sets expiration per URL pattern, which can also be used to exclude URLs from the cache.
Cache‑Control Headers
When cache_control=True is set, the library respects HTTP Cache‑Control headers. Adding Cache‑Control: no-store to a request disables caching for that call:
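import requests
import requests_cache
requests_cache.install_cache('demo_cache3', cache_control=True)  # honor Cache-Control headers
session = requests.Session()
# no-store on the request bypasses the cache for this call only
session.get('http://httpbin.org/delay/1', headers={'Cache-Control': 'no-store'})

Summary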
The requests‑cache library offers a simple yet powerful way to cache HTTP responses in Python, supporting various backends, expiration policies, request/response filtering, and header‑based control, which together can dramatically speed up web‑scraping tasks.
Sohu Tech Products
