Using requests‑cache to Cache HTTP Requests in Python Web Scraping

This article introduces the requests-cache library, explains how to install it, and demonstrates basic and advanced usage (session patching, backend selection, expiration policies, request/response filtering, and Cache-Control header handling) for avoiding duplicate HTTP requests during Python web scraping.

Sohu Tech Products

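A web scraper may fetch the same URLs over and over. It may also have to start from scratch after a crash because it kept no record of what it had already downloaded. Either way, it generates unnecessary network traffic and runs longer than it needs to. Caching previously fetched responses solves both problems.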

The requests‑cache package extends the popular requests library, providing transparent HTTP response caching with minimal code changes.

Installation

Install via pip:

pip3 install requests-cache

Basic Usage

The endpoint http://httpbin.org/delay/1 waits one second before responding. Without caching, ten requests to it therefore take about 13 seconds:

import requests
import time

start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i+1} requests')
end = time.time()
print('Cost time', end - start)

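Replacing requests.Session with requests_cache.CachedSession cuts the total to roughly 1.6 seconds. Only the first request goes over the network. Its response is stored in the cache (with the default SQLite backend, a file named demo_cache.sqlite), and the remaining nine requests are served from that cache: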

import requests_cache
import time

start = time.time()
session = requests_cache.CachedSession('demo_cache')
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i+1} requests')
end = time.time()
print('Cost time', end - start)

Patch‑Style Configuration

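Alternatively, call requests_cache.install_cache once. It patches requests globally, so any regular requests.Session created afterwards uses the cache with no other code changes. The same applies to module-level helpers such as requests.get: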

import time
import requests
import requests_cache

requests_cache.install_cache('demo_cache')

start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i+1} requests')
end = time.time()
print('Cost time', end - start)

Backend Configuration

The default backend is SQLite, but you can switch to others such as the filesystem, Redis, or MongoDB. With backend='filesystem', demo_cache becomes a directory in which each response is stored as a separate file:

import time
import requests
import requests_cache

requests_cache.install_cache('demo_cache', backend='filesystem')

start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i+1} requests')
end = time.time()
print('Cost time', end - start)

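Other backends, such as Redis, can be configured by passing a backend object. The Redis backend requires the redis Python package and a Redis server listening on the given host and port:

import requests_cache

backend = requests_cache.RedisCache(host='localhost', port=6379)
requests_cache.install_cache('demo_cache', backend=backend)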

Filtering Requests

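You can control which requests are cached. By default, only GET and HEAD requests are cached, and the allowable_methods argument changes which methods qualify. To cache only POST requests:

import requests_cache

# Cache only POST requests; GET requests will no longer be cached.
requests_cache.install_cache('demo_cache2', allowable_methods=['POST'])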

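Similarly, allowable_codes limits caching to responses with the listed status codes (only 200 by default). urls_expire_after assigns different expiration times to different URL patterns, which can also be used to keep selected URLs out of the cache. A single expiration time for all cached responses is set with expire_after.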

Cache‑Control Headers

When cache_control=True is set, the library respects HTTP Cache-Control headers. Sending Cache-Control: no-store with a request disables caching for that call:

import requests
import requests_cache

requests_cache.install_cache('demo_cache3', cache_control=True)

session = requests.Session()
# no-store: bypass the cache for this request only
session.get('http://httpbin.org/delay/1', headers={'Cache-Control': 'no-store'})

Summary

The requests‑cache library offers a simple yet powerful way to cache HTTP responses in Python, supporting various backends, expiration policies, request/response filtering, and header‑based control, which together can dramatically speed up web‑scraping tasks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Sohu Tech Products, a knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
