Using requests_cache to Simulate Browser Caching in Python Web Scraping

This article explains how to use the Python requests_cache library to mimic browser caching behavior in web‑scraping tasks, reducing request latency by skipping delays when cached responses are available, and shows various configuration options, back‑ends, and practical code examples.

Python Programming Learning Circle

Mar 30, 2022

To get past typical anti-scraping measures, crawlers usually add a random delay between requests. If a response is already cached, no request reaches the server, so the delay can be skipped and the crawl finishes sooner. The requests_cache library makes this browser-like behavior easy: it returns a cached response when one is available, and a response hook can pause only when the response actually came from the network.

Basic usage

Install the library:

pip install requests-cache

The following example attaches a response hook that throttles only when the response did not come from the cache:

import time

import requests_cache


def make_throttle_hook(timeout=0.1):
    """Return a response hook that sleeps only after non-cached responses."""
    def hook(response, *args, **kwargs):
        print(response.text)
        # Responses served by requests_cache have from_cache=True.
        if not getattr(response, 'from_cache', False):
            print(f'Not cached, waiting {timeout} s')
            time.sleep(timeout)
        else:
            print(f'Served from cache: {response.from_cache}')
        return response
    return hook


if __name__ == '__main__':
    session = requests_cache.CachedSession()  # cached session with default settings
    session.cache.clear()  # start from an empty cache
    session.hooks['response'].append(make_throttle_hook(2))  # attach the hook
    print('first request'.center(50, '*'))
    session.get('http://httpbin.org/get')
    print('second request'.center(50, '*'))
    session.get('http://httpbin.org/get')

httpbin.org, used in the examples here, is a handy test site: it provides endpoints for inspecting cookies, IP address, headers, and authentication.
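
The throttling decision in the hook above can be checked without touching the network. The sketch below is a variant of make_throttle_hook with one hypothetical addition, a sleep parameter (not part of the original example), so that the pause can be recorded instead of executed. The response objects are plain stand-ins built with SimpleNamespace, not real requests responses.

```python
import time
from types import SimpleNamespace


def make_throttle_hook(timeout=0.1, sleep=time.sleep):
    """Return a response hook that pauses only after non-cached responses.

    The `sleep` parameter is a hypothetical addition for testing.
    """
    def hook(response, *args, **kwargs):
        # Cached responses carry from_cache=True; plain responses may lack it.
        if not getattr(response, 'from_cache', False):
            sleep(timeout)
        return response
    return hook


# Record the requested pauses instead of actually sleeping.
pauses = []
hook = make_throttle_hook(2, sleep=pauses.append)

fresh = SimpleNamespace(from_cache=False)  # stand-in for a network response
cached = SimpleNamespace(from_cache=True)  # stand-in for a cache hit

hook(fresh)
hook(cached)
print(pauses)  # only the non-cached response triggered a pause
```

Injecting the sleep function keeps the hook's behavior identical in production (the default is time.sleep) while making the "skip the delay on a cache hit" rule easy to verify.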

Why use requests_cache?

It follows browser caching rules, helping to evade simple anti‑scraping detection.

Customizable cache back‑ends improve performance for repeated requests.

Note that with CachedSession, only requests made through that session (e.g., session.get) are cached; direct calls such as requests.get are cached only after requests_cache.install_cache() has patched requests globally.

Installation

pip install requests-cache

Comparing non-cached and cached code

Non‑cached example:

import requests
import time

start = time.time()
session = requests.Session()
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

Cached example using requests_cache.CachedSession (only the first request reaches the server; the remaining nine are served from the cache):

import requests_cache  # pip install requests-cache
import time

start = time.time()
session = requests_cache.CachedSession('demo_cache')
for i in range(10):
    session.get('http://httpbin.org/delay/1')
    print(f'Finished {i + 1} requests')
end = time.time()
print('Cost time', end - start)

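To add caching to existing code with minimal changes, call requests_cache.install_cache('demo_cache') once at startup. It patches requests globally, so you can keep using the normal requests API (requests.get and so on) and responses are cached transparently.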

Cache management

Clear the cache with:

requests_cache.clear()

Check whether a response came from the cache via its from_cache attribute:

import requests
import requests_cache

requests_cache.install_cache()
requests_cache.clear()  # start from an empty cache
url = 'http://httpbin.org/get'
res = requests.get(url)
print(f'from cache: {res.from_cache}')  # first call goes to the network
res = requests.get(url)
print(f'from cache: {res.from_cache}')  # second call is a cache hit

Custom cache back‑ends

The backend parameter selects where cached data is stored: SQLite (the default), the filesystem, memory, MongoDB, or Redis. Example for a filesystem cache:

requests_cache.install_cache('demo_cache', backend='filesystem')
# or with temporary directory
requests_cache.install_cache('demo_cache', backend='filesystem', use_temp=True)

Redis example (requires redis-py):

backend = requests_cache.RedisCache(host='localhost', port=6379)
requests_cache.install_cache('demo_cache', backend=backend)

Advanced options

expire_after: sets the global cache TTL; by default, cached responses never expire.

allowable_codes and allowable_methods: restrict caching to specific HTTP status codes or methods.

cache_control=True: respect the response's Cache-Control header.
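
To make the semantics of these options concrete, here is a minimal standard-library sketch of the two decisions they control: whether a response may be stored at all, and whether a stored response has expired. The function names is_cacheable and is_expired are hypothetical and this is not the library's implementation; the default values are assumed to mirror the library's defaults (GET and HEAD, status 200, -1 meaning "never expire").

```python
import time


def is_cacheable(method, status_code,
                 allowable_methods=('GET', 'HEAD'),
                 allowable_codes=(200,)):
    """Decide whether a response may be stored (assumed defaults, see lead-in)."""
    return method.upper() in allowable_methods and status_code in allowable_codes


def is_expired(created_at, expire_after, now=None):
    """Check a stored response's age; expire_after is in seconds, -1 never expires."""
    if expire_after == -1:
        return False
    now = time.time() if now is None else now
    return now - created_at >= expire_after


print(is_cacheable('GET', 200))                               # stored
print(is_cacheable('POST', 200))                              # not stored
print(is_cacheable('GET', 404, allowable_codes=(200, 404)))   # stored once 404 is allowed
print(is_expired(created_at=100.0, expire_after=60, now=130.0))  # still fresh
print(is_expired(created_at=100.0, expire_after=60, now=160.0))  # past its TTL
```

With the library itself, the same choices are passed as keyword arguments, for example requests_cache.CachedSession('demo_cache', expire_after=3600, allowable_codes=(200, 404), allowable_methods=('GET', 'POST')).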

Example of per‑URL expiration:

urls_expire_after = {'*.site1.com': 30, 'site2.com/static': -1}
requests_cache.install_cache('demo_cache2', urls_expire_after=urls_expire_after)
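
The pattern dictionary above maps URL globs to expiration times. As a rough illustration of how such a lookup can work, the sketch below matches a URL against the patterns in order and returns the first hit. It is a simplified stand-in built on fnmatch, not the library's actual matching code; the helper names strip_scheme and expire_for and the default of 60 seconds are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def strip_scheme(url):
    """Drop 'http://' or 'https://' so patterns can be written without a scheme."""
    return url.split('://', 1)[-1]


def expire_for(url, urls_expire_after, default=-1):
    """Return the expiration of the first pattern matching the URL, else the default."""
    target = strip_scheme(url)
    for pattern, expire_after in urls_expire_after.items():
        # A trailing '*' lets 'site2.com/static' also cover everything beneath it.
        if fnmatchcase(target, strip_scheme(pattern).rstrip('*') + '*'):
            return expire_after
    return default


urls_expire_after = {'*.site1.com': 30, 'site2.com/static': -1}
print(expire_for('https://api.site1.com/items', urls_expire_after, default=60))       # 30
print(expire_for('http://site2.com/static/logo.png', urls_expire_after, default=60))  # -1
print(expire_for('http://example.org/', urls_expire_after, default=60))               # 60
```

Because the first matching pattern wins in this sketch, more specific patterns should be listed before broader ones.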

When Cache-Control: no-store is sent in the request headers, the cache is bypassed for that request even if caching is enabled:

session.get('http://httpbin.org/delay/1', headers={'Cache-Control': 'no-store'})
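
The check behind this bypass amounts to looking for the no-store directive in the request's Cache-Control header. A minimal sketch of that check follows; the function name has_no_store is hypothetical and the code only illustrates the idea, not how the library parses headers.

```python
def has_no_store(headers):
    """Return True if the Cache-Control header contains the no-store directive."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == 'cache-control':
            directives = [d.strip().lower() for d in value.split(',')]
            return 'no-store' in directives
    return False


print(has_no_store({'Cache-Control': 'no-store'}))             # True
print(has_no_store({'cache-control': 'max-age=0, no-store'}))  # True
print(has_no_store({'Accept': 'text/html'}))                   # False
```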

Overall, requests_cache provides a flexible way to add HTTP response caching to Python scraping scripts, improving speed and reducing server load while offering fine‑grained control over cache behavior.
