To cache responses with the requests
library in Python, you'll need to use an additional caching library that works alongside requests
. One popular choice is requests-cache
, which provides a simple and effective way to cache responses automatically with minimal setup.
Here's how you can use requests-cache
:
- Install
requests-cache
usingpip
:
pip install requests-cache
- Use
requests-cache
in your code to enable caching for yourrequests
calls:
import requests
import requests_cache
# Install the cache; 'cache_name' is the name of the SQLite database file
requests_cache.install_cache('cache_name', expire_after=180) # Cache for 180 seconds
# Make a request
response = requests.get('https://example.com')
# Check if this request was fetched from the cache
print(response.from_cache)
In the code above, expire_after
parameter defines the number of seconds you want to keep the response in the cache. You can also set it to None
for an infinite cache or use a timedelta
for more complex time intervals.
Here are some additional options you can use with requests-cache
:
- Backends: By default,
requests-cache
uses a SQLite backend, but you can also use others like Redis, MongoDB, or a simple in-memory store. - Session: Instead of installing the cache globally, you can create a custom session:
session = requests_cache.CachedSession('cache_name', expire_after=180)
response = session.get('https://example.com')
- Conditions: You can customize which requests are cached and which are not based on conditions:
def my_custom_cache_condition(request):
# Only cache requests to a specific domain
return 'example.com' in request.url
session = requests_cache.CachedSession('cache_name', expire_after=180)
session.cache.allowable_codes = (200,)
session.cache.allowable_methods = ('GET', 'POST')
session.cache.allowable_conditions = my_custom_cache_condition
response = session.get('https://example.com')
- Clearing the cache: You can clear the cache for specific URLs or the entire cache:
# Clear the cache for a specific URL
requests_cache.clear('https://example.com')
# Clear the entire cache
requests_cache.clear()
- Serialization: You can choose the serialization method for your cache (like JSON or BSON) by specifying the
serializer
parameter when creating aCachedSession
.
Remember to always respect the terms of service of the websites you're scraping and the legal implications of web scraping and caching.
For more advanced caching strategies, you might consider implementing your own caching logic using Python's built-in libraries like pickle
for serialization and shelve
for a simple persistent storage. However, for most use cases, requests-cache
should suffice.