Yes, `urllib3` can automatically decode compressed response content, and it already does so by default. `PoolManager.request()` reads the body up front (`preload_content=True`), decodes it according to the `Content-Encoding` header (`decode_content=True`), and exposes the result through the response object's `data` property. The part that needs a little setup is asking for compressed content in the first place: `urllib3` does not send an `Accept-Encoding` header that enables compression unless you add one, for example with `urllib3.make_headers(accept_encoding=True)`. The most common content encodings you'll encounter are `gzip` and `deflate`, which are used to compress responses for more efficient transfer over the network.
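To see the decoding step in isolation, here is a network-free sketch that wraps gzip- and deflate-compressed bytes in an `HTTPResponse` built directly around an in-memory `io.BytesIO` body. Constructing `HTTPResponse` by hand like this is normally only done in tests; the payload and the `decoded_body` helper are made up for illustration.

```python
import gzip
import io
import zlib

from urllib3.response import HTTPResponse

PAYLOAD = b"hello, compressed world"


def decoded_body(raw: bytes, encoding: str) -> bytes:
    # Wrap an in-memory body instead of a socket. With the defaults
    # (preload_content=True, decode_content=True) urllib3 reads the body
    # immediately and decodes it according to the Content-Encoding header.
    response = HTTPResponse(
        body=io.BytesIO(raw),
        headers={"Content-Encoding": encoding},
        status=200,
    )
    return response.data


gzip_result = decoded_body(gzip.compress(PAYLOAD), "gzip")
deflate_result = decoded_body(zlib.compress(PAYLOAD), "deflate")
print(gzip_result)     # b'hello, compressed world'
print(deflate_result)  # b'hello, compressed world'
```

The caller never touches `gzip` or `zlib` on the way out: picking the right decoder from the header is done inside `urllib3`.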
Here's how you can use `urllib3` to automatically decode gzipped or deflated content:
```python
import urllib3
from urllib3.response import HTTPResponse

# Create an instance of the PoolManager to handle connections
http = urllib3.PoolManager()

# Ask the server for compressed content; urllib3 does not do this by default
headers = urllib3.make_headers(accept_encoding=True)

# Make a request to a URL that may return compressed content
response: HTTPResponse = http.request('GET', 'http://example.com/', headers=headers)

# Check if the response was compressed
content_encoding = response.headers.get('Content-Encoding', '').lower()
if content_encoding == 'gzip':
    print('Response is gzip encoded.')
elif content_encoding == 'deflate':
    print('Response is deflate encoded.')

# Access the `data` property, which returns the body already decoded
# according to the Content-Encoding header
content = response.data

# Now `content` is a byte string that contains the decoded content of the response
print(content)
```
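If you want every request to ask for compressed content and you want a one-call way to get the decoded bytes, you can subclass `urllib3.PoolManager`. Overriding `urlopen` itself to return bytes is not a good idea: `PoolManager` calls `urlopen` internally (both `request()` and redirect handling go through it) and expects a response object back, so a redirect would break. It is safer to set default headers in the constructor and add a separate convenience method (called `fetch` here; the name is arbitrary):

```python
import urllib3


class DecodingPoolManager(urllib3.PoolManager):
    def __init__(self, num_pools=10, headers=None, **connection_pool_kw):
        # Ask servers for compressed responses on every request,
        # unless the caller supplies their own default headers.
        if headers is None:
            headers = urllib3.make_headers(accept_encoding=True)
        super().__init__(num_pools=num_pools, headers=headers, **connection_pool_kw)

    def fetch(self, method, url, **kwargs):
        # request() preloads and decodes the body by default
        # (preload_content=True, decode_content=True).
        response = self.request(method, url, **kwargs)
        # HEAD requests and 204/304 responses have no body; `data` may then
        # be empty or None, so normalise it to b''.
        return response.data or b''


# Use the custom PoolManager
http = DecodingPoolManager()
# Make a request and get the decoded body back as bytes
content = http.fetch('GET', 'http://example.com/')
print(content)
```

With this subclass, `http.fetch()` returns the decoded content directly, while `request()` and `urlopen()` keep returning normal response objects.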
Please note that calling `response.data` multiple times will not result in multiple reads from the server: the content is cached after the first read. If you want to access the raw, undecoded content, pass `preload_content=False` to `request()` and then call `response.read(decode_content=False)` instead.
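Both behaviours can be checked without a network connection. This sketch again builds `HTTPResponse` objects around an in-memory gzip body (the payload is made up); against a real server you would get the second object from `http.request('GET', url, preload_content=False)` and call `response.release_conn()` once you are done reading.

```python
import gzip
import io

from urllib3.response import HTTPResponse

PAYLOAD = b"example body"
compressed = gzip.compress(PAYLOAD)

# Default behaviour: the body is preloaded, decoded, and cached.
cached = HTTPResponse(
    body=io.BytesIO(compressed),
    headers={"Content-Encoding": "gzip"},
    status=200,
)
first = cached.data
second = cached.data  # served from the cached body, nothing is read again

# Raw access: skip preloading, then read without decoding.
raw_response = HTTPResponse(
    body=io.BytesIO(compressed),
    headers={"Content-Encoding": "gzip"},
    status=200,
    preload_content=False,
)
raw = raw_response.read(decode_content=False)

print(first)              # b'example body'
print(first is second)    # True
print(raw == compressed)  # True
```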
Finally, keep in mind that some encodings need additional dependencies. For example, to decode the `br` (Brotli) content encoding, `urllib3` needs a Brotli library, which you can install through its optional extra:

```shell
pip install "urllib3[brotli]"
```

This installation step is necessary because `urllib3` does not implement Brotli decompression itself and relies on a third-party package (`brotli` or `brotlicffi`) for this functionality.
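A related detail: the header built by `urllib3.make_headers(accept_encoding=True)` only advertises `br` when such a library can be imported. The sketch below checks this in your environment; `brotli_available` is a hypothetical helper that, as far as I know, mirrors the import order `urllib3` itself uses (`brotlicffi` first, then `brotli`).

```python
import importlib

import urllib3


def brotli_available() -> bool:
    # Hypothetical helper: try the same Brotli modules urllib3 looks for.
    for name in ("brotlicffi", "brotli"):
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        return True
    return False


# make_headers() returns a plain dict with a lowercase "accept-encoding" key
accept = urllib3.make_headers(accept_encoding=True)["accept-encoding"]
offered = [token.strip() for token in accept.split(",")]

print("Accept-Encoding:", accept)
print("Brotli decoding available:", brotli_available())
```

If `br` is missing from the printed header, the server will not be asked for Brotli, so installing the extra is what turns that encoding on.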