Yes, urllib3 decodes compressed responses automatically: the `data` property of its response object returns the decompressed body. Although urllib3 is a low-level HTTP library, it supports the gzip and deflate content encodings out of the box, and brotli once an optional Brotli package is installed.
How Automatic Decoding Works
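Decoding is controlled by the `decode_content` option, which defaults to `True`. When the body is read (by default `request()` reads it up front and `response.data` returns the result), urllib3:
1. Checks the `Content-Encoding` header
2. Applies the matching decompression algorithm
3. Returns the decoded content as bytes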
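The three steps can be approximated with Python's standard library. This is an illustrative sketch, not urllib3's actual decoder code: `decode_body` and `DECODERS` are hypothetical names, only gzip and deflate are handled, and a body with any unknown coding (such as `br` here) is passed through unchanged.

```python
import gzip
import zlib


def _inflate(data: bytes) -> bytes:
    # HTTP "deflate" is meant to be zlib-wrapped, but some servers send raw
    # DEFLATE data, so fall back to a raw stream if the zlib format check fails.
    try:
        return zlib.decompress(data)
    except zlib.error:
        return zlib.decompress(data, -zlib.MAX_WBITS)


# Hypothetical table of Content-Encoding tokens this sketch understands.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": _inflate,
    "identity": lambda data: data,
}


def decode_body(headers: dict, body: bytes) -> bytes:
    """Undo the codings named in Content-Encoding (illustrative helper)."""
    # Step 1: find the header, ignoring the case of its name.
    value = next(
        (v for k, v in headers.items() if k.lower() == "content-encoding"), ""
    )
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    # Leave the body untouched if any listed coding is unknown.
    if any(c not in DECODERS for c in codings):
        return body
    # Step 2: codings are listed in the order they were applied,
    # so undo them from last to first.
    for coding in reversed(codings):
        body = DECODERS[coding](body)
    # Step 3: the result is still bytes; turning it into text is a separate step.
    return body
```

Listing the codings in a table keeps the pass-through rule for unsupported encodings in one place.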
Basic Usage Example
```python
import urllib3

# Create a PoolManager instance
http = urllib3.PoolManager()

# urllib3 does not ask for compression on its own; make_headers() builds an
# Accept-Encoding header listing the codings urllib3 can decode
headers = urllib3.make_headers(accept_encoding=True)
response = http.request('GET', 'https://httpbin.org/gzip', headers=headers)

# The data property returns the already-decompressed body
decoded_content = response.data

# Convert bytes to string if needed
text_content = decoded_content.decode('utf-8')
print(text_content)
```
Checking Content Encoding
You can verify which encoding was used by inspecting the headers:
```python
import urllib3

http = urllib3.PoolManager()
response = http.request('GET', 'https://httpbin.org/gzip')

# Check the content encoding
encoding = response.headers.get('Content-Encoding', 'none')
print(f"Content encoding: {encoding}")

# Access decoded content
content = response.data
print(f"Decoded content length: {len(content)} bytes")
```
Handling Different Encodings
urllib3 supports multiple compression formats:
```python
import urllib3

http = urllib3.PoolManager()

def test_encoding(url, expected_encoding):
    response = http.request('GET', url)
    actual_encoding = response.headers.get('Content-Encoding', 'none')
    print(f"URL: {url}")
    print(f"Expected: {expected_encoding}, Actual: {actual_encoding}")
    print(f"Content length: {len(response.data)} bytes")
    print("---")

# Test different encodings
test_encoding('https://httpbin.org/gzip', 'gzip')
test_encoding('https://httpbin.org/deflate', 'deflate')
# 'br' is only decoded if a Brotli package (brotli or brotlicffi) is installed
test_encoding('https://httpbin.org/brotli', 'br')
```
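gzip and deflate are two wrappers around the same DEFLATE compression, which is why both can be decoded with the standard library's `zlib` module. A small offline comparison; the JSON-like payload is made up for illustration, and the printed sizes will depend on it:

```python
import gzip
import zlib

# Made-up, repetitive payload standing in for a typical JSON response.
payload = b'{"items": [' + b'{"id": 1, "name": "example"}, ' * 200 + b'{}]}'

# Same DEFLATE algorithm and level, different wrappers:
# gzip adds a 10-byte header and an 8-byte CRC-32/size trailer,
# zlib ("deflate" in HTTP) a 2-byte header and a 4-byte Adler-32 checksum.
as_gzip = gzip.compress(payload, compresslevel=6)
as_zlib = zlib.compress(payload, 6)

print(f"original: {len(payload)} bytes")
print(f"gzip:     {len(as_gzip)} bytes")
print(f"deflate:  {len(as_zlib)} bytes")
```

Both outputs decompress back to the identical payload; the size difference comes only from the wrapper.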
Raw vs Decoded Content
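You can access both raw and decoded content, but a response body can only be read once. By default `request()` has already read and decoded it, so a later `read()` call will not return the compressed bytes. Instead, request the body unread, read it without decoding, and decompress it yourself to compare:
```python
import gzip
import urllib3

http = urllib3.PoolManager()

# Leave the body unread so it can be consumed once, without decoding
response = http.request('GET', 'https://httpbin.org/gzip', preload_content=False)

# Get raw compressed content
raw_content = response.read(decode_content=False)
response.release_conn()
print(f"Raw content length: {len(raw_content)} bytes")

# Decompress manually (this endpoint uses gzip) to compare sizes
decoded_content = gzip.decompress(raw_content)
print(f"Decoded content length: {len(decoded_content)} bytes")
```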
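Raw gzip bytes do not have to be decompressed in one call. The standard-library sketch below decodes them chunk by chunk with `zlib.decompressobj`, which helps with large bodies; `gunzip_chunks` is a hypothetical helper, and the chunks come from a local byte string rather than a network response.

```python
import gzip
import zlib
from collections.abc import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress gzip data incrementally (hypothetical helper)."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    # Flush anything zlib is still holding once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail
    # eof is only set after the gzip trailer has been read and checked.
    if not decompressor.eof:
        raise ValueError("gzip stream ended early")


# Usage: split a compressed payload into small pieces, as a socket might deliver it.
compressed = gzip.compress(b"line of text\n" * 1000)
pieces = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
decoded = b"".join(gunzip_chunks(pieces))
print(len(compressed), len(decoded))
```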
Disabling Automatic Decoding
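If you need the raw compressed content, disable automatic decoding with `decode_content=False`. Sending `Accept-Encoding: identity` is a different tool: it asks the server not to compress at all, and servers are free to ignore it.
```python
import urllib3

http = urllib3.PoolManager()

# Keep the body exactly as the server sent it (still gzip-compressed here)
response = http.request('GET', 'https://httpbin.org/gzip', decode_content=False)
raw_content = response.data

# Or ask the server not to compress in the first place
response = http.request(
    'GET',
    'https://httpbin.org/gzip',
    headers={'Accept-Encoding': 'identity'},  # a request, not a guarantee
)
```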
Custom Decoding Pool Manager
For advanced use cases, create a custom PoolManager with automatic text decoding:
```python
import urllib3

TEXT_TYPES = ('text/', 'application/json', 'application/xml')

class AutoDecodingPoolManager(urllib3.PoolManager):
    def request(self, method, url, **kwargs):
        response = super().request(method, url, **kwargs)
        # Attribute chosen by this example; None means "not a text response"
        response.decoded_text = None

        # Only decode to text if the content type suggests it
        content_type = response.headers.get('Content-Type', '').lower()
        if any(ct in content_type for ct in TEXT_TYPES):
            # Take the charset from Content-Type, defaulting to UTF-8
            charset = 'utf-8'
            if 'charset=' in content_type:
                charset = content_type.split('charset=')[1].split(';')[0].strip(' "\'')

            # Decode bytes to string
            try:
                response.decoded_text = response.data.decode(charset)
            except (LookupError, UnicodeDecodeError):
                # Unknown charset name, or bytes that don't match it
                response.decoded_text = response.data.decode('utf-8', errors='replace')
        return response

# Usage
http = AutoDecodingPoolManager()
response = http.request('GET', 'https://httpbin.org/json')
print(response.decoded_text)  # Already decoded to a string
```
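The charset handling above splits the header on `charset=` by hand. One alternative, assuming the standard library's MIME parser is acceptable for HTTP `Content-Type` values, is `email.message.Message.get_content_charset()`, which handles quoting and case; `text_from_body` is a hypothetical helper name.

```python
from email.message import Message


def text_from_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode a response body using the charset named in Content-Type."""
    # Let the MIME parser handle parameters, quoting and case;
    # get_content_charset() returns the charset lowercased, or None.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that don't match the declared one.
        return body.decode(default, errors="replace")
```

Inside the pool manager you would call it as `text_from_body(response.data, response.headers.get('Content-Type', ''))`.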
Brotli Support
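For Brotli support, install a Brotli package. urllib3's `brotli` extra installs one for you; `brotlicffi` is the CFFI-based alternative and the successor to the older `brotlipy`:
```shell
# Install brotli support through urllib3's extra
pip install "urllib3[brotli]"
# Or install a Brotli package directly
pip install brotli
# Or the CFFI-based alternative
pip install brotlicffi
```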
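Once one of these packages is importable, urllib3 decodes brotli (`br`) responses automatically. Without one, urllib3 does not recognize the `br` coding and `response.data` holds the still-compressed bytes:
```python
import urllib3

http = urllib3.PoolManager()
response = http.request('GET', 'https://httpbin.org/brotli')

# Decompressed only if a Brotli package is installed
content = response.data
print(f"Brotli content length: {len(content)} bytes")
```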
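Because `br` is only decoded when a Brotli package is importable, it is worth advertising it to servers only in that case. `urllib3.make_headers(accept_encoding=True)` already does this; the standard-library sketch below shows the idea with hypothetical `has_module` and `accept_encoding_value` helpers that probe for the `brotli` and `brotlicffi` modules without importing them.

```python
import importlib.util


def has_module(name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


def accept_encoding_value() -> str:
    """Build an Accept-Encoding value for the codings available locally."""
    codings = ["gzip", "deflate"]  # always available through zlib
    if has_module("brotli") or has_module("brotlicffi"):
        codings.append("br")
    return ",".join(codings)


print(accept_encoding_value())
```

The result would be sent as `headers={'Accept-Encoding': accept_encoding_value()}` in `http.request()`.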
Error Handling
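Handle decompression errors gracefully. With the default `preload_content=True`, decoding happens inside `request()`, so a `DecodeError` would be raised before `response` is ever assigned. Deferring the read puts the decoding inside the `try` block; because the failed body cannot be read again, the fallback fetches it a second time with decoding turned off:
```python
import urllib3
from urllib3.exceptions import DecodeError

http = urllib3.PoolManager()
url = 'https://example.com'

# Defer reading the body so decoding happens inside the try block
response = http.request('GET', url, preload_content=False)
try:
    content = response.data
    print("Content decoded successfully")
except DecodeError as e:
    print(f"Decompression failed: {e}")
    # The failed body can't be re-read, so fetch it again without decoding
    raw = http.request('GET', url, decode_content=False)
    print(f"Raw content length: {len(raw.data)} bytes")
finally:
    response.release_conn()
```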
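urllib3 reports a corrupt body as `DecodeError`. When you decompress raw bytes yourself (for example after `decode_content=False`), the standard library raises its own exceptions instead. A sketch of the same decode-or-fall-back pattern for gzip, with `try_gunzip` as a hypothetical helper name:

```python
import gzip
import zlib


def try_gunzip(raw: bytes) -> tuple[bytes, bool]:
    """Return (data, ok): the decompressed body, or the raw bytes on failure."""
    try:
        return gzip.decompress(raw), True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC),
        # EOFError a truncated stream, zlib.error corrupt DEFLATE data.
        return raw, False
```

Returning a flag instead of raising lets the caller decide whether to keep the raw bytes or retry.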
Key Points
- Automatic: The `response.data` property returns the decompressed body, because `decode_content` defaults to `True`
- Caching: Content is cached after the first read, so repeated access to `response.data` doesn't re-read from the server
- Encoding support: gzip and deflate are built in; brotli requires an additional package
- Flexibility: Pass `decode_content=False` when you need the raw compressed bytes instead of the decoded ones
- Performance: Compression saves bandwidth only when the server actually compresses; urllib3 does not request it by default, so send an `Accept-Encoding` header (for example from `urllib3.make_headers(accept_encoding=True)`)