How do I handle HTTP status codes effectively with urllib3?
HTTP status codes are essential indicators of how your web requests are being processed by servers. When using urllib3 for web scraping or API interactions, proper status code handling ensures your applications are robust, reliable, and can gracefully handle various server responses. This guide covers comprehensive strategies for handling HTTP status codes with urllib3.
Understanding HTTP Status Codes
HTTP status codes are three-digit numbers that indicate the outcome of HTTP requests. They're grouped into five categories:
- 1xx (Informational): Request received, continuing process
- 2xx (Success): Request was successfully received, understood, and accepted
- 3xx (Redirection): Further action needs to be taken to complete the request
- 4xx (Client Error): Request contains bad syntax or cannot be fulfilled
- 5xx (Server Error): Server failed to fulfill an apparently valid request
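The category is just the first digit, so you can compute it instead of memorizing it. Below is a minimal sketch that uses Python's standard http.HTTPStatus enum for the reason phrase. describe_status is a hypothetical helper, not part of urllib3, and it accepts any integer, such as a urllib3 response.status:

```python
from http import HTTPStatus
from typing import Tuple

# Category names keyed by the first digit of the status code
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> Tuple[str, str]:
    """Return (category, reason phrase) for an HTTP status code (hypothetical helper)."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes not registered in http.HTTPStatus still get a category
        phrase = "Unregistered status"
    return category, phrase


print(describe_status(404))  # ('Client Error', 'Not Found')
print(describe_status(503))  # ('Server Error', 'Service Unavailable')
```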
Basic Status Code Handling with urllib3
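urllib3 does not raise an exception for 4xx or 5xx responses. It returns the response and leaves it to you to inspect response.status. Here's a basic pattern for checking and handling status codes: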
import urllib3
from urllib3.exceptions import HTTPError
# Create a PoolManager instance (reuse it across requests to benefit from connection pooling)
http = urllib3.PoolManager()
def make_request_with_status_handling(url):
    try:
        response = http.request('GET', url)
        # urllib3 does not raise on 4xx/5xx, so check the status code explicitly
        if response.status == 200:
            print("Success! Data retrieved successfully")
            return response.data.decode('utf-8')
        elif response.status == 404:
            print("Error: Resource not found")
            return None
        elif response.status == 403:
            print("Error: Access forbidden")
            return None
        elif response.status == 500:
            print("Error: Internal server error")
            return None
        else:
            print(f"Unexpected status code: {response.status}")
            return None
    except HTTPError as e:
        # Connection-level failures (e.g. MaxRetryError, timeouts) derive from HTTPError
        print(f"HTTP error occurred: {e}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
# Usage example
url = "https://httpbin.org/status/200"
result = make_request_with_status_handling(url)
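Instead of enumerating individual codes, some projects convert every non-2xx status into an exception at a single point. urllib3 leaves this to the caller, so the sketch below defines it. HTTPStatusError and raise_for_status are hypothetical names, and treating 3xx as an error assumes redirects have already been followed:

```python
from http import HTTPStatus
from typing import Optional


class HTTPStatusError(Exception):
    """Raised for responses outside the 2xx range (hypothetical exception type)."""

    def __init__(self, status: int, url: Optional[str] = None):
        try:
            reason = HTTPStatus(status).phrase
        except ValueError:
            reason = "Unknown"
        suffix = f" for {url}" if url else ""
        super().__init__(f"{status} {reason}{suffix}")
        self.status = status
        self.url = url


def raise_for_status(status: int, url: Optional[str] = None) -> None:
    """Raise HTTPStatusError unless status is 2xx (hypothetical helper)."""
    if not 200 <= status < 300:
        raise HTTPStatusError(status, url)


# With urllib3 this would be called as: raise_for_status(response.status, url)
try:
    raise_for_status(404, "https://httpbin.org/status/404")
except HTTPStatusError as exc:
    print(exc)  # 404 Not Found for https://httpbin.org/status/404
```

The caller can then use a single except HTTPStatusError clause and read exc.status when it needs to distinguish cases.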
Comprehensive Status Code Handling Strategy
For production applications, implement a more comprehensive approach:
import urllib3
from urllib3.exceptions import TimeoutError, MaxRetryError
class StatusCodeHandler:
    def __init__(self, max_retries=3, backoff_factor=1):
        self.http = urllib3.PoolManager(
            retries=urllib3.Retry(
                total=max_retries,
                backoff_factor=backoff_factor,
                status_forcelist=[429, 500, 502, 503, 504],
                # Return the final response instead of raising MaxRetryError when
                # retries run out, so the 429/5xx handlers below are actually reached
                raise_on_status=False
            )
        )
    def handle_response(self, response):
        """Handle different HTTP status codes"""
        status_handlers = {
            200: self._handle_success,
            201: self._handle_created,
            204: self._handle_no_content,
            301: self._handle_redirect,
            302: self._handle_redirect,
            400: self._handle_bad_request,
            401: self._handle_unauthorized,
            403: self._handle_forbidden,
            404: self._handle_not_found,
            429: self._handle_rate_limit,
            500: self._handle_server_error,
            502: self._handle_bad_gateway,
            503: self._handle_service_unavailable,
        }
        handler = status_handlers.get(response.status, self._handle_unknown)
        return handler(response)
    def _handle_success(self, response):
        return {
            'success': True,
            'data': response.data.decode('utf-8'),
            'status': response.status
        }
    def _handle_created(self, response):
        return {
            'success': True,
            'message': 'Resource created successfully',
            'status': response.status
        }
    def _handle_no_content(self, response):
        return {
            'success': True,
            'message': 'Operation completed successfully',
            'status': response.status
        }
    def _handle_redirect(self, response):
        # PoolManager follows redirects by default, so this is typically only
        # reached when the request is made with redirect=False
        return {
            'success': False,
            'error': 'Redirect not followed automatically',
            'location': response.headers.get('Location'),
            'status': response.status
        }
    def _handle_bad_request(self, response):
        return {
            'success': False,
            'error': 'Bad request - check your parameters',
            'status': response.status
        }
    def _handle_unauthorized(self, response):
        return {
            'success': False,
            'error': 'Authentication required',
            'status': response.status
        }
    def _handle_forbidden(self, response):
        return {
            'success': False,
            'error': 'Access forbidden - insufficient permissions',
            'status': response.status
        }
    def _handle_not_found(self, response):
        return {
            'success': False,
            'error': 'Resource not found',
            'status': response.status
        }
    def _handle_rate_limit(self, response):
        retry_after = response.headers.get('Retry-After', '60')
        try:
            retry_after = int(retry_after)
        except ValueError:
            # Retry-After may also be an HTTP date; fall back to 60 seconds
            retry_after = 60
        return {
            'success': False,
            'error': f'Rate limited - retry after {retry_after} seconds',
            'retry_after': retry_after,
            'status': response.status
        }
    def _handle_server_error(self, response):
        return {
            'success': False,
            'error': 'Internal server error',
            'status': response.status
        }
    def _handle_bad_gateway(self, response):
        return {
            'success': False,
            'error': 'Bad gateway - server acting as proxy received invalid response',
            'status': response.status
        }
    def _handle_service_unavailable(self, response):
        return {
            'success': False,
            'error': 'Service temporarily unavailable',
            'status': response.status
        }
    def _handle_unknown(self, response):
        return {
            'success': False,
            'error': f'Unknown status code: {response.status}',
            'status': response.status
        }
# Usage example
handler = StatusCodeHandler()
def make_robust_request(url, method='GET', **kwargs):
    try:
        response = handler.http.request(method, url, **kwargs)
        return handler.handle_response(response)
    except MaxRetryError as e:
        return {
            'success': False,
            'error': f'Max retries exceeded: {e}',
            'status': None
        }
    except TimeoutError as e:
        return {
            'success': False,
            'error': f'Request timeout: {e}',
            'status': None
        }
    except Exception as e:
        return {
            'success': False,
            'error': f'Unexpected error: {e}',
            'status': None
        }
# Test the implementation
result = make_robust_request('https://httpbin.org/status/404')
print(result)
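A rate-limit handler that calls int() on Retry-After covers only the delay-seconds form of the header. The header can also carry an HTTP date, for example "Wed, 21 Oct 2015 07:28:00 GMT". The sketch below parses both forms using only the standard library. parse_retry_after is a hypothetical helper, and the 60-second default is an arbitrary assumption:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 60.0,
                      now: Optional[datetime] = None) -> float:
    """Seconds to wait according to a Retry-After header value (hypothetical helper)."""
    if value is None:
        return default
    value = value.strip()
    try:
        # delay-seconds form, e.g. "120"
        return float(max(0, int(value)))
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, for unparseable dates
        return default
    if retry_at.tzinfo is None:
        # Treat dates without a usable zone as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately
    return max(0.0, (retry_at - current).total_seconds())
```

With a urllib3 response it would be called as parse_retry_after(response.headers.get('Retry-After')).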
Implementing Retry Logic for Specific Status Codes
urllib3 provides built-in retry functionality that can be customized for specific status codes:
import urllib3
from urllib3.util.retry import Retry
# Configure retry strategy
retry_strategy = Retry(
    total=5,  # Total number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # Status codes to retry
    allowed_methods=["HEAD", "GET", "OPTIONS"],  # HTTP methods to retry (replaces method_whitelist, removed in urllib3 2.0)
    backoff_factor=1,  # Backoff factor for retry delays
    raise_on_status=False  # Return the final response instead of raising MaxRetryError
)
# Create PoolManager with retry configuration
http = urllib3.PoolManager(retries=retry_strategy)
def make_request_with_retries(url):
    try:
        response = http.request('GET', url)
        if response.status in [200, 201, 204]:
            return {
                'success': True,
                'data': response.data.decode('utf-8'),
                'status': response.status
            }
        else:
            return {
                'success': False,
                'error': f'Request failed with status {response.status}',
                'status': response.status
            }
    except urllib3.exceptions.MaxRetryError as e:
        # Still raised for connection-level failures (DNS errors, refused connections, timeouts)
        return {
            'success': False,
            'error': f'Max retries exceeded: {e}',
            'status': None
        }
# Usage example
result = make_request_with_retries('https://httpbin.org/status/503')
print(result)
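To estimate how long backoff_factor=1 makes a client wait, the sketch below reproduces the schedule described in the urllib3 Retry documentation. There is no sleep before the first retry, then backoff_factor * 2 ** (n - 1) seconds before retry n, capped at a maximum of 120 seconds by default. This is a standalone approximation for planning timeouts, not a call into urllib3. It also ignores Retry-After headers, which urllib3 honors by default, and ignores any jitter:

```python
from typing import List


def backoff_schedule(backoff_factor: float, retries: int,
                     backoff_max: float = 120.0) -> List[float]:
    """Approximate the sleep before each retry under urllib3's documented backoff formula."""
    delays = []
    for n in range(1, retries + 1):
        if n == 1:
            delays.append(0.0)  # no backoff before the first retry
        else:
            delays.append(float(min(backoff_max, backoff_factor * 2 ** (n - 1))))
    return delays


print(backoff_schedule(1, 5))    # [0.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_schedule(0.1, 4))  # [0.0, 0.2, 0.4, 0.8]
```

Under this approximation, the total=5, backoff_factor=1 configuration above spends roughly 30 seconds in backoff against a persistently failing endpoint before the final response is returned, plus the time taken by the requests themselves.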
Advanced Status Code Handling with Context Managers
Wrapping the client in a context manager ensures its connection pools are cleared after use, and a small helper class keeps status-code checks consistent:
import urllib3
from contextlib import contextmanager
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@contextmanager
def http_client(timeout=30, retries=3):
    """Context manager for urllib3 HTTP client with proper cleanup"""
    retry_config = urllib3.Retry(
        total=retries,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=0.3,
        # Return the last response rather than raising once retries are exhausted,
        # so the status-category branches below still run
        raise_on_status=False
    )
    http = urllib3.PoolManager(
        timeout=urllib3.Timeout(connect=timeout, read=timeout),
        retries=retry_config
    )
    try:
        yield http
    finally:
        # Close all pooled connections
        http.clear()
class HTTPStatusHandler:
    @staticmethod
    def is_success(status_code):
        """Check if status code indicates success"""
        return 200 <= status_code < 300
    @staticmethod
    def is_client_error(status_code):
        """Check if status code indicates client error"""
        return 400 <= status_code < 500
    @staticmethod
    def is_server_error(status_code):
        """Check if status code indicates server error"""
        return 500 <= status_code < 600
    @staticmethod
    def should_retry(status_code):
        """Determine if request should be retried based on status code"""
        return status_code in [429, 500, 502, 503, 504]
    @staticmethod
    def log_status(url, status_code, method='GET'):
        """Log HTTP status information"""
        if HTTPStatusHandler.is_success(status_code):
            logger.info(f"{method} {url} - Success ({status_code})")
        elif HTTPStatusHandler.is_client_error(status_code):
            logger.warning(f"{method} {url} - Client Error ({status_code})")
        elif HTTPStatusHandler.is_server_error(status_code):
            logger.error(f"{method} {url} - Server Error ({status_code})")
        else:
            # 1xx and 3xx responses
            logger.info(f"{method} {url} - Other ({status_code})")
def fetch_with_comprehensive_handling(url, method='GET', **kwargs):
    """Fetch URL with comprehensive status code handling"""
    # Note: this opens a fresh PoolManager per call; reuse one client for many requests
    with http_client() as http:
        try:
            response = http.request(method, url, **kwargs)
            # Log the status
            HTTPStatusHandler.log_status(url, response.status, method)
            # Handle based on status code category
            if HTTPStatusHandler.is_success(response.status):
                return {
                    'success': True,
                    'data': response.data.decode('utf-8'),
                    'status': response.status,
                    'headers': dict(response.headers)
                }
            elif HTTPStatusHandler.is_client_error(response.status):
                error_messages = {
                    400: "Bad Request - Invalid parameters",
                    401: "Unauthorized - Authentication required",
                    403: "Forbidden - Access denied",
                    404: "Not Found - Resource doesn't exist",
                    409: "Conflict - Resource conflict",
                    422: "Unprocessable Entity - Validation failed",
                    429: "Too Many Requests - Rate limited"
                }
                error_msg = error_messages.get(
                    response.status,
                    f"Client error ({response.status})"
                )
                return {
                    'success': False,
                    'error': error_msg,
                    'status': response.status,
                    'retry_recommended': response.status == 429
                }
            elif HTTPStatusHandler.is_server_error(response.status):
                return {
                    'success': False,
                    'error': f"Server error ({response.status})",
                    'status': response.status,
                    'retry_recommended': HTTPStatusHandler.should_retry(response.status)
                }
            else:
                return {
                    'success': False,
                    'error': f"Unexpected status code: {response.status}",
                    'status': response.status,
                    'retry_recommended': False
                }
        except Exception as e:
            logger.error(f"Request failed: {e}")
            return {
                'success': False,
                'error': str(e),
                'status': None,
                'retry_recommended': True
            }
# Usage example
result = fetch_with_comprehensive_handling('https://httpbin.org/status/200')
print(result)
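A retry_recommended flag only helps if the caller acts on it. The following is a minimal sketch of a caller-side loop. It assumes a fetch function that returns result dictionaries shaped like those of fetch_with_comprehensive_handling. fetch_until_done is a hypothetical name, the fixed delay is a simplification, and the sleep function is injectable so the loop can be tested without waiting:

```python
import time
from typing import Any, Callable, Dict


def fetch_until_done(fetch: Callable[[], Dict[str, Any]], max_attempts: int = 3,
                     delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch() until it succeeds, stops recommending a retry, or attempts run out."""
    result: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result.get('success') or not result.get('retry_recommended'):
            return result
        if attempt < max_attempts:
            sleep(delay)  # wait before the next attempt
    return result


# Possible usage with the function defined above:
# result = fetch_until_done(
#     lambda: fetch_with_comprehensive_handling('https://httpbin.org/status/503'))
```

Keep in mind that http_client already retries at the urllib3 level, so an outer loop like this multiplies the total number of requests. Keep both limits small.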
Best Practices for HTTP Status Code Handling
1. Always Check Status Codes
Never assume a request succeeded without checking the status code:
response = http.request('GET', url)
if not 200 <= response.status < 300:
    # Handle non-2xx responses appropriately (handle_error is a placeholder for your own handler)
    handle_error(response.status, response.data)
2. Implement Proper Logging
Log different status codes at appropriate levels:
import logging
def log_response(response, url):
    if 200 <= response.status < 300:
        logging.info(f"Success: {url} returned {response.status}")
    elif 400 <= response.status < 500:
        logging.warning(f"Client error: {url} returned {response.status}")
    elif 500 <= response.status < 600:
        logging.error(f"Server error: {url} returned {response.status}")
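The if/elif chain above skips 1xx and 3xx responses. An alternative sketch chooses the log level from a lookup table keyed by status class and passes it to logger.log, so every status is logged. LEVEL_BY_CLASS, level_for_status and log_response_status are hypothetical names, and the specific level choices (for example, INFO for 3xx) are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

# Log level per status class (first digit); these choices are assumptions
LEVEL_BY_CLASS = {
    1: logging.DEBUG,
    2: logging.INFO,
    3: logging.INFO,
    4: logging.WARNING,
    5: logging.ERROR,
}


def level_for_status(status: int) -> int:
    """Return the logging level for a status code, defaulting to WARNING for unknown classes."""
    return LEVEL_BY_CLASS.get(status // 100, logging.WARNING)


def log_response_status(url: str, status: int) -> None:
    """Log one line per response at a level chosen from its status class."""
    logger.log(level_for_status(status), "%s returned %d", url, status)
```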
3. Handle Rate Limiting Gracefully
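Respect rate limits by honoring the server's Retry-After header before retrying:
import time
def handle_rate_limit(response, default_wait=60):
    if response.status == 429:
        # Retry-After is usually a number of seconds, but it may also be an HTTP date
        try:
            retry_after = int(response.headers.get('Retry-After', default_wait))
        except ValueError:
            retry_after = default_wait
        print(f"Rate limited. Waiting {retry_after} seconds...")
        time.sleep(retry_after)
        return True
    return False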
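Honoring Retry-After only works when the server sends the header. When it is missing, a common fallback is exponential backoff with random jitter. The sketch below computes such a delay using "full jitter": a random wait between zero and a ceiling that doubles on each attempt. compute_wait, the 1-second base, and the 60-second cap are assumptions, and the random source is injectable for testing:

```python
import random
from typing import Callable, Optional


def compute_wait(attempt: int, retry_after: Optional[float] = None,
                 base: float = 1.0, cap: float = 60.0,
                 rand: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based), hypothetical helper."""
    if retry_after is not None:
        # Prefer the server's instruction when it is available
        return float(retry_after)
    ceiling = min(cap, base * 2 ** attempt)
    # Full jitter: a random wait in [0, ceiling) so many clients do not retry in lockstep
    return rand() * ceiling
```

Randomizing the whole interval, rather than adding a small random offset, spreads out retries from many clients that failed at the same moment.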
4. Distinguish Between Retryable and Non-Retryable Errors
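Not all errors should trigger retries. Rate limiting and server-side failures are often transient, while client errors such as 400, 401, 403, 404, and 422 will usually fail the same way on every attempt: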
RETRYABLE_STATUS_CODES = [429, 500, 502, 503, 504]
NON_RETRYABLE_STATUS_CODES = [400, 401, 403, 404, 422]
def should_retry_request(status_code):
    return status_code in RETRYABLE_STATUS_CODES
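Status codes are only half of the decision. Retrying a non-idempotent request such as POST after a 502 can repeat a side effect that already happened on the server. urllib3's Retry draws a similar line through its allowed_methods parameter, whose default set does not include POST. The hypothetical helper below checks both the method and the status:

```python
# Idempotent methods per RFC 9110: repeating them should not change the outcome
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def is_safe_to_retry(method: str, status_code: int) -> bool:
    """Retry only transient statuses, and only for idempotent methods (hypothetical helper)."""
    return method.upper() in IDEMPOTENT_METHODS and status_code in RETRYABLE_STATUS_CODES
```

For simplicity, this sketch treats 429 like the other codes, although a rate-limited request has typically not been processed at all.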
Integration with Web Scraping Workflows
When building web scrapers, proper HTTP status code handling is crucial. urllib3 works at the HTTP level and does not execute JavaScript, so JavaScript-heavy sites whose content is rendered in the browser may require a browser automation tool instead. The same applies to authentication flows that depend on client-side scripts. Timeouts, by contrast, can be configured directly in urllib3 through urllib3.Timeout.
Conclusion
Effective HTTP status code handling with urllib3 involves understanding the meaning of different status codes, implementing appropriate retry logic, and building robust error handling mechanisms. By following the patterns shown in this guide, you can create reliable applications that gracefully handle various server responses and network conditions.
Remember to always log relevant information, respect rate limits, and implement proper retry strategies for transient errors. This approach ensures your applications are resilient and provide meaningful feedback when issues occur.