Yes, the Python requests library lets you handle redirects manually by disabling automatic redirection. This gives you complete control over the redirect process.
Disabling Automatic Redirects
Set allow_redirects=False to prevent requests from automatically following redirects:
import requests
# Disable automatic redirect following
response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
print(f"Status Code: {response.status_code}") # 302
print(f"Location: {response.headers.get('Location')}") # Redirect URL
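The Location header is not always an absolute URL; a server may send only a path such as /relative-redirect/2. A minimal sketch of resolving it against the URL that was requested, using only the standard library (the helper name resolve_location is an illustrative assumption, not part of requests):

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Return the absolute URL a Location header points to.

    Hypothetical helper: `request_url` is the URL that produced the
    redirect response, `location` is the raw Location header value.
    """
    if not location:
        return None  # No Location header, so there is nothing to follow
    return urljoin(request_url, location)


# A relative Location is resolved against the request URL
print(resolve_location("http://httpbin.org/redirect/3", "/relative-redirect/2"))
# An absolute Location is returned unchanged
print(resolve_location("http://httpbin.org/redirect/1", "https://example.com/new"))
```

With requests, the two arguments would typically be response.url and response.headers.get('Location').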
Basic Manual Redirect Handling
Check for redirects and follow them manually:
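import requests
from urllib.parse import urljoin
response = requests.get('http://httpbin.org/redirect/3', allow_redirects=False)
# is_redirect is True for 301, 302, 303, 307 and 308 responses that carry a Location header
if response.is_redirect or response.is_permanent_redirect:
    # Location may be relative, so resolve it against the current URL
    redirect_url = urljoin(response.url, response.headers['Location'])
    print(f"Redirecting to: {redirect_url}")
    # Follow the redirect manually (any remaining hops are followed automatically)
    response = requests.get(redirect_url)
    print(f"Final status: {response.status_code}")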
Complete Redirect Chain Following
Handle multiple redirects with proper loop protection:
import requests
from urllib.parse import urljoin
def follow_redirects_manually(url, max_redirects=10):
    """
    Follow redirects manually with loop protection
    """
    redirects_followed = 0
    redirect_chain = []
    while redirects_followed < max_redirects:
        response = requests.get(url, allow_redirects=False)
        redirect_chain.append(url)
        # Check if response is a redirect
        if not (response.is_redirect or response.is_permanent_redirect):
            # Not a redirect, we're done
            break
        # Get redirect location
        location = response.headers.get('Location')
        if not location:
            break
        # Handle relative URLs
        url = urljoin(url, location)
        redirects_followed += 1
        print(f"Redirect {redirects_followed}: {response.status_code} -> {url}")
    if redirects_followed >= max_redirects:
        print(f"Stopped after {max_redirects} redirects")
    return response, redirect_chain
# Usage example
final_response, chain = follow_redirects_manually('http://httpbin.org/redirect/5')
print(f"Final URL: {final_response.url}")
print(f"Redirect chain: {' -> '.join(chain)}")
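A redirect limit stops runaway chains, but a loop (A redirects to B, B back to A) can be detected earlier by remembering which URLs were already visited. The following is a sketch under stated assumptions: trace_redirects and its fetch parameter are hypothetical names, and fetch is a stand-in callable returning a (status_code, location) tuple so the logic can be exercised without a network.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_redirects=10):
    """Follow redirects, stopping on a repeated URL or when the limit is hit.

    `fetch` is a hypothetical callable taking a URL and returning a
    (status_code, location) tuple; it stands in for a real HTTP call.
    Returns (chain, reason), where reason is 'done', 'loop' or 'limit'.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_redirects):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return chain, "done"
        url = urljoin(url, location)  # Resolve relative Location values
        chain.append(url)
        if url in seen:
            return chain, "loop"  # This URL was already visited
        seen.add(url)
    return chain, "limit"


# Fake responses standing in for a server: /a -> /b -> /a (a loop)
fake_site = {
    "http://example.com/a": (302, "/b"),
    "http://example.com/b": (302, "/a"),
}
chain, reason = trace_redirects(
    "http://example.com/a", lambda u: fake_site.get(u, (200, None))
)
print(reason, chain)
```

With requests, fetch could wrap requests.get(u, allow_redirects=False) and return its status_code together with headers.get('Location').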
Advanced Use Cases
Conditional Redirect Following
import requests
from urllib.parse import urljoin, urlparse
def selective_redirect_handler(url, max_redirects=10):
    """
    Follow redirects only to trusted domains
    """
    trusted_domains = ['example.com', 'httpbin.org']
    response = requests.get(url, allow_redirects=False)
    for _ in range(max_redirects):
        if not (response.is_redirect or response.is_permanent_redirect):
            break
        # Resolve relative Location values against the current URL
        redirect_url = urljoin(response.url, response.headers['Location'])
        # Check if redirect is to a trusted domain (hostname ignores any port)
        domain = urlparse(redirect_url).hostname
        if domain not in trusted_domains:
            print(f"Refusing to redirect to untrusted domain: {domain}")
            break
        print(f"Following redirect to trusted domain: {redirect_url}")
        response = requests.get(redirect_url, allow_redirects=False)
    return response
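An exact match against a list of domains rejects subdomains such as api.example.com, and a naive substring or suffix check would accept look-alikes such as example.com.evil.net. One possible way to handle both cases is sketched below; is_trusted is a hypothetical helper, and treating subdomains as trusted is an assumption that may not suit every policy.

```python
from urllib.parse import urlparse


def is_trusted(url, trusted_domains):
    """Return True if the URL's host is a trusted domain or a subdomain of one.

    Hypothetical helper; it compares hostnames only, so a port or
    userinfo in the URL does not affect the result.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in trusted_domains
    )


trusted = ["example.com", "httpbin.org"]
for candidate in (
    "http://httpbin.org:8080/get",
    "https://api.example.com/v1",
    "https://example.com.evil.net/",
):
    print(candidate, is_trusted(candidate, trusted))
```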
Preserving Headers Across Redirects
import requests
from urllib.parse import urljoin
def redirect_with_custom_headers(url, headers=None, max_redirects=10):
    """
    Manually follow redirects while preserving custom headers
    """
    if not headers:
        headers = {'User-Agent': 'My Custom Bot 1.0'}
    response = requests.get(url, headers=headers, allow_redirects=False)
    for _ in range(max_redirects):
        if not (response.is_redirect or response.is_permanent_redirect):
            break
        # Resolve relative Location values against the current URL
        redirect_url = urljoin(response.url, response.headers['Location'])
        print(f"Following redirect with custom headers: {redirect_url}")
        # Pass the same headers again so they are sent on every hop
        response = requests.get(redirect_url, headers=headers, allow_redirects=False)
    return response
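Sending every header to every hop can leak credentials when a redirect leaves the original host. A minimal sketch of filtering headers before each manual hop follows; headers_for_redirect and the SENSITIVE_HEADERS set are illustrative assumptions, and the right set of headers to drop depends on the application.

```python
from urllib.parse import urlparse

# Headers treated as sensitive in this sketch (an assumption; adjust as needed)
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def headers_for_redirect(headers, from_url, to_url):
    """Return a copy of `headers` that is safe to send to `to_url`.

    Hypothetical helper: if the redirect leaves the original host,
    sensitive headers are dropped; otherwise all headers are kept.
    """
    same_host = urlparse(from_url).hostname == urlparse(to_url).hostname
    if same_host:
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }


original = {"User-Agent": "My Custom Bot 1.0", "Authorization": "Bearer secret"}
print(headers_for_redirect(original, "https://example.com/a", "https://example.com/b"))
print(headers_for_redirect(original, "https://example.com/a", "https://other.test/b"))
```

In a manual redirect loop, the returned dictionary would be passed as headers= for the next request instead of the original one.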
HTTP Status Code Reference
Common redirect status codes you'll encounter:
- 301: Moved Permanently
- 302: Found (temporary redirect)
- 303: See Other
- 307: Temporary Redirect (method preserved)
- 308: Permanent Redirect (method preserved)
import requests
response = requests.get('http://httpbin.org/status/301', allow_redirects=False)
# Check specific redirect types
if response.status_code in (301, 308):
    print("Permanent redirect")
elif response.status_code in (302, 303, 307):
    print("Temporary redirect")
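When following redirects manually for requests other than GET, the status code also determines which HTTP method the next request should use. The function below is a sketch of behaviour many HTTP clients implement, not a definitive rule set; method_after_redirect is a hypothetical name, and individual libraries differ in the details.

```python
def method_after_redirect(status_code, method):
    """Return the HTTP method to use for the request that follows a redirect.

    Sketch of common client behaviour: 303 switches to GET (except for
    HEAD), 301 and 302 switch POST to GET, and 307 and 308 keep the
    original method.
    """
    method = method.upper()
    if status_code == 303 and method != "HEAD":
        return "GET"
    if status_code in (301, 302) and method == "POST":
        return "GET"
    return method


for code in (301, 302, 303, 307, 308):
    print(code, "POST ->", method_after_redirect(code, "POST"))
```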
Best Practices
- Always set a maximum redirect limit to prevent infinite loops
- Handle relative URLs using urllib.parse.urljoin()
- Validate redirect destinations before following them
- Preserve necessary headers when making redirect requests
- Log redirect chains for debugging purposes
Manual redirect handling is particularly useful for web scraping when you need to:
- Track all URLs in a redirect chain
- Apply different logic based on redirect destinations
- Preserve authentication or custom headers across redirects
- Implement custom redirect policies