How do I handle redirections in urllib3?

Handling redirects in urllib3 is relatively straightforward. The PoolManager class follows redirects by default (redirect=True), including redirects to other hosts, subject to the limits of the active Retry configuration. A single HTTPConnectionPool or HTTPSConnectionPool also follows redirects by default, but only on the host it is bound to: a redirect to a different host raises HostChangedError. For most code, PoolManager is therefore the more convenient choice.

Here's an example of how to use PoolManager to automatically follow redirects:

import urllib3

http = urllib3.PoolManager()

# This will automatically follow redirects
response = http.request('GET', 'http://example.com/')

print(response.status)  # This will be the status of the final response after following any redirects
print(response.data)    # This is the content of the response
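
To see which redirects were actually followed, you can inspect response.retries.history on the final response. The sketch below is a minimal, self-contained illustration rather than production code: it starts a throwaway local server so that it does not depend on any external site. The RedirectHandler class and the /old and /new paths are made up for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old redirects to /new, which returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

http = urllib3.PoolManager()
response = http.request("GET", f"http://127.0.0.1:{server.server_port}/old")

# Each followed redirect leaves one entry in the retry history
history = response.retries.history
for hop in history:
    print(hop.status, hop.url, "->", hop.redirect_location)
print(response.status, response.data)

server.shutdown()
server.server_close()
```

Each history entry is a named tuple with method, url, error, status and redirect_location fields, so the same loop can be used for logging or debugging a redirect chain.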

Controlling Redirection

If you need more control over redirection, such as limiting the number of redirects or handling them manually, use the redirect and retries arguments of request() together with a Retry object.

Limiting the number of redirects

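You can limit the number of redirects urllib3 will follow by passing a Retry object with a specific redirect value as the retries argument. Passing it with the request works across urllib3 versions; releases before 2.5.0 ignored a retries value set only on the PoolManager constructor when following redirects. Once the limit is exceeded, urllib3 raises MaxRetryError.

import urllib3
from urllib3.util.retry import Retry

http = urllib3.PoolManager()

# Allow up to 5 retries in total, of which at most 2 may be redirects
retries = Retry(total=5, redirect=2)

response = http.request('GET', 'http://example.com/', retries=retries)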
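
The redirect value behaves like a budget that shrinks with every redirect. The following sketch shows that bookkeeping without touching the network, by feeding a hand-built 302 response to Retry.increment(). The fake response and the /start and /next paths are assumptions made purely for illustration; in real use urllib3 calls increment() for you.

```python
from urllib3.exceptions import MaxRetryError
from urllib3.response import HTTPResponse
from urllib3.util.retry import Retry

# Hand-built 302 response, used only to simulate a redirect without a network
fake_redirect = HTTPResponse(status=302, headers={"Location": "/next"})

retries = Retry(total=5, redirect=2)
exhausted = False
try:
    for _ in range(3):
        # increment() returns a new Retry with the redirect budget reduced
        retries = retries.increment("GET", "/start", response=fake_redirect)
        print("redirects left:", retries.redirect)
except MaxRetryError:
    exhausted = True  # the third redirect exceeds redirect=2

print("followed:", len(retries.history), "exhausted:", exhausted)
```

Two redirects are accepted and the third one raises MaxRetryError, which is the exception a real request would surface. If you would rather receive the last 3xx response than an exception, Retry also accepts raise_on_redirect=False.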

Handling redirects manually

If you want to handle redirects manually, disable automatic redirection by setting the redirect parameter in Retry to False and passing that Retry object with the request (passing redirect=False to request() has the same effect). urllib3 then returns the 3xx response instead of following it:

import urllib3
from urllib.parse import urljoin
from urllib3.util.retry import Retry

http = urllib3.PoolManager()
url = 'http://example.com/'

# Disable automatic redirects for this request
response = http.request('GET', url, retries=Retry(redirect=False))

# Returns the Location header for redirect statuses, otherwise a falsy value
location = response.get_redirect_location()
if location:
    # Location may be relative, so resolve it against the original URL
    redirect_response = http.request('GET', urljoin(url, location))
    print(redirect_response.status)
    print(redirect_response.data)
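
A manual handler usually needs to follow more than one hop. Below is a minimal sketch of such a loop, assuming that only http and https targets are acceptable and that five hops is a sensible ceiling. The follow_manually helper, the ALLOWED_SCHEMES set and the max_hops default are hypothetical names chosen for this example, not part of urllib3.

```python
from urllib.parse import urljoin, urlsplit

import urllib3

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web URLs are acceptable


def follow_manually(http: urllib3.PoolManager, url: str, max_hops: int = 5):
    """Follow redirects one hop at a time; return (response, visited_urls)."""
    visited = [url]
    for _ in range(max_hops + 1):
        # redirect=False makes urllib3 hand back the 3xx response itself
        response = http.request("GET", url, redirect=False)
        location = response.get_redirect_location()
        if not location:
            return response, visited  # not a redirect: done
        url = urljoin(url, location)  # resolve relative Location values
        if urlsplit(url).scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"refusing redirect to {url!r}")
        if url in visited:
            raise ValueError(f"redirect loop detected at {url!r}")
        visited.append(url)
    raise ValueError(f"more than {max_hops} redirects")
```

Calling follow_manually(urllib3.PoolManager(), 'http://example.com/') returns the final response together with the list of URLs visited, and raises ValueError on a loop, an unexpected scheme, or too many hops.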

Handling redirects manually requires attention to the status code. 301 and 308 are permanent redirects, 302 and 307 are temporary, and 303 tells the client to fetch the target with GET. In addition, 307 and 308 require the follow-up request to keep the original method and body. You should also guard against redirect loops and against Location values that point to unintended URL schemes.
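
One way to keep these status-code rules in a single place is a small helper that decides which method to use for the follow-up request and whether the redirect is permanent. This is a sketch under stated assumptions: plan_redirect is a hypothetical helper, and rewriting POST to GET on 301 and 302 reflects common client behaviour that the HTTP specification permits rather than requires.

```python
from http import HTTPStatus

PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}  # 301, 308
METHOD_PRESERVING = {HTTPStatus.TEMPORARY_REDIRECT, HTTPStatus.PERMANENT_REDIRECT}  # 307, 308


def plan_redirect(method, status):
    """Return (next_method, is_permanent) for a redirect response."""
    method = method.upper()
    if status in METHOD_PRESERVING:
        next_method = method  # 307/308: repeat the request unchanged
    elif status == HTTPStatus.SEE_OTHER and method != "HEAD":
        next_method = "GET"  # 303: fetch the target with GET
    elif status in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND) and method == "POST":
        next_method = "GET"  # 301/302: common client behaviour, not a requirement
    else:
        next_method = method
    return next_method, status in PERMANENT
```

A caller could use the is_permanent flag to decide whether to update a stored URL, and next_method to decide whether the request body should be sent again.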

When redirects involve HTTPS, make sure urllib3 is verifying TLS certificates, and supply a custom certificate bundle through the ca_certs argument if your environment needs one.
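
A minimal sketch of a PoolManager configured with explicit certificate verification is shown below. The bundle path is a placeholder and an assumption of this example; substitute the CA bundle that applies to your environment, or omit ca_certs to rely on the system trust store.

```python
import urllib3

# Hypothetical path: replace with the CA bundle used in your environment
CA_BUNDLE = "/path/to/ca-bundle.pem"

http = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",  # fail the request if the certificate is invalid
    ca_certs=CA_BUNDLE,         # verify against this bundle
)
```

Because the same PoolManager opens the connection for every hop, HTTPS targets reached through a redirect are verified with these settings as well.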
