How do I control the maximum number of retries on connection errors using Requests?

In Python, when you're using the requests library to make HTTP requests, you may occasionally encounter connection errors due to network issues, server unresponsiveness, or timeouts. By default, requests does not retry failed requests, but you can control the behavior to implement retries using a Session object and the HTTPAdapter class from the requests.adapters module.

Here's how you can set up a maximum number of retries on connection errors using requests:

import requests
from requests.adapters import HTTPAdapter
# Import Retry from urllib3 directly; the requests.packages.urllib3 alias is deprecated
from urllib3.util.retry import Retry

# Define the maximum number of retries
max_retries = 3

# Create a session
session = requests.Session()

# Define a Retry object with your retry parameters
retries = Retry(
    total=max_retries,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
)

# Mount an HTTPAdapter to the session's HTTP and HTTPS endpoints
adapter = HTTPAdapter(max_retries=retries)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Now you can make requests through the session, and it will retry on connection errors
url = 'http://example.com'
try:
    response = session.get(url)
    # Handle successful response
except requests.exceptions.RetryError as e:
    # Raised when retries on a status code from status_forcelist are exhausted
    pass
except requests.exceptions.ConnectionError as e:
    # Raised when connection-level retries are exhausted
    pass
except requests.exceptions.RequestException as e:
    # Handle any other request-related error
    pass

The Retry class has several parameters that you can adjust to control the retry behavior:

  • total: Total number of retries to allow. Set to None for infinite retries.
  • read: How many times to retry on read errors.
  • connect: How many times to retry on connection-related errors.
  • status_forcelist: A set of integer HTTP status codes that we should force a retry on. e.g., [500, 502, 503, 504].
  • backoff_factor: A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay).
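To illustrate these parameters, here is a hypothetical configuration (the values are illustrative, not recommendations) that retries connection errors more aggressively than read errors and restricts retries to idempotent methods. Note that allowed_methods requires urllib3 1.26 or newer; older versions call the same parameter method_whitelist.

```python
from urllib3.util.retry import Retry

# Illustrative configuration: prefer retrying connection failures over
# read failures, and only retry methods that are safe to repeat.
retries = Retry(
    total=5,              # overall cap across all error categories
    connect=5,            # retries for connection-establishment errors
    read=2,               # retries for errors after the request was sent
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=["GET", "HEAD"],  # method_whitelist in urllib3 < 1.26
    backoff_factor=0.5,
)

print(retries.total, retries.connect, retries.read)
```

The total value always wins: even though connect allows 5 retries, the request fails as soon as any combination of errors consumes the overall budget of 5.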

The backoff_factor introduces a delay between retry attempts to give the server time to recover. urllib3 sleeps for backoff_factor * (2 ** (retry number - 1)) seconds before each retry, except that the first retry happens immediately with no delay. For example, with backoff_factor=1, the delays are 0 seconds before the first retry, 2 seconds before the second, and 4 seconds before the third. By default, urllib3 caps the delay at 120 seconds.
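The schedule above can be mirrored in plain Python. This is a sketch of the documented formula, not urllib3's actual implementation:

```python
def backoff_delays(backoff_factor, max_retries):
    """Delays slept before each retry: the first retry happens
    immediately; later ones wait backoff_factor * 2**(n - 1) seconds."""
    delays = []
    for n in range(1, max_retries + 1):
        if n == 1:
            delays.append(0)  # no delay before the first retry
        else:
            delays.append(backoff_factor * (2 ** (n - 1)))
    return delays

print(backoff_delays(1, 3))  # → [0, 2, 4]
```

Doubling the factor doubles every delay in the schedule, so backoff_factor is the simplest single knob for tuning how gently your client backs off.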

If the retries on a forced status code are exhausted, requests raises a RetryError; if connection-level retries are exhausted, it raises a ConnectionError. You can catch these exceptions and handle them accordingly.
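If you only need a simple retry count and no custom backoff or status handling, HTTPAdapter also accepts a plain integer, which it converts into a Retry object internally:

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Passing an int is shorthand for Retry(total=3): connection errors are
# retried up to 3 times, with no backoff and no status-code retries.
adapter = HTTPAdapter(max_retries=3)
session.mount('http://', adapter)
session.mount('https://', adapter)

print(adapter.max_retries.total)  # → 3
```

This shorthand is convenient for quick scripts; reach for an explicit Retry object once you need backoff_factor or status_forcelist.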

Remember that it's important to use retries responsibly and not to overload the server with too many quick, repeated requests. Always follow the website's terms of service and respect the robots.txt file when scraping.
