In `urllib3`, you can control request timeouts using the `timeout` parameter to prevent your application from hanging indefinitely when making HTTP requests.
## Types of Timeouts

`urllib3` supports two types of timeouts:

- **Connection timeout**: the maximum time to wait while establishing a connection to the server
- **Read timeout**: the maximum time to wait for a response after the connection is established
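As a minimal sketch, these two limits map onto separate fields of `urllib3`'s `Timeout` helper (the values here are arbitrary):

```python
from urllib3.util.timeout import Timeout

# Each phase gets its own independent budget:
# 3 seconds to open the connection, 15 seconds to wait for data.
timeout = Timeout(connect=3.0, read=15.0)

print(timeout.connect_timeout)  # 3.0
print(timeout.read_timeout)     # 15.0
```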
## Setting a Simple Timeout

For basic use cases, pass a single number to apply the same value to both the connection and read timeouts:

```python
import urllib3

# Create a PoolManager instance
http = urllib3.PoolManager()

# Set both the connection and read timeouts to 5 seconds
response = http.request('GET', 'https://httpbin.org/delay/2', timeout=5)

print(f"Status: {response.status}")
print(f"Data: {response.data.decode()}")
```
## Setting Separate Connection and Read Timeouts

For more control, use the `Timeout` object to specify different values:

```python
import urllib3
from urllib3.util.timeout import Timeout

http = urllib3.PoolManager()

# Create a Timeout object with separate connect and read values
timeout = Timeout(connect=2.0, read=10.0)

# Make the request with the custom timeout configuration
response = http.request('GET', 'https://httpbin.org/delay/3', timeout=timeout)

print(f"Status: {response.status}")
print("Response received in time!")
```
## Using Default Timeouts

You can set a default timeout when creating a `PoolManager`; individual requests can still override it:

```python
import urllib3
from urllib3.util.timeout import Timeout

# Set a default timeout for all requests made through this pool
default_timeout = Timeout(connect=5.0, read=30.0)
http = urllib3.PoolManager(timeout=default_timeout)

# This request uses the default timeout
response = http.request('GET', 'https://httpbin.org/get')

# Override the default timeout for this specific request
response = http.request('GET', 'https://httpbin.org/delay/1', timeout=2.0)
```
## Handling Timeout Exceptions

Always handle timeout exceptions to keep your application robust. Note that with retries enabled (the default), `urllib3` wraps timeouts in a `MaxRetryError`; passing `retries=False` surfaces the timeout exceptions directly:

```python
import urllib3
from urllib3.exceptions import ConnectTimeoutError, ReadTimeoutError, TimeoutError

http = urllib3.PoolManager()

try:
    # retries=False disables retries, so timeout errors are raised as-is
    response = http.request('GET', 'https://httpbin.org/delay/10',
                            timeout=3.0, retries=False)
    print(f"Success: {response.status}")
except ConnectTimeoutError:
    print("Connection timeout: could not establish a connection within the timeout period")
except ReadTimeoutError:
    print("Read timeout: the server did not respond within the timeout period")
except TimeoutError as e:
    # urllib3's TimeoutError is the base class of both errors above
    print(f"General timeout error: {e}")
```
## Advanced Timeout Configuration

The `Timeout` object also accepts a `total` parameter that caps the combined duration of the connect and read phases:

```python
import urllib3
from urllib3.util.timeout import Timeout

# Complete timeout configuration
timeout = Timeout(
    connect=5.0,  # Connection timeout
    read=30.0,    # Read timeout
    total=35.0    # Cap on connect + read combined
)

http = urllib3.PoolManager()
response = http.request('GET', 'https://httpbin.org/get', timeout=timeout)
```
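A `Timeout` can also be built from `total` alone; when `connect` and `read` are not given, the connect phase inherits the full budget and the read phase gets whatever remains after connecting (a small sketch of this variant):

```python
from urllib3.util.timeout import Timeout

# With only `total` set, the connect phase may use up to the whole
# budget; the read phase is limited to the time left over.
timeout = Timeout(total=10.0)

print(timeout.connect_timeout)  # 10.0
```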
## Best Practices

- Always set timeouts to prevent hanging requests
- Use values appropriate to your use case:
  - Connection timeout: 3-10 seconds
  - Read timeout: 10-60 seconds for web scraping
- Handle exceptions gracefully in production code
- Consider retry logic for timeout scenarios
- Set default timeouts at the `PoolManager` level for consistency
Combining a default timeout with a retry strategy puts several of these practices together:

```python
import urllib3
from urllib3.util.timeout import Timeout
from urllib3.util.retry import Retry
from urllib3.exceptions import MaxRetryError

# Configure the retry strategy (allowed_methods requires urllib3 >= 1.26)
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["HEAD", "GET", "OPTIONS"]
)

# Create a PoolManager with both a default timeout and retries
http = urllib3.PoolManager(
    timeout=Timeout(connect=5.0, read=30.0),
    retries=retry_strategy
)

try:
    response = http.request('GET', 'https://httpbin.org/status/500')
    print(f"Success: {response.status}")
except MaxRetryError as e:
    # Raised once the retry budget is exhausted
    print(f"Request failed after retries: {e}")
```
This approach ensures your web scraping applications are both responsive and resilient to network issues.