How do I set connection and read timeouts separately?

Setting connection and read timeouts separately in Python's Requests library is crucial for building robust web scraping applications. This granular control allows you to handle different types of network delays more effectively and prevents your applications from hanging indefinitely.

Understanding Connection vs Read Timeouts

Before diving into implementation, it's important to understand the difference between these two timeout types:

  • Connection timeout: The maximum time to wait for establishing a connection to the server
  • Read timeout: The maximum time to wait for the server to send data after the connection is established

These timeouts serve different purposes and should be configured based on your specific use case and network conditions.
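
To see the two failure modes in isolation, the hedged sketch below points the connection timeout at a non-routable private address (10.255.255.1, chosen purely for illustration) and the read timeout at httpbin's delay endpoint; swap in hosts that make sense for your environment.

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

# Connection timeout: the TCP handshake to a non-routable address never completes
try:
    requests.get('http://10.255.255.1', timeout=(2, 10))
except ConnectTimeout:
    print("Gave up after 2 seconds waiting to establish the connection")

# Read timeout: the server accepts the connection but is slow to respond
try:
    requests.get('https://httpbin.org/delay/10', timeout=(2, 3))
except ReadTimeout:
    print("Connected, but gave up after 3 seconds waiting for data")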

Basic Syntax for Separate Timeouts

The Requests library accepts a tuple for the timeout parameter, where the first value is the connection timeout and the second is the read timeout:

import requests

# Basic syntax: timeout=(connect_timeout, read_timeout)
response = requests.get('https://example.com', timeout=(5, 30))
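
A single number is also accepted: Requests then applies the same value to both phases. Omitting the parameter (or passing None) disables the timeout entirely, which is why an explicit value is recommended for anything unattended.

import requests

# One value: 10 seconds for the connection phase and 10 seconds for the read phase
response = requests.get('https://example.com', timeout=10)

# timeout=None (or no timeout argument) means Requests may block indefinitely
# response = requests.get('https://example.com', timeout=None)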

Practical Examples

Simple GET Request with Separate Timeouts

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, RequestException

def fetch_with_timeouts(url):
    try:
        # 3 seconds to connect, 15 seconds to read
        response = requests.get(url, timeout=(3, 15))
        return response
    except ConnectTimeout:
        print("Connection timeout: Server took too long to establish connection")
    except ReadTimeout:
        print("Read timeout: Server took too long to send data")
    except RequestException as e:
        print(f"Request failed: {e}")

    return None

# Usage
url = "https://httpbin.org/delay/2"
response = fetch_with_timeouts(url)
if response:
    print(f"Status: {response.status_code}")

Session-based Requests with Timeouts

For multiple requests, a Session is more efficient because it reuses the underlying TCP connections. Requests does not honor a session-level timeout attribute, so a common pattern is a small HTTPAdapter subclass that injects a default (connect, read) timeout into every request sent through the session:

import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectTimeout, ReadTimeout

class TimeoutHTTPAdapter(HTTPAdapter):
    """Transport adapter that applies a default (connect, read) timeout."""

    def __init__(self, timeout=(5, 30), **kwargs):
        self.timeout = timeout
        super().__init__(**kwargs)

    def send(self, request, **kwargs):
        # Apply the default only when no explicit timeout was passed
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)

def create_session_with_timeouts(connect_timeout=5, read_timeout=30):
    session = requests.Session()

    # Mount the adapter so every http/https request gets the default timeout
    adapter = TimeoutHTTPAdapter(timeout=(connect_timeout, read_timeout))
    session.mount('http://', adapter)
    session.mount('https://', adapter)

    return session

# Usage
session = create_session_with_timeouts(connect_timeout=3, read_timeout=20)

urls = [
    'https://httpbin.org/delay/1',
    'https://httpbin.org/delay/2',
    'https://httpbin.org/delay/3'
]

for url in urls:
    try:
        # The adapter's default (3, 20) timeout is applied automatically
        response = session.get(url)
        print(f"URL: {url}, Status: {response.status_code}")
    except (ConnectTimeout, ReadTimeout) as e:
        print(f"Timeout error for {url}: {e}")

Advanced Timeout Configuration

For more complex scenarios, you can create a wrapper class that handles different timeout strategies:

import requests
import time
from typing import Optional
from requests.exceptions import ConnectTimeout, ReadTimeout, RequestException

class TimeoutRequestsWrapper:
    def __init__(self, default_connect_timeout=5, default_read_timeout=30):
        self.default_connect_timeout = default_connect_timeout
        self.default_read_timeout = default_read_timeout
        self.session = requests.Session()

    def get(self, url: str, 
            connect_timeout: Optional[float] = None,
            read_timeout: Optional[float] = None,
            **kwargs) -> Optional[requests.Response]:

        # Use custom timeouts or fall back to defaults
        conn_timeout = connect_timeout or self.default_connect_timeout
        read_timeout_val = read_timeout or self.default_read_timeout

        timeout_tuple = (conn_timeout, read_timeout_val)

        try:
            start_time = time.time()
            response = self.session.get(url, timeout=timeout_tuple, **kwargs)
            elapsed_time = time.time() - start_time

            print(f"Request completed in {elapsed_time:.2f} seconds")
            return response

        except ConnectTimeout:
            print(f"Connection timeout ({conn_timeout}s) reached for {url}")
        except ReadTimeout:
            print(f"Read timeout ({read_timeout_val}s) reached for {url}")
        except RequestException as e:
            print(f"Request failed: {e}")

        return None

# Usage example
wrapper = TimeoutRequestsWrapper(default_connect_timeout=2, default_read_timeout=10)

# Use default timeouts
response1 = wrapper.get('https://httpbin.org/delay/1')

# Override timeouts for specific request
response2 = wrapper.get(
    'https://httpbin.org/delay/5', 
    connect_timeout=1, 
    read_timeout=20
)

JavaScript/Node.js Equivalent

For JavaScript developers using Axios, timeout control is coarser: Axios exposes a single overall timeout, while Node's HTTP agents can add a socket-level timeout that roughly bounds the connection phase:

const axios = require('axios');

// Configure separate timeouts in Axios
const client = axios.create({
    timeout: 30000, // Overall timeout (includes both connection and response)
});

// For more granular control, use a custom agent
const http = require('http');
const https = require('https');

// Note: the Agent `timeout` option sets a socket inactivity timeout, which in
// practice bounds the connection phase but is not a strict connect-only timeout
const httpAgent = new http.Agent({
    timeout: 5000,
});

const httpsAgent = new https.Agent({
    timeout: 5000,
});

const customClient = axios.create({
    timeout: 30000, // Overall request timeout
    httpAgent: httpAgent,
    httpsAgent: httpsAgent
});

async function fetchWithTimeouts(url) {
    try {
        const response = await customClient.get(url);
        return response;
    } catch (error) {
        if (error.code === 'ECONNABORTED') {
            console.log('Request timeout');
        } else if (error.code === 'ECONNREFUSED') {
            console.log('Connection refused');
        } else {
            console.log('Request failed:', error.message);
        }
        return null;
    }
}

Best Practices and Recommendations

Choosing Appropriate Timeout Values

  1. Connection timeout: Usually shorter (3-10 seconds)
    • Reflects network latency and server availability
    • If connecting takes longer than this, the server is likely unreachable or overloaded
  2. Read timeout: Can be longer (15-60 seconds)
    • Depends on how long the server needs to generate and send the response
    • Consider the complexity of the requested resource
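
One lightweight way to apply these guidelines is to keep a few named (connect, read) profiles in one place and pick the right one per request type. The profile names and values below are illustrative assumptions, not fixed recommendations:

import requests

# Hypothetical profiles: (connect_timeout, read_timeout) in seconds
TIMEOUT_PROFILES = {
    'api': (3, 15),        # small JSON responses from well-provisioned servers
    'download': (5, 60),   # large payloads that stream slowly
    'scraping': (5, 30),   # general-purpose default for unknown sites
}

response = requests.get('https://httpbin.org/get', timeout=TIMEOUT_PROFILES['api'])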

Environment-Specific Configuration

import os
import requests

class EnvironmentAwareTimeouts:
    def __init__(self):
        # Different timeouts for different environments
        env = os.getenv('ENVIRONMENT', 'development')

        if env == 'production':
            self.connect_timeout = 5
            self.read_timeout = 30
        elif env == 'testing':
            self.connect_timeout = 2
            self.read_timeout = 10
        else:  # development
            self.connect_timeout = 10
            self.read_timeout = 60

    def make_request(self, url, **kwargs):
        timeout = (self.connect_timeout, self.read_timeout)
        return requests.get(url, timeout=timeout, **kwargs)

# Usage
timeout_manager = EnvironmentAwareTimeouts()
response = timeout_manager.make_request('https://api.example.com/data')

Error Handling and Retry Logic

When working with timeouts, implementing proper retry logic is essential:

import requests
import time
from typing import Optional
from requests.exceptions import ConnectTimeout, ReadTimeout, RequestException

def request_with_retry(url: str, 
                      max_retries: int = 3,
                      connect_timeout: float = 5,
                      read_timeout: float = 30,
                      backoff_factor: float = 1) -> Optional[requests.Response]:

    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=(connect_timeout, read_timeout))
            return response

        except ConnectTimeout:
            print(f"Attempt {attempt + 1}: Connection timeout")
        except ReadTimeout:
            print(f"Attempt {attempt + 1}: Read timeout")
        except RequestException as e:
            print(f"Attempt {attempt + 1}: Request failed - {e}")

        if attempt < max_retries - 1:
            wait_time = backoff_factor * (2 ** attempt)
            print(f"Retrying in {wait_time} seconds...")
            time.sleep(wait_time)

    return None

# Usage
response = request_with_retry(
    'https://unreliable-api.example.com/data',
    max_retries=3,
    connect_timeout=3,
    read_timeout=15,
    backoff_factor=1.5
)
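
If you would rather not hand-roll the loop, urllib3's Retry class (bundled with Requests) can be mounted on an HTTPAdapter to retry connection failures and selected status codes with exponential backoff. The sketch below is one possible configuration, not the only one; the per-request timeout tuple still governs each individual attempt:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry connection errors and transient HTTP status codes with exponential backoff
retry = Retry(
    total=3,
    backoff_factor=1.5,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Each attempt gets 3 seconds to connect and 15 seconds to read
response = session.get('https://httpbin.org/get', timeout=(3, 15))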

Integration with Web Scraping Workflows

When building web scraping applications, timeout configuration becomes even more critical. For complex scenarios involving dynamic content that loads after page interactions, you might need to combine Requests with browser automation tools.

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Dict, Any
from requests.exceptions import ConnectTimeout, ReadTimeout

class WebScrapingTimeoutManager:
    def __init__(self, max_workers: int = 5):
        self.max_workers = max_workers
        self.session = requests.Session()

        # Requests ignores a session-level timeout attribute, so store the
        # (connect, read) defaults and pass them explicitly on every request
        self.timeout = (5, 30)  # 5s connect, 30s read

        # Add common headers to avoid detection
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })

    def scrape_urls(self, urls: List[str]) -> List[Dict[str, Any]]:
        results = []

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Submit all requests
            future_to_url = {
                executor.submit(self._fetch_single_url, url): url 
                for url in urls
            }

            # Process completed requests
            for future in as_completed(future_to_url):
                url = future_to_url[future]
                try:
                    result = future.result()
                    results.append(result)
                except Exception as e:
                    print(f"Failed to process {url}: {e}")
                    results.append({
                        'url': url,
                        'status': 'error',
                        'error': str(e)
                    })

        return results

    def _fetch_single_url(self, url: str) -> Dict[str, Any]:
        try:
            response = self.session.get(url, timeout=self.timeout)
            return {
                'url': url,
                'status_code': response.status_code,
                'content_length': len(response.content),
                'response_time': response.elapsed.total_seconds(),
                'status': 'success'
            }
        except (ConnectTimeout, ReadTimeout) as e:
            return {
                'url': url,
                'status': 'timeout',
                'error': str(e)
            }

# Usage
scraper = WebScrapingTimeoutManager(max_workers=10)
urls = [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com'
]

results = scraper.scrape_urls(urls)
for result in results:
    print(f"URL: {result['url']}, Status: {result['status']}")

Monitoring and Debugging Timeouts

To better understand timeout behavior in your applications, implement logging and monitoring:

import requests
import logging
import time
from contextlib import contextmanager
from requests.exceptions import ConnectTimeout, ReadTimeout

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@contextmanager
def timeout_monitor(url: str):
    start_time = time.time()
    try:
        yield
    except ConnectTimeout:
        elapsed = time.time() - start_time
        logger.warning(f"Connection timeout for {url} after {elapsed:.2f}s")
        raise
    except ReadTimeout:
        elapsed = time.time() - start_time
        logger.warning(f"Read timeout for {url} after {elapsed:.2f}s")
        raise
    else:
        elapsed = time.time() - start_time
        logger.info(f"Request to {url} completed in {elapsed:.2f}s")

def monitored_request(url: str, connect_timeout: float = 5, read_timeout: float = 30):
    with timeout_monitor(url):
        return requests.get(url, timeout=(connect_timeout, read_timeout))

# Usage
response = monitored_request('https://httpbin.org/delay/2')

Command Line Testing

You can test timeout behavior using curl to understand how different timeouts affect your requests:

# Test connection timeout (time to establish connection)
curl --connect-timeout 5 https://example.com

# Test max-time (overall request timeout including reading)
curl --max-time 30 https://example.com

# Combine both for comprehensive timeout control
curl --connect-timeout 5 --max-time 30 https://example.com

Conclusion

Setting connection and read timeouts separately provides fine-grained control over your HTTP requests and is essential for building robust web scraping applications. By understanding the difference between these timeout types and implementing appropriate error handling, you can create more reliable and predictable network operations.

Remember to choose timeout values based on your specific use case, network conditions, and performance requirements. For applications that need to handle complex authentication flows or dynamic content, consider combining timeout strategies with appropriate retry logic and monitoring to ensure optimal performance and reliability.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
