# How do I set a timeout for requests to prevent hanging?
Setting proper timeouts for HTTP requests is crucial for building robust web scraping applications. Without timeouts, your requests can hang indefinitely, causing your application to freeze or consume excessive resources. This guide covers how to implement timeouts across different programming languages and libraries.
## Understanding Request Timeouts

A timeout defines the maximum amount of time your application will wait for a response before giving up. There are typically two types of timeouts:

- **Connection timeout**: the time allowed to establish a connection to the server
- **Read timeout**: the time to wait for data after the connection is established
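To make the distinction concrete, here is a minimal sketch at the raw socket level using Python's standard `socket` module (the host and values are illustrative): the connect call is governed by the connection timeout, while each subsequent read is governed by the read timeout. HTTP libraries expose these same two knobs, as the examples below show.

```python
import socket

sock = None
try:
    # Connection timeout: the TCP handshake must complete within 5 seconds
    sock = socket.create_connection(("example.com", 80), timeout=5)
    # Read timeout: each recv() must return data within 10 seconds
    sock.settimeout(10)
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    print(sock.recv(4096)[:80])
except socket.timeout:
    print("Socket operation timed out")
finally:
    if sock is not None:
        sock.close()
```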
## Python Requests Library

The Python `requests` library provides several ways to set timeouts.

### Basic Timeout
```python
import requests
from requests.exceptions import Timeout, RequestException

try:
    # Set timeout to 10 seconds for both connection and read
    response = requests.get('https://example.com', timeout=10)
    print(response.status_code)
except Timeout:
    print("Request timed out")
except RequestException as e:
    print(f"Request failed: {e}")
```
### Separate Connection and Read Timeouts
```python
import requests

try:
    # Connection timeout: 5 seconds, read timeout: 10 seconds
    response = requests.get(
        'https://example.com',
        timeout=(5, 10)
    )
    print(response.text)
except requests.exceptions.ConnectTimeout:
    print("Connection timeout occurred")
except requests.exceptions.ReadTimeout:
    print("Read timeout occurred")
except requests.exceptions.Timeout:
    print("Request timed out")
```
### Session-Level Timeouts
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Create a session; note that requests sessions have no built-in default
# timeout, so the timeout must still be passed on each request
session = requests.Session()

# Configure automatic retries for transient failures
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)

try:
    response = session.get('https://example.com', timeout=15)
    print(response.status_code)
except Exception as e:
    print(f"Request failed: {e}")
finally:
    session.close()
```
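Because `requests.Session` has no built-in default timeout, forgetting the `timeout` argument on a single call silently reintroduces the hanging problem. One common workaround, sketched here as one possible approach (the `TimeoutHTTPAdapter` name is our own), is an `HTTPAdapter` subclass that injects a default whenever the caller does not supply one:

```python
import requests
from requests.adapters import HTTPAdapter

class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when none is given."""

    def __init__(self, *args, timeout=15, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Inject the default only when the caller did not set one explicitly
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)

session = requests.Session()
adapter = TimeoutHTTPAdapter(timeout=15)  # also accepts max_retries=...
session.mount("http://", adapter)
session.mount("https://", adapter)
```

Mounted this way, every request through the session gets the 15-second default unless a call passes its own `timeout`.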
## JavaScript Fetch API

Modern JavaScript provides timeout functionality through `AbortController`.

### Basic Fetch with Timeout
```javascript
async function fetchWithTimeout(url, timeout = 10000) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeout);

  try {
    const response = await fetch(url, {
      signal: controller.signal,
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; WebScraper/1.0)'
      }
    });
    clearTimeout(timeoutId);

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    return await response.text();
  } catch (error) {
    if (error.name === 'AbortError') {
      throw new Error('Request timed out');
    }
    throw error;
  }
}

// Usage
fetchWithTimeout('https://example.com', 5000)
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error.message));
```
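Note that newer runtimes (Node.js 17.3+ and current browsers) also provide `AbortSignal.timeout(ms)`, which lets you pass `signal: AbortSignal.timeout(5000)` directly to `fetch` without managing a controller and timer yourself.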
## Node.js with Axios
```javascript
const axios = require('axios');

// Create an axios instance with a default timeout
const client = axios.create({
  timeout: 10000, // 10 seconds
  headers: {
    'User-Agent': 'Mozilla/5.0 (compatible; WebScraper/1.0)'
  }
});

async function scrapeWithTimeout(url) {
  try {
    const response = await client.get(url, {
      timeout: 15000 // Override the default timeout for this request
    });
    return response.data;
  } catch (error) {
    if (error.code === 'ECONNABORTED') {
      console.error('Request timed out');
    } else {
      console.error('Request failed:', error.message);
    }
    throw error;
  }
}
```
## cURL Command Line

Set timeouts directly in cURL commands:
```bash
# Connection timeout: 10 seconds, max total time: 30 seconds
curl --connect-timeout 10 --max-time 30 https://example.com

# Abort stalled transfers: give up if speed stays below 1000 bytes/s for 15 seconds
curl --speed-limit 1000 --speed-time 15 --connect-timeout 10 https://example.com

# With retry on failure
curl --retry 3 --retry-delay 2 --connect-timeout 10 --max-time 30 https://example.com
```
## PHP with cURL
```php
<?php
function fetchWithTimeout($url, $timeout = 30) {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 5,
        CURLOPT_TIMEOUT => $timeout,        // Total timeout in seconds
        CURLOPT_CONNECTTIMEOUT => 10,       // Connection timeout in seconds
        CURLOPT_DNS_CACHE_TIMEOUT => 120,   // Keep DNS entries cached for 120 s
        CURLOPT_USERAGENT => 'Mozilla/5.0 (compatible; WebScraper/1.0)',
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_SSL_VERIFYHOST => 2,
    ]);

    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error = curl_error($ch);
    curl_close($ch);

    if ($error) {
        throw new Exception("cURL error: " . $error);
    }
    if ($httpCode >= 400) {
        throw new Exception("HTTP error: " . $httpCode);
    }
    return $response;
}

try {
    $content = fetchWithTimeout('https://example.com', 20);
    echo $content;
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
?>
```
## Go HTTP Client
```go
package main

import (
    "context"
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

func fetchWithTimeout(url string, timeout time.Duration) (string, error) {
    // Create a context that cancels the request after the overall timeout
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    // Create the request with the context attached
    req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
    if err != nil {
        return "", err
    }
    req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; WebScraper/1.0)")

    // Create a client with per-phase transport timeouts
    client := &http.Client{
        Timeout: timeout,
        Transport: &http.Transport{
            DialContext: (&net.Dialer{
                Timeout: 10 * time.Second, // Connection (dial) timeout
            }).DialContext,
            TLSHandshakeTimeout:   10 * time.Second,
            ResponseHeaderTimeout: 10 * time.Second,
            ExpectContinueTimeout: 1 * time.Second,
        },
    }

    resp, err := client.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    return string(body), nil
}

func main() {
    content, err := fetchWithTimeout("https://example.com", 15*time.Second)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    fmt.Println(content)
}
```
## Best Practices for Timeout Configuration

### 1. Choose Appropriate Timeout Values

- **Fast APIs**: 5-10 seconds
- **Standard web pages**: 15-30 seconds
- **Large file downloads**: 60+ seconds
- **Connection timeout**: usually 5-10 seconds
### 2. Implement Exponential Backoff
```python
import time
import random
import requests

def fetch_with_retry(url, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=(5, 15))
            return response
        except requests.exceptions.Timeout:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Timeout on attempt {attempt + 1}, retrying in {delay:.2f}s")
            time.sleep(delay)
```
### 3. Different Timeouts for Different Scenarios
```python
import requests

class WebScrapingClient:
    def __init__(self):
        self.session = requests.Session()

    def quick_check(self, url):
        """Fast timeout for health checks"""
        return self.session.get(url, timeout=(2, 5))

    def standard_scrape(self, url):
        """Standard timeout for regular scraping"""
        return self.session.get(url, timeout=(5, 15))

    def large_download(self, url):
        """Extended timeout for large files"""
        return self.session.get(url, timeout=(10, 120))
```
## Integration with Web Scraping Tools
When working with browser automation tools, timeout configuration becomes even more critical. For comprehensive timeout handling in browser-based scraping, consider exploring how to handle timeouts in Puppeteer for advanced scenarios involving JavaScript rendering and dynamic content.
Additionally, when dealing with complex page interactions, understanding how to handle AJAX requests using Puppeteer can help you implement proper timeout strategies for asynchronous operations.
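Browser-level timeouts follow the same pattern regardless of the tool. As a rough illustration in this guide's primary language, here is how default and per-operation timeouts look in Playwright for Python, used as a stand-in for the Puppeteer workflows mentioned above (this assumes the `playwright` package and its browsers are installed; the URL, selector, and values are illustrative):

```python
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeoutError

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Default timeout for all page operations (milliseconds)
    page.set_default_timeout(15000)
    try:
        # Per-call override: allow slow pages up to 30 seconds to load
        page.goto("https://example.com", timeout=30000)
        # Waiting for dynamic content gets its own, shorter timeout
        page.wait_for_selector("h1", timeout=5000)
        print(page.title())
    except PlaywrightTimeoutError:
        print("Browser operation timed out")
    finally:
        browser.close()
```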
## Monitoring and Logging Timeouts
```python
import logging
import time
import requests

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def monitored_request(url, timeout=30):
    start_time = time.time()
    try:
        response = requests.get(url, timeout=timeout)
        duration = time.time() - start_time
        logger.info(f"Request to {url} completed in {duration:.2f}s")
        return response
    except requests.exceptions.Timeout:
        duration = time.time() - start_time
        logger.warning(f"Request to {url} timed out after {duration:.2f}s")
        raise
    except Exception as e:
        duration = time.time() - start_time
        logger.error(f"Request to {url} failed after {duration:.2f}s: {e}")
        raise
```
## Conclusion
Proper timeout configuration is essential for reliable web scraping. Start with conservative timeout values and adjust based on your specific requirements and target websites. Always implement proper error handling and consider using retry mechanisms with exponential backoff for improved reliability. Remember that different types of requests may require different timeout strategies, so design your timeout configuration accordingly.
Regular monitoring and logging of timeout occurrences will help you optimize your timeout values and identify problematic endpoints that may require special handling or alternative approaches.