How do you handle API timeouts and connection issues?

API timeouts and connection issues are inevitable challenges in web scraping and API integration. Network instability, server overload, and temporary outages can cause requests to fail or hang indefinitely. Implementing robust error handling and timeout management is crucial for building reliable scraping applications that can gracefully recover from these issues.

Understanding API Timeouts

API timeouts occur when a request takes longer than the specified time limit to complete. There are typically two types of timeouts to consider:

  • Connection timeout: Time limit for establishing a connection to the server
  • Read timeout: Time limit for receiving a response after the connection is established
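
In the requests library these two limits surface as distinct exceptions, so a client can tell them apart when logging or deciding how to retry. A minimal sketch (the URL is a placeholder):

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # First value limits connection setup, second limits reading the response
    response = requests.get("https://api.example.com/data", timeout=(5, 30))
except ConnectTimeout:
    print("Could not establish a connection within 5s")
except ReadTimeout:
    print("Connected, but no response arrived within 30s")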

Basic Timeout Configuration

Python with Requests

import requests
from requests.exceptions import Timeout, ConnectionError, RequestException
import time
import random

def make_request_with_timeout(url, timeout=(5, 30)):
    try:
        response = requests.get(
            url,
            timeout=timeout,  # (connection_timeout, read_timeout)
            headers={'User-Agent': 'Your-App/1.0'}
        )
        response.raise_for_status()
        return response
    except Timeout:
        print(f"Request timed out for {url}")
        raise
    except ConnectionError:
        print(f"Connection error for {url}")
        raise
    except RequestException as e:
        print(f"Request failed: {e}")
        raise

JavaScript with Fetch

async function makeRequestWithTimeout(url, timeoutMs = 30000) {
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), timeoutMs);

    try {
        const response = await fetch(url, {
            signal: controller.signal,
            headers: {
                'User-Agent': 'Your-App/1.0'
            }
        });

        clearTimeout(timeout);

        if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }

        return response;
    } catch (error) {
        clearTimeout(timeout);

        if (error.name === 'AbortError') {
            throw new Error('Request timed out');
        }

        throw error;
    }
}

Implementing Retry Logic

Retry logic is essential for handling temporary failures. The key is to distinguish between retryable and non-retryable errors.
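
One way to encode that distinction is a small predicate the retry loop can consult. This is a sketch, and the status codes below are common choices for transient failures rather than a fixed standard:

from requests.exceptions import Timeout, ConnectionError, RequestException

RETRYABLE_STATUS_CODES = {429, 502, 503, 504}

def is_retryable(exc):
    """Return True when an error is likely transient and worth retrying."""
    if isinstance(exc, (Timeout, ConnectionError)):
        return True  # network-level failures are usually temporary
    if isinstance(exc, RequestException) and exc.response is not None:
        return exc.response.status_code in RETRYABLE_STATUS_CODES
    return False  # anything else (e.g. 404 Not Found) will not succeed on retry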

Python Retry Implementation

import time
import random
from functools import wraps
from requests.exceptions import Timeout, ConnectionError, RequestException

def retry_with_backoff(max_retries=3, base_delay=1, max_delay=60, backoff_factor=2):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0

            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except (Timeout, ConnectionError) as e:
                    retries += 1

                    if retries >= max_retries:
                        print(f"Max retries ({max_retries}) exceeded")
                        raise

                    # Calculate delay with exponential backoff and jitter
                    delay = min(base_delay * (backoff_factor ** (retries - 1)), max_delay)
                    jitter = random.uniform(0.1, 0.3) * delay
                    total_delay = delay + jitter

                    print(f"Retry {retries}/{max_retries} after {total_delay:.2f}s")
                    time.sleep(total_delay)

                except RequestException as e:
                    # Only retry when the server returned a retryable status
                    response = getattr(e, 'response', None)
                    if response is not None and response.status_code in [429, 502, 503, 504]:
                        retries += 1
                        if retries >= max_retries:
                            raise

                        # Rate limited: honor Retry-After when it is numeric
                        if response.status_code == 429:
                            retry_after = response.headers.get('Retry-After', '60')
                            time.sleep(int(retry_after) if retry_after.isdigit() else 60)
                        else:
                            delay = base_delay * (backoff_factor ** (retries - 1))
                            time.sleep(delay)
                    else:
                        # Non-retryable error (e.g. 4xx client errors)
                        raise

        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
def fetch_api_data(url):
    return make_request_with_timeout(url)

JavaScript Retry Implementation

async function retryWithBackoff(
    fn, 
    maxRetries = 3, 
    baseDelay = 1000, 
    maxDelay = 60000, 
    backoffFactor = 2
) {
    let retries = 0;

    while (retries < maxRetries) {
        try {
            return await fn();
        } catch (error) {
            retries++;

            // Check if error is retryable
            const isRetryable = 
                error.name === 'AbortError' || 
                error.message.includes('network') ||
                error.message.includes('timeout') ||
                (error.status && [429, 502, 503, 504].includes(error.status));

            if (!isRetryable || retries >= maxRetries) {
                throw error;
            }

            // Calculate delay with exponential backoff and jitter
            const delay = Math.min(
                baseDelay * Math.pow(backoffFactor, retries - 1), 
                maxDelay
            );
            const jitter = Math.random() * 0.3 * delay;
            const totalDelay = delay + jitter;

            console.log(`Retry ${retries}/${maxRetries} after ${totalDelay}ms`);
            await new Promise(resolve => setTimeout(resolve, totalDelay));
        }
    }
}

// Usage
async function fetchApiData(url) {
    return await retryWithBackoff(
        () => makeRequestWithTimeout(url),
        3, // max retries
        1000, // base delay (1 second)
        30000 // max delay (30 seconds)
    );
}

Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures by temporarily stopping requests to a failing service.

Python Circuit Breaker

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
                print("Circuit breaker half-open: testing service")
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        if self.state == CircuitState.HALF_OPEN:
            self.state = CircuitState.CLOSED
            print("Circuit breaker closed: service recovered")

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()

        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            print(f"Circuit breaker opened after {self.failure_count} failures")

# Usage
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

def protected_api_call(url):
    return breaker.call(fetch_api_data, url)
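
In practice you may want one breaker per host, so that a single failing API does not block requests to healthy ones. A minimal sketch building on the class above (breaker_for and protected_api_call_per_host are hypothetical helpers):

from urllib.parse import urlparse

# One breaker per host, assuming hosts fail independently
breakers = {}

def breaker_for(host):
    if host not in breakers:
        breakers[host] = CircuitBreaker(failure_threshold=3, recovery_timeout=60)
    return breakers[host]

def protected_api_call_per_host(url):
    host = urlparse(url).netloc
    return breaker_for(host).call(fetch_api_data, url)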

Advanced Connection Management

Connection Pooling and Session Management

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class RobustAPIClient:
    def __init__(self, base_url, max_retries=3, pool_connections=10, pool_maxsize=10):
        self.base_url = base_url
        self.session = requests.Session()

        # Configure retry strategy
        retry_strategy = Retry(
            total=max_retries,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["HEAD", "GET", "OPTIONS"],  # "method_whitelist" before urllib3 1.26
            backoff_factor=1,
            raise_on_status=False
        )

        # Configure HTTP adapter with connection pooling
        adapter = HTTPAdapter(
            max_retries=retry_strategy,
            pool_connections=pool_connections,
            pool_maxsize=pool_maxsize
        )

        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)

        # Set default headers
        self.session.headers.update({
            'User-Agent': 'RobustAPIClient/1.0',
            'Accept': 'application/json',
            'Connection': 'keep-alive'
        })

    def get(self, endpoint, **kwargs):
        url = f"{self.base_url.rstrip('/')}/{endpoint.lstrip('/')}"

        # Set default timeout if not provided
        kwargs.setdefault('timeout', (5, 30))

        try:
            response = self.session.get(url, **kwargs)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            raise

    def close(self):
        self.session.close()

# Usage
client = RobustAPIClient("https://api.example.com")
try:
    response = client.get("/data")
    data = response.json()
finally:
    client.close()

JavaScript with Axios and Interceptors

import axios from 'axios';

class RobustAPIClient {
    constructor(baseURL, options = {}) {
        this.client = axios.create({
            baseURL,
            timeout: options.timeout || 30000,
            headers: {
                'User-Agent': 'RobustAPIClient/1.0',
                'Accept': 'application/json'
            }
        });

        this.setupInterceptors();
        this.maxRetries = options.maxRetries || 3;
    }

    setupInterceptors() {
        // Request interceptor for debugging
        this.client.interceptors.request.use(
            config => {
                console.log(`Making request to: ${config.url}`);
                return config;
            },
            error => Promise.reject(error)
        );

        // Response interceptor for error handling
        this.client.interceptors.response.use(
            response => response,
            async error => {
                const config = error.config;

                // Errors thrown before a request was sent have no config to retry
                if (!config) {
                    return Promise.reject(error);
                }

                // Initialize retry count
                config.__retryCount = config.__retryCount || 0;

                // Check if we should retry
                const shouldRetry = 
                    config.__retryCount < this.maxRetries &&
                    (error.code === 'ECONNABORTED' || 
                     error.response?.status >= 500 ||
                     error.response?.status === 429);

                if (shouldRetry) {
                    config.__retryCount++;

                    // Calculate delay
                    const delay = Math.pow(2, config.__retryCount) * 1000;

                    console.log(`Retrying request (${config.__retryCount}/${this.maxRetries}) after ${delay}ms`);

                    await new Promise(resolve => setTimeout(resolve, delay));
                    return this.client(config);
                }

                return Promise.reject(error);
            }
        );
    }

    async get(endpoint, config = {}) {
        try {
            const response = await this.client.get(endpoint, config);
            return response.data;
        } catch (error) {
            console.error(`API request failed: ${error.message}`);
            throw error;
        }
    }
}

Monitoring and Logging

Effective monitoring helps identify patterns in API failures and optimize your retry strategies.

Python Logging Implementation

import logging
import time
from contextlib import contextmanager

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@contextmanager
def api_request_context(url, method='GET'):
    start_time = time.time()
    logger.info(f"Starting {method} request to {url}")

    try:
        yield
        duration = time.time() - start_time
        logger.info(f"Request completed successfully in {duration:.2f}s")
    except Exception as e:
        duration = time.time() - start_time
        logger.error(f"Request failed after {duration:.2f}s: {str(e)}")
        raise

def monitored_api_call(url):
    with api_request_context(url):
        return fetch_api_data(url)
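
To spot patterns over time, it can also help to aggregate outcomes rather than only log individual requests. A minimal sketch using a Counter (the bucketing scheme here is an assumption, not a standard):

from collections import Counter
import logging

logger = logging.getLogger(__name__)  # reuses the configuration above
failure_stats = Counter()

def record_outcome(error=None):
    """Bucket request outcomes so recurring failure types stand out."""
    key = 'success' if error is None else type(error).__name__
    failure_stats[key] += 1

    # Log the running distribution every 100 requests,
    # e.g. {'success': 95, 'ReadTimeout': 4, 'ConnectionError': 1}
    if sum(failure_stats.values()) % 100 == 0:
        logger.info(f"Outcome distribution: {dict(failure_stats)}")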

Command Line Testing Tools

Using cURL with Timeout Options

# Set connection timeout (5s) and max time (30s)
curl --connect-timeout 5 --max-time 30 https://api.example.com/data

# Retry failed requests with delays
curl --retry 3 --retry-delay 2 --retry-max-time 60 https://api.example.com/data

# Show detailed timing information
curl -w "@curl-format.txt" https://api.example.com/data

Create a curl-format.txt file for detailed timing:

     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
       time_redirect:  %{time_redirect}s\n
  time_starttransfer:  %{time_starttransfer}s\n
                     ----------\n
          time_total:  %{time_total}s\n

Using HTTPie for Testing

# Test with timeout
http --timeout=30 GET https://api.example.com/data

# Test with retry logic using shell scripting
for i in {1..3}; do
    http GET https://api.example.com/data && break
    echo "Attempt $i failed, retrying..."
    sleep $((2**i))
done

Best Practices Summary

  1. Set appropriate timeouts: Use both connection and read timeouts
  2. Implement exponential backoff: Reduce server load during retry attempts
  3. Add jitter to delays: Prevent thundering herd problems
  4. Use circuit breakers: Prevent cascading failures
  5. Monitor and log: Track failure patterns and performance metrics
  6. Handle rate limiting: Respect Retry-After headers and implement proper delays
  7. Distinguish error types: Only retry transient failures
  8. Use connection pooling: Improve performance and resource utilization
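
As a rough sketch of how these pieces compose, here is one way to combine the helpers defined earlier (retry_with_backoff, CircuitBreaker, and api_request_context), under the assumption that the breaker should treat an exhausted retry cycle as a single failure:

import requests

breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

@retry_with_backoff(max_retries=3, base_delay=1)
def fetch(url):
    # Tuple timeout: 5s to connect, 30s to read the response
    response = requests.get(url, timeout=(5, 30))
    response.raise_for_status()
    return response

def resilient_fetch(url):
    with api_request_context(url):  # logs duration and outcome
        # The breaker wraps the whole retry cycle, so one recorded failure
        # means every retry attempt was exhausted
        return breaker.call(fetch, url)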

When dealing with complex scraping scenarios, you might also need to consider how to handle timeouts in Puppeteer for browser-based scraping, or implement proper error handling strategies for headless browser automation.

By implementing these strategies, you can build resilient API clients that gracefully handle network issues and provide reliable data extraction capabilities for your web scraping applications.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
