How do I handle HTTP errors when using Guzzle?

Guzzle is a popular PHP HTTP client, and handling its errors properly is essential for reliable web scraping and API integrations. By default (the http_errors request option is true), Guzzle throws an exception for any 4xx or 5xx response. You can catch those exceptions, disable them and inspect status codes yourself, or centralize error handling and retries in middleware.

Understanding Guzzle's Exception Hierarchy

Guzzle throws specific exceptions based on the type of error encountered:

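  • GuzzleHttp\Exception\ClientException: Thrown for 4xx HTTP errors (client-side issues like 400 Bad Request, 404 Not Found)
  • GuzzleHttp\Exception\ServerException: Thrown for 5xx HTTP errors (server-side issues like 500 Internal Server Error, 503 Service Unavailable)
  • GuzzleHttp\Exception\BadResponseException: Parent class of ClientException and ServerException; catch it to handle any 4xx or 5xx response in one place
  • GuzzleHttp\Exception\ConnectException: Thrown for networking errors (DNS resolution failures, refused connections, connection timeouts); no response is available. In Guzzle 7 it extends TransferException, not RequestException
  • GuzzleHttp\Exception\TooManyRedirectsException: Thrown when the redirect limit is exceeded
  • GuzzleHttp\Exception\RequestException: Base class for errors raised while a request is being transferred; getResponse() returns null when no response was received
  • GuzzleHttp\Exception\TransferException: Base class of all the exceptions above; it implements the GuzzleException interface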
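
One way to see these relationships without touching the network is Guzzle's MockHandler, which replays queued responses and exceptions. The sketch below is an illustration only: it assumes Guzzle is installed through Composer in the current directory (hence the vendor/autoload.php path), and the URL is a placeholder that is never contacted.

```php
<?php
// Assumption: Guzzle installed via Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\BadResponseException;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Exception\TransferException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Queue three canned outcomes: a 404, a 503, and a simulated network failure.
$mock = new MockHandler([
    new Response(404),
    new Response(503),
    new ConnectException('Simulated DNS failure', new Request('GET', 'https://example.com/')),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

// Record which exception class each outcome produces.
$observed = [];
for ($i = 0; $i < 3; $i++) {
    try {
        $client->request('GET', 'https://example.com/');
    } catch (TransferException $e) {
        $observed[] = [
            'class' => get_class($e),
            'is_bad_response' => $e instanceof BadResponseException,
            'is_request_exception' => $e instanceof RequestException,
        ];
    }
}

foreach ($observed as $row) {
    echo $row['class'],
        ' bad_response=', var_export($row['is_bad_response'], true),
        ' request_exception=', var_export($row['is_request_exception'], true), "\n";
}
```

The request_exception flag for ConnectException depends on the Guzzle major version (false on Guzzle 7, true on Guzzle 6), which is why catching TransferException is the safer catch-all.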

Method 1: Exception Handling with Try-Catch

The most common approach is using try-catch blocks to handle different types of exceptions:

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ClientException;
use GuzzleHttp\Exception\ServerException;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;

$client = new Client();

try {
    $response = $client->request('GET', 'https://example.com/api/resource');
    // getBody() returns a stream object, so cast it to a string before decoding
    $data = json_decode((string) $response->getBody(), true);
    echo "Success: " . $response->getStatusCode();
} catch (ClientException $e) {
    // Handle 4xx errors
    $statusCode = $e->getResponse()->getStatusCode();
    echo "Client error ($statusCode): " . $e->getMessage();

    // Access response body for error details
    $errorBody = $e->getResponse()->getBody()->getContents();
    echo "Error details: " . $errorBody;
} catch (ServerException $e) {
    // Handle 5xx errors
    $statusCode = $e->getResponse()->getStatusCode();
    echo "Server error ($statusCode): " . $e->getMessage();
} catch (ConnectException $e) {
    // Handle connection errors
    echo "Connection error: " . $e->getMessage();
} catch (RequestException $e) {
    // Handle any other request-related errors
    echo "Request error: " . $e->getMessage();
}
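
Error bodies returned by APIs are often JSON, so it can help to reduce them to a short message before logging. The helper below is a hypothetical sketch: summarizeErrorBody is not part of Guzzle, and the message and error keys are assumptions about the API's error format. It expects the string you would get from (string) $e->getResponse()->getBody().

```php
<?php
/**
 * Hypothetical helper (not part of Guzzle): reduce an error response body
 * to a short, loggable message.
 */
function summarizeErrorBody(string $body, int $maxLength = 200): string
{
    $decoded = json_decode($body, true);

    if (is_array($decoded)) {
        // Assumed key names; adjust them to the API you are calling.
        foreach (['message', 'error'] as $key) {
            if (isset($decoded[$key]) && is_string($decoded[$key])) {
                return $decoded[$key];
            }
        }
    }

    // Not JSON, or no recognised key: fall back to the trimmed, truncated raw body.
    $trimmed = trim($body);

    return strlen($trimmed) > $maxLength
        ? substr($trimmed, 0, $maxLength) . '...'
        : $trimmed;
}

echo summarizeErrorBody('{"message": "Resource not found"}'), "\n";
echo summarizeErrorBody('<html>Bad Gateway</html>'), "\n";
```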

Method 2: Disabling HTTP Errors and Manual Status Checking

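If you would rather inspect status codes yourself, set the http_errors option to false so that 4xx and 5xx responses are returned instead of thrown. Connection failures still raise exceptions: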

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;

$client = new Client();

try {
    $response = $client->request('GET', 'https://example.com/api/resource', [
        'http_errors' => false, // Disable automatic exception throwing
        'timeout' => 30,
        'connect_timeout' => 10
    ]);

    $statusCode = $response->getStatusCode();

    if ($statusCode >= 200 && $statusCode < 300) {
        // Success
        echo "Success: " . $response->getBody();
    } elseif ($statusCode >= 400 && $statusCode < 500) {
        // Client error
        echo "Client Error ($statusCode): " . $response->getReasonPhrase();
        echo "\nResponse: " . $response->getBody();
    } elseif ($statusCode >= 500) {
        // Server error
        echo "Server Error ($statusCode): " . $response->getReasonPhrase();
    }
} catch (ConnectException $e) {
    // Still need to catch connection errors
    echo "Connection failed: " . $e->getMessage();
}
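
When http_errors is disabled, the range checks above tend to get repeated at every call site. A small helper can keep the branching consistent. This is a minimal sketch; classifyStatus and its category names are hypothetical and not part of Guzzle.

```php
<?php
/**
 * Hypothetical helper: map an HTTP status code to a coarse category.
 */
function classifyStatus(int $statusCode): string
{
    if ($statusCode >= 100 && $statusCode < 200) {
        return 'informational';
    }
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'success';
    }
    if ($statusCode >= 300 && $statusCode < 400) {
        return 'redirect';
    }
    if ($statusCode >= 400 && $statusCode < 500) {
        return 'client_error';
    }
    if ($statusCode >= 500 && $statusCode < 600) {
        return 'server_error';
    }

    // Anything outside 100-599 is not a valid HTTP status code.
    return 'unknown';
}

foreach ([200, 301, 404, 503] as $code) {
    echo $code, ' => ', classifyStatus($code), "\n";
}
```

With a Guzzle response this would be called as classifyStatus($response->getStatusCode()).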

Method 3: Using Middleware for Global Error Handling

For applications making multiple requests, middleware provides a centralized error handling approach:

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Create custom error handling middleware
$errorHandler = Middleware::mapResponse(function (ResponseInterface $response) {
    $statusCode = $response->getStatusCode();

    if ($statusCode >= 400) {
        // Log error or perform custom handling
        error_log("HTTP Error $statusCode: " . $response->getReasonPhrase());

        // mapResponse can only inspect or replace the response;
        // automatic retries need Middleware::retry (see Method 4)
        if ($statusCode >= 500) {
            error_log("Server error detected, consider retrying");
        }
    }

    return $response;
});

// Create handler stack with middleware
$stack = HandlerStack::create();
$stack->push($errorHandler);

$client = new Client([
    'handler' => $stack,
    'http_errors' => false // Let middleware handle errors
]);

$response = $client->request('GET', 'https://example.com/api/resource');
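
Middleware::mapResponse only receives the response, so the log line above cannot say which request failed. A hand-written middleware sees both. The sketch below follows the middleware shape described in Guzzle's documentation (a function that wraps the next handler); errorReportingMiddleware and the log format are hypothetical, and MockHandler stands in for the network so nothing is actually requested. It assumes Guzzle is installed through Composer in the current directory.

```php
<?php
// Assumption: Guzzle installed via Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

/**
 * Hypothetical factory: returns middleware that passes every 4xx/5xx
 * response, together with its request, to the given callback.
 */
function errorReportingMiddleware(callable $report): callable
{
    return function (callable $handler) use ($report): callable {
        return function (RequestInterface $request, array $options) use ($handler, $report) {
            // The handler returns a promise; inspect the response once it resolves.
            return $handler($request, $options)->then(
                function (ResponseInterface $response) use ($request, $report) {
                    if ($response->getStatusCode() >= 400) {
                        $report($request, $response);
                    }
                    return $response;
                }
            );
        };
    };
}

// Lines are collected in an array here; an application might call error_log() instead.
$logLines = [];
$stack = HandlerStack::create(new MockHandler([new Response(200), new Response(502)]));
$stack->push(errorReportingMiddleware(
    function (RequestInterface $request, ResponseInterface $response) use (&$logLines) {
        $logLines[] = sprintf(
            '%s %s -> %d %s',
            $request->getMethod(),
            $request->getUri(),
            $response->getStatusCode(),
            $response->getReasonPhrase()
        );
    }
), 'error_reporting');

$client = new Client(['handler' => $stack, 'http_errors' => false]);
$client->request('GET', 'https://example.com/ok');
$client->request('GET', 'https://example.com/broken');

print_r($logLines);
```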

Method 4: Retry Middleware for Transient Errors

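Middleware::retry re-sends a request automatically whenever a decider callback returns true, which suits transient failures such as dropped connections or 5xx responses: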

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\TransferException;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$stack = HandlerStack::create();

// Add retry middleware
$stack->push(Middleware::retry(function (
    $retries,
    RequestInterface $request,
    ?ResponseInterface $response = null,
    ?\Throwable $exception = null
) {
    // $exception is typed as Throwable because in Guzzle 7
    // ConnectException is not a RequestException.
    // Retry on connection errors or 5xx responses, at most 3 times
    if ($retries < 3) {
        if ($exception instanceof ConnectException) {
            return true;
        }
        if ($response && $response->getStatusCode() >= 500) {
            return true;
        }
    }
    return false;
}, function ($retries) {
    // $retries is 1 for the first retry.
    // Exponential backoff in milliseconds: 1s, 2s, 4s
    return 1000 * 2 ** ($retries - 1);
}));

$client = new Client(['handler' => $stack]);

try {
    $response = $client->request('GET', 'https://example.com/api/resource');
    echo "Success after retries: " . $response->getBody();
} catch (TransferException $e) {
    // Covers ConnectException as well as the ServerException
    // thrown once the retries are used up
    echo "Failed after all retries: " . $e->getMessage();
}
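
Unbounded exponential backoff grows quickly, so the delay is usually capped. The function below is a minimal sketch of a capped delay calculation; backoffDelayMs, the 1000 ms base, and the 30000 ms cap are illustrative assumptions rather than Guzzle defaults.

```php
<?php
/**
 * Hypothetical helper: capped exponential backoff in milliseconds.
 * $retries is expected to be 1 for the first retry, as in the delay
 * callback shown above.
 */
function backoffDelayMs(int $retries, int $baseMs = 1000, int $capMs = 30000): int
{
    // The first retry waits for the base delay, so the exponent starts at 0.
    $exponent = max(0, $retries - 1);

    // Clamp the exponent so the multiplication cannot overflow an integer.
    $delay = $baseMs * (2 ** min($exponent, 20));

    return (int) min($delay, $capMs);
}

foreach ([1, 2, 3, 4, 5, 6] as $retries) {
    echo $retries, ' => ', backoffDelayMs($retries), " ms\n";
}
```

When using it with Middleware::retry, wrap it in a closure such as function ($retries) { return backoffDelayMs($retries); } rather than passing the function name directly, because Guzzle passes extra arguments to the delay callback that would otherwise land in $baseMs.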

Best Practices for HTTP Error Handling

  1. Always handle connection errors: Network issues are common in web scraping
  2. Implement appropriate retry logic: Use exponential backoff for transient errors
  3. Log errors appropriately: Include request details and timestamps for debugging
  4. Check response status codes: With http_errors disabled, nothing is thrown for 4xx or 5xx responses, so inspect the status before using the body
  5. Handle rate limiting: Watch for 429 status codes and implement delays
  6. Validate response content: Check for expected data structure even on 200 responses

The function below combines several of these practices:

use GuzzleHttp\Client;
use GuzzleHttp\Exception\TransferException;

function makeRobustRequest($url, $maxRetries = 3) {
    $client = new Client(['timeout' => 30]);

    for ($attempt = 1; $attempt <= $maxRetries; $attempt++) {
        try {
            $response = $client->request('GET', $url, [
                'http_errors' => false,
                'headers' => [
                    'User-Agent' => 'MyApp/1.0'
                ]
            ]);

            $statusCode = $response->getStatusCode();

            if ($statusCode >= 200 && $statusCode < 300) {
                return $response->getBody()->getContents();
            } elseif ($statusCode === 429 && $attempt < $maxRetries) {
                // Rate limited - wait before retry
                sleep(2 ** $attempt);
                continue;
            } elseif ($statusCode >= 500 && $attempt < $maxRetries) {
                // Server error - retry
                sleep($attempt);
                continue;
            } else {
                // Non-retryable status, or retries used up
                throw new Exception("HTTP Error $statusCode: " . $response->getReasonPhrase());
            }

        } catch (TransferException $e) {
            // Connection-level failure; TransferException also covers
            // ConnectException on Guzzle 7
            if ($attempt === $maxRetries) {
                throw $e;
            }
            sleep($attempt);
        }
    }

    throw new Exception("Max retries exceeded");
}
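
For rate limiting, servers often say how long to wait through the Retry-After header, which carries either a number of seconds or an HTTP date. The helper below is a hypothetical sketch (retryAfterSeconds is not part of Guzzle) that turns that header into a wait time, falling back to a default when the header is missing or unparseable. The optional $now parameter exists only to make the date case predictable in tests.

```php
<?php
/**
 * Hypothetical helper: convert a Retry-After header value into seconds to wait.
 */
function retryAfterSeconds(?string $headerValue, int $default, ?int $now = null): int
{
    if ($headerValue === null || trim($headerValue) === '') {
        return $default;
    }

    $value = trim($headerValue);

    // Form 1: a delay in whole seconds, e.g. "120".
    if (ctype_digit($value)) {
        return (int) $value;
    }

    // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    $timestamp = strtotime($value);
    if ($timestamp === false) {
        return $default;
    }

    $now = $now ?? time();

    // A date in the past means no further waiting is needed.
    return max(0, $timestamp - $now);
}

echo retryAfterSeconds('120', 5), "\n";
echo retryAfterSeconds(null, 5), "\n";
```

With a Guzzle response, the header value can be read with $response->getHeaderLine('Retry-After'), which returns an empty string when the header is absent.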

By implementing proper error handling strategies, your Guzzle-based applications will be more resilient and provide better user experiences when dealing with unreliable network conditions or external service issues.
