How to Debug Network Issues When Using Guzzle

Debugging network issues in Guzzle can be challenging, especially in web scraping or API integration code, where a failure may originate in DNS, the TCP connection, TLS, a proxy, a redirect chain, or the remote server. This guide covers Guzzle's built-in debug output, logging middleware, transfer statistics, exception handling, and command-line tools that help you identify and resolve network problems when using the Guzzle HTTP client in PHP.

Understanding Guzzle's Built-in Debugging Features

Guzzle provides several built-in debugging capabilities that can help you diagnose network issues effectively.

Enabling Debug Mode

The simplest way to start debugging is by enabling Guzzle's debug mode:

use GuzzleHttp\Client;

$client = new Client([
    'debug' => true, // Enable debug output to STDOUT
]);

$response = $client->get('https://api.example.com/data');

For more control over debug output, pass a stream resource opened with fopen(); Guzzle then writes the debug output to that stream:

$debugFile = fopen('debug.log', 'a');

$client = new Client([
    'debug' => $debugFile,
]);

Using the RequestOptions Debug Parameter

You can also enable debugging on a per-request basis:

use GuzzleHttp\RequestOptions;

$response = $client->get('https://api.example.com/data', [
    RequestOptions::DEBUG => true
]);

Implementing Custom Logging and Monitoring

Creating a Custom Logger

For production environments, attach a PSR-3 compatible logger such as Monolog through Guzzle's log middleware:

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\MessageFormatter;
use GuzzleHttp\Middleware;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;

$logger = new Logger('guzzle');
$logger->pushHandler(new StreamHandler('guzzle.log', Logger::DEBUG));

$stack = HandlerStack::create();

// Log one line per completed request: method, URI, protocol version, and request body
$stack->push(
    Middleware::log(
        $logger,
        new MessageFormatter('{method} {uri} HTTP/{version} {req_body}')
    )
);

$client = new Client([
    'handler' => $stack,
]);

Advanced Request/Response Logging

Create detailed logs with custom formatting:

$stack->push(
    Middleware::log(
        $logger,
        new MessageFormatter(
            "REQUEST: {method} {uri}\n" .
            "Headers: {req_headers}\n" .
            "Body: {req_body}\n" .
            "RESPONSE: {code} {phrase}\n" .
            "Headers: {res_headers}\n" .
            "Body: {res_body}\n" .
            // MessageFormatter has no timing placeholder; use the 'on_stats' option for durations
            "Error: {error}"
        )
    )
);
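
Logging {req_headers} in full also writes headers such as Authorization to the log. A minimal sketch of a narrower template follows, using the {req_header_*} placeholder to log one named header. The ArrayLogger class and the token value are hypothetical, and the example assumes PHP 7.4+ with guzzlehttp/guzzle and psr/log installed through Composer. MockHandler supplies a canned response, so no network access is needed.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Assumes a Composer install

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\MessageFormatter;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use Psr\Log\AbstractLogger;

// Hypothetical in-memory PSR-3 logger so the example needs no log file
final class ArrayLogger extends AbstractLogger
{
    public array $records = [];

    public function log($level, $message, array $context = []): void
    {
        $this->records[] = "[{$level}] {$message}";
    }
}

$logger = new ArrayLogger();

// MockHandler returns a canned 200 response instead of sending the request
$stack = HandlerStack::create(new MockHandler([new Response(200)]));
$stack->push(Middleware::log(
    $logger,
    // Log a single named header rather than {req_headers}
    new MessageFormatter('{method} {uri} -> {code} (accept: {req_header_Accept})')
));

$client = new Client(['handler' => $stack]);
$client->get('https://api.example.com/data', [
    'headers' => [
        'Accept' => 'application/json',
        'Authorization' => 'Bearer secret-token', // Made-up value; must not reach the log
    ],
]);

print_r($logger->records);
```

The same template works with the Monolog logger shown earlier; only the logger instance changes.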

Network Request Monitoring and Analysis

Capturing Network Metrics

Monitor network performance and identify bottlenecks:

use GuzzleHttp\TransferStats;

$client->get('https://api.example.com/data', [
    'on_stats' => function (TransferStats $stats) use ($logger) {
        $logger->info('Request stats:', [
            'url' => $stats->getEffectiveUri(),
            'total_time' => $stats->getTransferTime(),
            'connect_time' => $stats->getHandlerStats()['connect_time'] ?? null,
            'dns_time' => $stats->getHandlerStats()['namelookup_time'] ?? null,
            'size_download' => $stats->getHandlerStats()['size_download'] ?? null,
            'speed_download' => $stats->getHandlerStats()['speed_download'] ?? null,
        ]);
    }
]);
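
With the cURL handler, the timers in getHandlerStats() are cumulative: each one measures from the start of the transfer. Subtracting adjacent timers gives a rough per-phase breakdown, which makes it easier to see whether DNS, the TLS handshake, or the server itself is slow. The helper below is a sketch of that arithmetic; the function name, the phase labels, and the sample numbers are all illustrative, and missing keys are treated as zero.

```php
<?php
// Splits cURL's cumulative timers (seconds) into approximate per-phase durations.
function breakDownTimings(array $handlerStats): array
{
    $dns     = (float) ($handlerStats['namelookup_time'] ?? 0);
    $connect = (float) ($handlerStats['connect_time'] ?? 0);
    $tls     = (float) ($handlerStats['appconnect_time'] ?? 0); // 0 for plain HTTP
    $pre     = (float) ($handlerStats['pretransfer_time'] ?? 0);
    $first   = (float) ($handlerStats['starttransfer_time'] ?? 0);
    $total   = (float) ($handlerStats['total_time'] ?? 0);

    return [
        'dns'           => $dns,
        'tcp_connect'   => max(0.0, $connect - $dns),
        'tls_handshake' => $tls > 0 ? max(0.0, $tls - $connect) : 0.0,
        'server_wait'   => max(0.0, $first - $pre),   // Time until the first response byte
        'download'      => max(0.0, $total - $first),
    ];
}

// Made-up sample values in the shape returned by $stats->getHandlerStats()
$phases = breakDownTimings([
    'namelookup_time'    => 0.01,
    'connect_time'       => 0.03,
    'appconnect_time'    => 0.08,
    'pretransfer_time'   => 0.08,
    'starttransfer_time' => 0.20,
    'total_time'         => 0.25,
]);

print_r($phases);
```

Inside an 'on_stats' callback, the call would be breakDownTimings($stats->getHandlerStats()). On a reused connection the DNS and connect timers can be zero, so the early phases will read as zero as well.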

Monitoring Redirects

Track redirect chains to identify potential issues:

// HandlerStack::create() already registers the redirect middleware, so nothing extra needs to be pushed

$client = new Client([
    'handler' => $stack,
    'allow_redirects' => [
        'max' => 5,
        'strict' => true,
        'referer' => true,
        'track_redirects' => true
    ]
]);

$response = $client->get('https://example.com/redirect-chain');

// Access redirect history: the URIs that were followed, joined with ", "
$redirectHistory = $response->getHeaderLine('X-Guzzle-Redirect-History');
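
When 'track_redirects' is enabled, Guzzle also records the status code of each hop in the X-Guzzle-Redirect-Status-History header. The sketch below pairs the two headers into a readable chain. It uses MockHandler with made-up URLs so it runs without network access, and it assumes a Composer install of guzzlehttp/guzzle.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Assumes a Composer install

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// Two queued redirects followed by a final response (URLs are illustrative)
$mock = new MockHandler([
    new Response(301, ['Location' => 'https://example.com/b']),
    new Response(302, ['Location' => 'https://example.com/c']),
    new Response(200, [], 'done'),
]);

$client = new Client([
    'handler' => HandlerStack::create($mock),
    'allow_redirects' => ['max' => 5, 'track_redirects' => true],
]);

$response = $client->get('https://example.com/a');

// getHeader() returns arrays, which avoids splitting a comma-joined header line
$uris  = $response->getHeader('X-Guzzle-Redirect-History');
$codes = $response->getHeader('X-Guzzle-Redirect-Status-History');

$chain = [];
foreach ($uris as $i => $uri) {
    $chain[] = ($codes[$i] ?? '?') . ' -> ' . $uri;
}

print_r($chain);
```

Each entry shows the status code that triggered a redirect and the URI it led to, which helps spot unexpected protocol downgrades or loops.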

Error Handling and Exception Analysis

Comprehensive Exception Handling

Implement robust error handling to capture and analyze different types of network failures:

use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\TooManyRedirectsException;
use GuzzleHttp\Exception\ClientException;
use GuzzleHttp\Exception\ServerException;

try {
    $response = $client->get('https://api.example.com/data');
} catch (ConnectException $e) {
    // Network connectivity issues
    $logger->error('Connection failed:', [
        'message' => $e->getMessage(),
        'request' => \GuzzleHttp\Psr7\Message::toString($e->getRequest()),
        'handler_context' => $e->getHandlerContext()
    ]);
} catch (TooManyRedirectsException $e) {
    // Redirect loop detection
    $logger->error('Too many redirects:', [
        'message' => $e->getMessage(),
        'uri' => (string) $e->getRequest()->getUri() // The exception itself does not expose the redirect history
    ]);
} catch (ClientException $e) {
    // 4xx HTTP status codes
    $logger->error('Client error:', [
        'status_code' => $e->getResponse()->getStatusCode(),
        'response_body' => (string) $e->getResponse()->getBody()
    ]);
} catch (ServerException $e) {
    // 5xx HTTP status codes
    $logger->error('Server error:', [
        'status_code' => $e->getResponse()->getStatusCode(),
        'response_headers' => $e->getResponse()->getHeaders()
    ]);
} catch (RequestException $e) {
    // General request exceptions
    $logger->error('Request exception:', [
        'message' => $e->getMessage(),
        'has_response' => $e->hasResponse()
    ]);
}
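
With the cURL handler, the array returned by getHandlerContext() includes an 'errno' entry holding the libcurl error code, which is often more useful than the message string for grouping failures. The helper below is a sketch that maps a few common codes to coarse categories; the function name and category labels are illustrative, and other handlers may leave the context empty.

```php
<?php
// Maps the 'errno' entry of a Guzzle handler context to a coarse failure category.
// The numeric keys are libcurl error codes; the category names are illustrative.
function classifyCurlFailure(array $handlerContext): string
{
    $categories = [
        6  => 'dns',         // CURLE_COULDNT_RESOLVE_HOST
        7  => 'connect',     // CURLE_COULDNT_CONNECT
        28 => 'timeout',     // CURLE_OPERATION_TIMEDOUT
        35 => 'tls',         // CURLE_SSL_CONNECT_ERROR
        60 => 'certificate', // CURLE_PEER_FAILED_VERIFICATION (CURLE_SSL_CACERT in older libcurl)
    ];

    $errno = $handlerContext['errno'] ?? null;

    return is_int($errno) && isset($categories[$errno])
        ? $categories[$errno]
        : 'unknown';
}

// Sample contexts in the shape returned by $e->getHandlerContext()
echo classifyCurlFailure(['errno' => 28, 'error' => 'Operation timed out']), "\n";
echo classifyCurlFailure([]), "\n";
```

In the ConnectException branch above, the category could be added to the log context with classifyCurlFailure($e->getHandlerContext()).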

Analyzing SSL/TLS Issues

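Debug SSL certificate and TLS connection problems by capturing cURL's verbose output, which includes the handshake and certificate details. Disabling verification, as shown here, only confirms whether the certificate is the cause and must never be left enabled in production: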

$client = new Client([
    'curl' => [
        CURLOPT_VERBOSE => true,
        CURLOPT_STDERR => fopen('curl_verbose.log', 'a'),
        CURLOPT_SSL_VERIFYPEER => false, // Only for debugging
        CURLOPT_SSL_VERIFYHOST => false, // Only for debugging
    ]
]);

Timeout and Connection Debugging

Configuring Timeouts for Debugging

Set appropriate timeouts and monitor their effectiveness:

$client = new Client([
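    'timeout' => 30, // Total time allowed for the whole request, in seconds
    'connect_timeout' => 10, // Time allowed to establish the connection, in seconds
    'read_timeout' => 20, // Time allowed per read on a streamed body ('stream' => true), in seconds
]);

// Flag requests that come close to the 30-second limit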
$client->get('https://slow-api.example.com/data', [
    'on_stats' => function (TransferStats $stats) {
        if ($stats->getTransferTime() > 25) {
            error_log("Slow request detected: " . $stats->getTransferTime() . "s");
        }
    }
]);

Implementing Retry Logic with Debugging

Add retry mechanisms with detailed logging:

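use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Guzzle calls the decider with the retry count, the request, the response (if any), and the error (if any)
$decider = function (int $retries, RequestInterface $request, ?ResponseInterface $response = null, $error = null) use ($logger) {
    $retry = $retries < 3 // Retry up to 3 times
        && ($error instanceof ConnectException || ($response && $response->getStatusCode() >= 500));
    if ($retry) {
        $logger->warning('Retrying request', ['attempt' => $retries + 1, 'uri' => (string) $request->getUri()]);
    }

    return $retry;
};

$stack->push(Middleware::retry($decider), 'retry');

$client = new Client(['handler' => $stack]);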
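
Guzzle's built-in Middleware::retry() accepts an optional second callable that returns the delay before the next attempt, in milliseconds. A capped exponential backoff keeps retries from hammering a struggling server. The function below is a sketch of that calculation; its name and default values are assumptions chosen for illustration.

```php
<?php
// Exponential backoff in milliseconds: base, 2x base, 4x base, ... capped at $capMs.
// $retries is expected to be 1 for the first retry, which is how
// Middleware::retry() numbers the attempts it passes to its delay callable.
function backoffDelayMs(int $retries, int $baseMs = 200, int $capMs = 5000): int
{
    $exponent = max(0, $retries - 1);

    return (int) min($capMs, $baseMs * (2 ** $exponent));
}

foreach ([1, 2, 3, 10] as $attempt) {
    echo "retry {$attempt}: ", backoffDelayMs($attempt), " ms\n";
}

// Possible wiring, assuming a $decider callable and a $stack as in the surrounding text:
// $stack->push(Middleware::retry($decider, function (int $retries) {
//     return backoffDelayMs($retries);
// }), 'retry');
```

Logging the computed delay next to the retry attempt makes it easier to correlate slow requests with retry behavior.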

DNS and Network Layer Debugging

DNS Resolution Debugging

Debug DNS-related issues:

$client = new Client([
    'curl' => [
        CURLOPT_RESOLVE => [
            'api.example.com:443:192.168.1.100' // Force specific IP
        ]
    ]
]);

// Log DNS resolution time
$client->get('https://api.example.com/data', [
    'on_stats' => function (TransferStats $stats) {
        $handlerStats = $stats->getHandlerStats();
        if (isset($handlerStats['namelookup_time'])) {
            error_log("DNS lookup time: " . $handlerStats['namelookup_time'] . "s");
        }
    }
]);

Network Interface and Proxy Debugging

Debug proxy configurations and network interfaces:

$client = new Client([
    'proxy' => [
        'http' => 'http://proxy.example.com:8080',
        'https' => 'http://proxy.example.com:8080',
    ],
    'curl' => [
        CURLOPT_INTERFACE => '192.168.1.50', // Bind to specific interface
        CURLOPT_PROXYTYPE => CURLPROXY_HTTP,
    ]
]);

Performance Profiling and Optimization

Request Profiling

Profile request performance to identify bottlenecks:

class RequestProfiler
{
    private $profiles = [];

    public function profileRequest($url, callable $requestCallback)
    {
        $startTime = microtime(true);
        $startMemory = memory_get_usage();

        try {
            $result = $requestCallback();
            $status = 'success';
        } catch (\Throwable $e) {
            $result = $e;
            $status = 'error';
        }

        $endTime = microtime(true);
        $endMemory = memory_get_usage();

        $this->profiles[] = [
            'url' => $url,
            'duration' => $endTime - $startTime,
            'memory_used' => $endMemory - $startMemory,
            'status' => $status,
            'timestamp' => date('Y-m-d H:i:s')
        ];

        return $result;
    }

    public function getProfiles()
    {
        return $this->profiles;
    }
}

// Usage
$profiler = new RequestProfiler();

$result = $profiler->profileRequest('https://api.example.com/data', function() use ($client) {
    return $client->get('https://api.example.com/data');
});

Testing and Validation Tools

Mock Servers for Testing

Use Guzzle's mock handler for testing network scenarios:

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

$mock = new MockHandler([
    new Response(200, ['X-Foo' => 'Bar'], 'Success'),
    new Response(500, [], 'Server Error'),
    new RequestException('Connection timeout', new Request('GET', 'test'))
]);

$handlerStack = HandlerStack::create($mock);
$client = new Client(['handler' => $handlerStack]);
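
MockHandler hands out its queued items in order, one per request, so each call exercises a different code path. The sketch below sends three requests against the same queue and records what the calling code observes. It assumes a Composer install of guzzlehttp/guzzle and the default 'http_errors' behavior, under which a 500 response is thrown as a ServerException.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Assumes a Composer install

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Exception\ServerException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

$mock = new MockHandler([
    new Response(200, ['X-Foo' => 'Bar'], 'Success'),
    new Response(500, [], 'Server Error'),
    new RequestException('Connection timeout', new Request('GET', 'test')),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$outcomes = [];
for ($i = 0; $i < 3; $i++) {
    try {
        $response = $client->get('https://api.example.com/data');
        $outcomes[] = 'ok ' . $response->getStatusCode();
    } catch (ServerException $e) {
        // Catch the more specific exception first; it extends RequestException
        $outcomes[] = 'server error ' . $e->getResponse()->getStatusCode();
    } catch (RequestException $e) {
        $outcomes[] = 'request failed: ' . $e->getMessage();
    }
}

print_r($outcomes);
```

Because the queue is consumed as requests are sent, count($mock) can be used afterwards to check that every queued scenario was actually exercised.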

Network Validation Tools

Implement validation tools to verify network behavior:

class NetworkValidator
{
    public static function validateResponse($response)
    {
        $issues = [];

        // Check response time
        if ($response->hasHeader('X-Response-Time')) {
            $responseTime = floatval($response->getHeaderLine('X-Response-Time'));
            if ($responseTime > 2.0) {
                $issues[] = "Slow response time: {$responseTime}s";
            }
        }

        // Check content encoding
        if ($response->hasHeader('Content-Encoding')) {
            $encoding = $response->getHeaderLine('Content-Encoding');
            if (!in_array($encoding, ['gzip', 'deflate', 'br'])) {
                $issues[] = "Unsupported encoding: {$encoding}";
            }
        }

        return $issues;
    }
}

Command Line Debugging Tools

Using cURL for Comparison

Compare Guzzle behavior with cURL commands:

# Test basic connectivity
curl -v https://api.example.com/data

# Test with specific headers
curl -H "User-Agent: GuzzleHttp/7.0" -H "Accept: application/json" \
     -v https://api.example.com/data

# Test with timeout
curl --connect-timeout 10 --max-time 30 https://api.example.com/data

# Test SSL/TLS
curl -vvv --tlsv1.2 https://api.example.com/data

Network Diagnostics Commands

Use system tools for network diagnostics:

# Test DNS resolution
nslookup api.example.com
dig api.example.com

# Test connectivity
ping api.example.com
traceroute api.example.com

# Test port connectivity
telnet api.example.com 443
nc -zv api.example.com 443

Best Practices for Production Debugging

Structured Logging

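Implement structured logging for better analysis. The $request, $response, and $transferStats variables in this fragment are assumed to come from an 'on_stats' callback or a middleware: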

$logger->info('Guzzle request completed', [
    'method' => $request->getMethod(),
    'uri' => (string) $request->getUri(),
    'status_code' => $response->getStatusCode(),
    'duration' => $transferStats->getTransferTime(),
    'user_agent' => $request->getHeaderLine('User-Agent'),
    'content_length' => $response->getHeaderLine('Content-Length'),
    'server' => $response->getHeaderLine('Server')
]);
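
One place where the request, the response, and the transfer statistics are all available together is the 'on_stats' callback. The sketch below builds the same kind of context array there. It collects entries in a plain array as a stand-in for a PSR-3 logger and uses MockHandler with made-up headers so it runs offline; a Composer install of guzzlehttp/guzzle is assumed.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Assumes a Composer install

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\TransferStats;

$entries = []; // Stand-in for the context arrays passed to a PSR-3 logger

$client = new Client([
    'handler' => HandlerStack::create(new MockHandler([
        new Response(200, ['Server' => 'mock', 'Content-Length' => '2'], 'ok'),
    ])),
    // Setting 'on_stats' as a client default applies it to every request
    'on_stats' => function (TransferStats $stats) use (&$entries) {
        $request = $stats->getRequest();
        $response = $stats->getResponse(); // null when no response was received

        $entries[] = [
            'method' => $request->getMethod(),
            'uri' => (string) $request->getUri(),
            'status_code' => $response ? $response->getStatusCode() : null,
            'duration' => $stats->getTransferTime(),
            'server' => $response ? $response->getHeaderLine('Server') : null,
        ];
    },
]);

$client->get('https://api.example.com/data');

print_r($entries);
```

Guarding against a missing response matters here, because the callback also runs for transfers that fail before any response arrives.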

Monitoring Integration

Integrate with monitoring systems for real-time debugging:

// Example with custom metrics collection
class GuzzleMetricsCollector
{
    public function collectMetrics(TransferStats $stats)
    {
        $metrics = [
            'guzzle.request.duration' => $stats->getTransferTime(),
            'guzzle.request.size' => $stats->getRequest()->getBody()->getSize(),
            'guzzle.response.size' => $stats->getResponse() ? 
                $stats->getResponse()->getBody()->getSize() : 0
        ];

        // Send to monitoring system (StatsD, Prometheus, etc.)
        foreach ($metrics as $name => $value) {
            $this->sendMetric($name, $value);
        }
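    }

    // Placeholder transport; replace with a call to your StatsD, Prometheus, or other metrics client
    private function sendMetric(string $name, $value): void
    {
        error_log("{$name}={$value}");
    }
}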

Integration with Browser Automation Tools

When debugging complex web scraping workflows, network issues in Guzzle may need to be analyzed alongside browser automation tools. For comprehensive debugging of JavaScript-heavy sites, you might need to monitor network requests in Puppeteer to understand the complete picture of your scraping pipeline.

Conclusion

With these techniques you can narrow a Guzzle network failure down to a specific layer: DNS, the connection, TLS, redirects, or the remote server's response. Remove or secure debug output in production environments, since it can contain credentials and response bodies, and monitor the performance impact of extensive logging and debugging features.

For web scraping applications that require JavaScript execution alongside HTTP requests, these techniques complement the debugging facilities of browser automation tools, so that each layer of the scraping infrastructure can be examined separately.
