
What are Guzzle middleware and how do they work?

Guzzle middleware lets you modify requests and responses as they pass through the Guzzle HTTP client library for PHP. Middleware forms a chain of handlers that can intercept, modify, or react to each request before it is sent and to each response before it is returned to your code.

Understanding Guzzle Middleware Architecture

Middleware in Guzzle follows a layered architecture pattern where each middleware function wraps around the next handler in the chain. This creates a pipeline where requests flow down through the middleware stack and responses flow back up through the same stack in reverse order.

<?php
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Create a handler stack
$stack = HandlerStack::create();

// Add middleware to the stack
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
    return $request->withHeader('User-Agent', 'Custom-Agent/1.0');
}));

// Create client with middleware stack
$client = new Client(['handler' => $stack]);
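
The wrapping pattern described above can be sketched in plain PHP, without any Guzzle classes. This is only an illustration of the idea: `$makeLayer`, `$log`, and the string values standing in for the request and response are hypothetical, and Guzzle's real handlers return promises rather than strings.

```php
<?php
// Plain-PHP illustration of the wrapping pattern; no Guzzle classes are used.
// $log, $makeLayer and the "request"/"response" strings are hypothetical.
$log = [];

// Innermost handler: stands in for the code that would send the request.
$handler = function (string $request) use (&$log): string {
    $log[] = "handler: $request";
    return 'response';
};

// Builds a middleware: a function that takes the next handler and returns a new handler.
$makeLayer = function (string $name) use (&$log): callable {
    return function (callable $next) use ($name, &$log): callable {
        return function (string $request) use ($name, $next, &$log): string {
            $log[] = "$name: before";   // runs on the way down
            $response = $next($request);
            $log[] = "$name: after";    // runs on the way back up
            return $response;
        };
    };
};

// "outer" wraps "inner", which wraps the handler.
$pipeline = $makeLayer('outer')($makeLayer('inner')($handler));
$result = $pipeline('request');

print_r($log);
```

The log shows the outer layer running first on the way in and last on the way out, which is the same shape Guzzle builds from a handler stack.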

Types of Guzzle Middleware

1. Request Middleware

Request middleware allows you to modify outgoing requests before they are sent to the server. This is useful for adding headers, authentication tokens, or transforming request data.

<?php
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

// Add custom headers to all requests
$requestMiddleware = Middleware::mapRequest(function (RequestInterface $request) {
    return $request
        ->withHeader('X-API-Key', 'your-api-key')
        ->withHeader('Accept', 'application/json')
        ->withHeader('Content-Type', 'application/json');
});

$stack->push($requestMiddleware);

2. Response Middleware

Response middleware processes responses after they are received from the server but before they are returned to your application code.

<?php
use GuzzleHttp\Middleware;
use Psr\Http\Message\ResponseInterface;

// Log response status codes
$responseMiddleware = Middleware::mapResponse(function (ResponseInterface $response) {
    error_log('Response Status: ' . $response->getStatusCode());
    return $response;
});

$stack->push($responseMiddleware);

3. Handler Middleware

Handler middleware provides the most control, allowing you to intercept both requests and responses, implement custom logic, and even prevent requests from being sent.

<?php
use GuzzleHttp\Promise\PromiseInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$handlerMiddleware = function (callable $handler) {
    return function (RequestInterface $request, array $options) use ($handler) {
        // Pre-request logic
        $start = microtime(true);

        // Call the next handler
        $promise = $handler($request, $options);

        // Post-response logic
        return $promise->then(function (ResponseInterface $response) use ($start) {
            $duration = microtime(true) - $start;
            error_log("Request took {$duration} seconds");
            return $response;
        });
    };
};

$stack->push($handlerMiddleware);
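
As a sketch of the "prevent requests from being sent" case, the middleware below answers locally for one host and never calls the next handler. The blocking rule, the `$blockHost` name, and the `blocked.example` host are hypothetical. The example assumes Guzzle 7 installed with Composer, and it uses an empty `MockHandler`, which throws if it is ever invoked, so no network access is needed.

```php
<?php
require 'vendor/autoload.php'; // assumption: Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Promise\Create;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;

// Hypothetical rule: never send requests to a blocked host; answer locally instead.
$blockHost = function (string $blockedHost) {
    return function (callable $handler) use ($blockedHost) {
        return function (RequestInterface $request, array $options) use ($handler, $blockedHost) {
            if ($request->getUri()->getHost() === $blockedHost) {
                // Short-circuit: the next handler is never called.
                return Create::promiseFor(new Response(403, [], 'Blocked by middleware'));
            }
            return $handler($request, $options);
        };
    };
};

// An empty MockHandler throws if it is invoked, so reaching it would fail loudly.
$stack = new HandlerStack(new MockHandler([]));
$stack->push($blockHost('blocked.example'));

$client = new Client(['handler' => $stack]);
$response = $client->get('https://blocked.example/page');

echo $response->getStatusCode(), ' ', $response->getBody(), "\n";
```

Because the stack is built with `new HandlerStack(...)` rather than `HandlerStack::create()`, the default error middleware is absent and the 403 response is returned instead of thrown.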

Built-in Middleware Examples

Authentication Middleware

Guzzle does not ship a dedicated authentication middleware. HTTP Basic authentication is handled by the built-in auth request option, and token-based schemes can be added with Middleware::mapRequest:

<?php
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

$stack = HandlerStack::create();

// OAuth Bearer Token, added to every request by middleware
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
    return $request->withHeader('Authorization', 'Bearer your-token-here');
}));

$client = new Client(['handler' => $stack]);

// HTTP Basic Authentication uses the 'auth' request option instead of middleware
$basicClient = new Client(['auth' => ['username', 'password']]);

Retry Middleware

Implement automatic retry logic for failed requests:

<?php
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$retryMiddleware = Middleware::retry(
    // $exception is left untyped: in Guzzle 7, ConnectException does not extend RequestException
    function ($retries, RequestInterface $request, ?ResponseInterface $response = null, $exception = null) {
        // Retry on connection errors or 5xx responses
        if ($exception instanceof ConnectException) {
            return $retries < 3;
        }

        if ($response && $response->getStatusCode() >= 500) {
            return $retries < 3;
        }

        return false;
    },
    function ($retries) {
        // Exponential backoff in milliseconds: wait 1s, then 2s, then 4s
        // ($retries is 1 when the first retry is scheduled)
        return 1000 * 2 ** ($retries - 1);
    }
);

$stack->push($retryMiddleware);
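
To see the retry flow without a real server, the sketch below queues two 503 responses and one 200 on a `MockHandler`. It assumes Guzzle 7 installed with Composer. The `$attempts` counter and the zero delay are illustrative choices, not part of the example above.

```php
<?php
require 'vendor/autoload.php'; // assumption: Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Queue two failures followed by a success; no network traffic is involved.
$mock = new MockHandler([
    new Response(503),
    new Response(503),
    new Response(200, [], 'ok'),
]);

$attempts = 0; // hypothetical counter, only used to show how often the decider runs
$decider = function (int $retries, RequestInterface $request, ?ResponseInterface $response = null, $reason = null) use (&$attempts) {
    $attempts++; // the decider runs once per completed attempt
    return $retries < 3 && $response !== null && $response->getStatusCode() >= 500;
};

$stack = new HandlerStack($mock);
$stack->push(Middleware::retry($decider, function () {
    return 0; // no delay, to keep the demonstration fast
}));

$client = new Client(['handler' => $stack]);
$response = $client->get('https://api.example.com/data');

echo $response->getStatusCode(), " after {$attempts} attempts\n";
```

The decider sees each 503, asks for a retry, and stops once the 200 arrives, so the caller only ever receives the final response.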

History Middleware

Track request and response history for debugging:

<?php
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;

$history = [];
$stack = HandlerStack::create();
$stack->push(Middleware::history($history));

$client = new Client(['handler' => $stack]);

// Make requests
$client->get('https://api.example.com/data');
$client->post('https://api.example.com/submit', ['json' => ['key' => 'value']]);

// Access history
foreach ($history as $transaction) {
    echo "Request: " . $transaction['request']->getUri() . "\n";
    if (isset($transaction['response'])) {
        echo "Response: " . $transaction['response']->getStatusCode() . "\n";
    }
}

Creating Custom Middleware

Logging Middleware

Create a comprehensive logging middleware for web scraping operations:

<?php
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

function createLoggingMiddleware($logger) {
    return function (callable $handler) use ($logger) {
        return function (RequestInterface $request, array $options) use ($handler, $logger) {
            $logger->info('Sending request', [
                'method' => $request->getMethod(),
                'uri' => (string) $request->getUri(),
                'headers' => $request->getHeaders()
            ]);

            return $handler($request, $options)->then(
                function (ResponseInterface $response) use ($logger, $request) {
                    $logger->info('Received response', [
                        'status' => $response->getStatusCode(),
                        'uri' => (string) $request->getUri(),
                        'size' => $response->getHeaderLine('Content-Length')
                    ]);
                    return $response;
                },
                // Typed as Throwable: in Guzzle 7, connection failures are not RequestException instances
                function (\Throwable $exception) use ($logger, $request) {
                    $logger->error('Request failed', [
                        'uri' => (string) $request->getUri(),
                        'error' => $exception->getMessage()
                    ]);
                    throw $exception;
                }
            );
        };
    };
}

// Usage
$stack->push(createLoggingMiddleware($your_logger));

Rate Limiting Middleware

Implement rate limiting to respect API limits:

<?php
use Psr\Http\Message\RequestInterface;

class RateLimitMiddleware
{
    private $maxRequests;
    private $timeWindow;
    private $requests = [];

    public function __construct($maxRequests = 100, $timeWindow = 60)
    {
        $this->maxRequests = $maxRequests;
        $this->timeWindow = $timeWindow;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options) use ($handler) {
            $this->enforceRateLimit();
            return $handler($request, $options);
        };
    }

    private function enforceRateLimit()
    {
        $now = time();

        // Remove old requests outside the time window
        $this->requests = array_filter($this->requests, function($timestamp) use ($now) {
            return ($now - $timestamp) < $this->timeWindow;
        });

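        // If the limit is reached, wait until the oldest request leaves the window
        if (count($this->requests) >= $this->maxRequests) {
            $sleepTime = $this->timeWindow - ($now - min($this->requests));
            sleep($sleepTime); // blocks the whole PHP process, including concurrent requests
        }

        // Record the time the request is actually sent (after any sleep)
        $this->requests[] = time();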
    }
}

// Usage
$stack->push(new RateLimitMiddleware(50, 60)); // 50 requests per minute
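
The sleep-time arithmetic in the class above can be checked in isolation. The helper below is a hypothetical stand-alone function, not part of Guzzle. It takes the current time as a parameter so the calculation can be tried without actually sleeping, and it assumes `$maxRequests` is at least 1.

```php
<?php
// Hypothetical helper: given recorded request timestamps, return how many
// seconds to wait before the next request fits inside the sliding window.
function secondsUntilAllowed(array $timestamps, int $now, int $maxRequests, int $timeWindow): int
{
    // Keep only the requests that are still inside the window.
    $recent = array_filter($timestamps, function (int $t) use ($now, $timeWindow): bool {
        return ($now - $t) < $timeWindow;
    });

    if (count($recent) < $maxRequests) {
        return 0; // still under the limit
    }

    // The oldest recent request leaves the window at min($recent) + $timeWindow.
    return $timeWindow - ($now - min($recent));
}

// Limit of 3 requests per 60 seconds; requests were made at t = 100, 110 and 130.
echo secondsUntilAllowed([100, 110, 130], 140, 3, 60), "\n"; // 20: the t=100 request expires at t=160
echo secondsUntilAllowed([100, 110, 130], 165, 3, 60), "\n"; // 0: the t=100 request has left the window
echo secondsUntilAllowed([100, 110], 140, 3, 60), "\n";      // 0: only two requests recorded
```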

Middleware Order and Execution

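The order in which middleware is added to the stack matters. Middleware pushed first sits outermost: it sees the request first and the response last, while middleware pushed later sits closer to the handler. Note that HandlerStack::create() already registers Guzzle's default middleware, which runs before anything you push: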

<?php
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;

$stack = HandlerStack::create();

// This middleware executes first for requests, last for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 1: Request\n";
    return $request;
}));

// This middleware executes second for requests, second-to-last for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 2: Request\n";
    return $request;
}));

// This middleware executes last for requests, first for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 3: Request\n";
    return $request;
}));
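
The ordering can be verified offline with a small tracing middleware. This sketch assumes Guzzle 7 installed with Composer. The `$trace` helper and the names A, B, and Z are hypothetical, and a `MockHandler` supplies the response so nothing is sent over the network.

```php
<?php
require 'vendor/autoload.php'; // assumption: Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$order = [];

// Hypothetical helper: a middleware that records when it sees the request and the response.
$trace = function (string $name) use (&$order) {
    return function (callable $handler) use ($name, &$order) {
        return function (RequestInterface $request, array $options) use ($handler, $name, &$order) {
            $order[] = "$name request";
            return $handler($request, $options)->then(
                function (ResponseInterface $response) use ($name, &$order) {
                    $order[] = "$name response";
                    return $response;
                }
            );
        };
    };
};

// new HandlerStack(...) starts empty, so only the middleware below is involved.
$stack = new HandlerStack(new MockHandler([new Response(200)]));
$stack->push($trace('A'));
$stack->push($trace('B'));
$stack->unshift($trace('Z')); // unshift() places a middleware before everything pushed so far

(new Client(['handler' => $stack]))->get('https://api.example.com/data');

echo implode(', ', $order), "\n";
```

Requests pass through Z, A, B in that order, and responses come back through B, A, Z.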

Best Practices for Guzzle Middleware

1. Keep Middleware Focused

Each middleware should have a single responsibility. Don't combine logging, authentication, and retry logic in one middleware.

2. Handle Errors Gracefully

Guard your own middleware logic so that a failure in it does not break the request chain, and call the next handler exactly once:

<?php
use Psr\Http\Message\RequestInterface;

$middleware = function (callable $handler) {
    return function (RequestInterface $request, array $options) use ($handler) {
        try {
            // Your middleware logic here (do not call $handler inside the try block)
        } catch (\Exception $e) {
            // Log error but don't break the chain
            error_log('Middleware error: ' . $e->getMessage());
        }

        // Call the next handler once, whether or not the logic above failed
        return $handler($request, $options);
    };
};

3. Use Appropriate Middleware Types

  • Use mapRequest for simple request modifications
  • Use mapResponse for simple response processing
  • Use full handler middleware only when you need complete control

4. Consider Performance Impact

Middleware adds overhead to each request. Profile your application to ensure middleware doesn't significantly impact performance, especially when processing large volumes of requests.

Integration with Web Scraping

When building web scraping applications, middleware becomes particularly valuable for handling common challenges:

<?php
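use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

// RateLimitMiddleware is the class defined in the rate limiting section above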

class WebScrapingClient
{
    private $client;

    public function __construct()
    {
        $stack = HandlerStack::create();

        // Add user agent rotation
        $stack->push($this->createUserAgentMiddleware());

        // Add retry logic for failed requests
        $stack->push($this->createRetryMiddleware());

        // Add rate limiting
        $stack->push(new RateLimitMiddleware(30, 60));

        // Add request/response logging
        $stack->push($this->createLoggingMiddleware());

        $this->client = new Client([
            'handler' => $stack,
            'timeout' => 30,
            'verify' => false // Disables TLS certificate verification; only for development
        ]);
    }

    private function createUserAgentMiddleware()
    {
        $userAgents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ];

        return Middleware::mapRequest(function (RequestInterface $request) use ($userAgents) {
            $randomAgent = $userAgents[array_rand($userAgents)];
            return $request->withHeader('User-Agent', $randomAgent);
        });
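    }

    private function createRetryMiddleware()
    {
        // Retry up to 3 times on connection errors or 5xx responses
        return \GuzzleHttp\Middleware::retry(
            function ($retries, $request, ?\Psr\Http\Message\ResponseInterface $response = null, $exception = null) {
                if ($retries >= 3) {
                    return false;
                }
                return $exception instanceof \GuzzleHttp\Exception\ConnectException
                    || ($response !== null && $response->getStatusCode() >= 500);
            }
        );
    }

    private function createLoggingMiddleware()
    {
        // Log the status code of every response
        return \GuzzleHttp\Middleware::mapResponse(function (\Psr\Http\Message\ResponseInterface $response) {
            error_log('Response Status: ' . $response->getStatusCode());
            return $response;
        });
    }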

    public function scrape($url)
    {
        return $this->client->get($url);
    }
}

This approach provides a robust foundation for web scraping operations, similar to how you might handle browser sessions in Puppeteer but with the flexibility and performance benefits of HTTP-based scraping.

Conclusion

Guzzle middleware provides a powerful and flexible way to customize HTTP request handling in PHP applications. By understanding how to create and configure middleware, you can build robust web scraping and API integration solutions that handle authentication, rate limiting, error recovery, and logging consistently across your application.

The middleware pattern's composability allows you to mix and match different behaviors as needed, making your HTTP client both powerful and maintainable. Whether you're building a simple API client or a complex web scraping system, mastering Guzzle middleware will significantly improve your application's reliability and functionality.
