What are Guzzle middleware and how do they work?
Guzzle middleware is a powerful feature of the Guzzle HTTP client library for PHP that allows you to modify requests and responses during the HTTP request lifecycle. Middleware functions as a chain of handlers that can intercept, modify, or react to HTTP requests and responses before they reach their final destination.
Understanding Guzzle Middleware Architecture
Middleware in Guzzle follows a layered architecture pattern where each middleware function wraps around the next handler in the chain. This creates a pipeline where requests flow down through the middleware stack and responses flow back up through the same stack in reverse order.
<?php
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
// Create a handler stack
$stack = HandlerStack::create();
// Add middleware to the stack
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
    return $request->withHeader('User-Agent', 'Custom-Agent/1.0');
}));
// Create client with middleware stack
$client = new Client(['handler' => $stack]);
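The wrapping described above can be pictured with plain PHP callables, without Guzzle itself. This is a simplified sketch rather than Guzzle's actual implementation: the names tracingMiddleware and resolveStack are hypothetical, and real Guzzle handlers return promises instead of strings.

```php
<?php
// Hypothetical stand-ins: plain callables instead of Guzzle handlers and promises.

/** Builds a middleware that records when the request and the response pass through it. */
function tracingMiddleware(string $name, array &$log): callable
{
    return function (callable $next) use ($name, &$log): callable {
        return function (string $request) use ($next, $name, &$log): string {
            $log[] = "$name: request";   // on the way down the stack
            $response = $next($request); // hand over to the next layer
            $log[] = "$name: response";  // on the way back up
            return $response;
        };
    };
}

/** Wraps the base handler with each middleware; the first one listed ends up outermost. */
function resolveStack(array $middlewares, callable $baseHandler): callable
{
    $handler = $baseHandler;
    foreach (array_reverse($middlewares) as $middleware) {
        $handler = $middleware($handler);
    }
    return $handler;
}

$log = [];
$baseHandler = function (string $request) use (&$log): string {
    $log[] = 'handler: send';
    return "response to $request";
};

$handler = resolveStack(
    [tracingMiddleware('A', $log), tracingMiddleware('B', $log)],
    $baseHandler
);
$result = $handler('GET /data');
// $log is now: A: request, B: request, handler: send, B: response, A: response
```

Each middleware receives the next handler and returns a new handler, which is why the response passes through the layers in the opposite order from the request.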
Types of Guzzle Middleware
1. Request Middleware
Request middleware allows you to modify outgoing requests before they are sent to the server. This is useful for adding headers, authentication tokens, or transforming request data.
<?php
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
// Add custom headers to all requests
$requestMiddleware = Middleware::mapRequest(function (RequestInterface $request) {
    return $request
        ->withHeader('X-API-Key', 'your-api-key')
        ->withHeader('Accept', 'application/json')
        ->withHeader('Content-Type', 'application/json');
});
$stack->push($requestMiddleware);
2. Response Middleware
Response middleware processes responses after they are received from the server but before they are returned to your application code.
<?php
use GuzzleHttp\Middleware;
use Psr\Http\Message\ResponseInterface;
// Log response status codes
$responseMiddleware = Middleware::mapResponse(function (ResponseInterface $response) {
    error_log('Response Status: ' . $response->getStatusCode());
    return $response;
});
$stack->push($responseMiddleware);
3. Handler Middleware
Handler middleware provides the most control, allowing you to intercept both requests and responses, implement custom logic, and even prevent requests from being sent.
<?php
use GuzzleHttp\Promise\PromiseInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
$handlerMiddleware = function (callable $handler) {
    return function (RequestInterface $request, array $options) use ($handler): PromiseInterface {
        // Pre-request logic
        $start = microtime(true);
        // Call the next handler
        $promise = $handler($request, $options);
        // Post-response logic runs when the promise is fulfilled
        return $promise->then(function (ResponseInterface $response) use ($start) {
            $duration = microtime(true) - $start;
            error_log("Request took {$duration} seconds");
            return $response;
        });
    };
};
$stack->push($handlerMiddleware);
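Because a handler middleware decides whether to call the next handler at all, it can also answer a request itself. The sketch below is one hedged illustration of that idea: a hypothetical createMemoryCacheMiddleware() that serves repeated GET requests from an in-memory array. It assumes Guzzle 7 installed via Composer and uses Guzzle's MockHandler so nothing is sent over the network; keying on the URI alone and never expiring entries are simplifications.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle 7 installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Promise\Create;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Hypothetical helper: caches GET responses in memory, keyed by URI.
function createMemoryCacheMiddleware(): callable
{
    $cache = [];
    return function (callable $handler) use (&$cache) {
        return function (RequestInterface $request, array $options) use ($handler, &$cache) {
            if ($request->getMethod() !== 'GET') {
                return $handler($request, $options); // only GET responses are cached
            }
            $key = (string) $request->getUri();
            if (isset($cache[$key])) {
                // Short-circuit: answer from memory without calling the next handler
                return Create::promiseFor($cache[$key]);
            }
            return $handler($request, $options)->then(
                function (ResponseInterface $response) use (&$cache, $key) {
                    $cache[$key] = $response;
                    return $response;
                }
            );
        };
    };
}

// Only one response is queued, so a second trip to the handler would fail.
$mock = new MockHandler([new Response(200, [], 'hello')]);
$stack = HandlerStack::create($mock);
$stack->push(createMemoryCacheMiddleware());
$client = new Client(['handler' => $stack]);

$first = $client->get('https://api.example.com/data');
$second = $client->get('https://api.example.com/data'); // served from the cache
```

The second call succeeds even though the mock queue is empty, which shows that the request never reached the handler.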
Built-in Middleware Examples
Authentication Middleware
Guzzle handles HTTP Basic authentication through the built-in auth request option rather than through a middleware. Token-based schemes such as Bearer tokens are easy to add with mapRequest:
<?php
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
// HTTP Basic Authentication: a request option, not a middleware
$basicClient = new Client(['auth' => ['username', 'password']]);
// OAuth Bearer Token: add the Authorization header with mapRequest
$stack = HandlerStack::create();
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
    return $request->withHeader('Authorization', 'Bearer your-token-here');
}));
$client = new Client(['handler' => $stack]);
Retry Middleware
Implement automatic retry logic for failed requests:
<?php
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
$retryMiddleware = Middleware::retry(
    // $exception is left untyped: in Guzzle 7, ConnectException does not extend RequestException
    function ($retries, RequestInterface $request, ?ResponseInterface $response = null, $exception = null) {
        // Retry on connection errors or 5xx responses
        if ($exception instanceof ConnectException) {
            return $retries < 3;
        }
        if ($response && $response->getStatusCode() >= 500) {
            return $retries < 3;
        }
        return false;
    },
    function ($retries) {
        // Delay in milliseconds. $retries is 1 on the first retry,
        // so this waits 1s, then 2s, then 4s (exponential backoff)
        return 1000 * 2 ** ($retries - 1);
    }
);
$stack->push($retryMiddleware);
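Retry logic is easier to trust once it has been exercised without a real server. The following sketch assumes Guzzle 7 installed via Composer and uses Guzzle's MockHandler to queue two 503 responses followed by a 200. The variable names and the zero delay are choices made for this illustration only.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle 7 installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Queue two failures followed by a success; no network access is needed.
$mock = new MockHandler([
    new Response(503),
    new Response(503),
    new Response(200, [], 'ok'),
]);

$attempts = 0; // counts how often the decider is consulted
$decider = function ($retries, RequestInterface $request, ?ResponseInterface $response = null, $exception = null) use (&$attempts) {
    $attempts++;
    return $retries < 3 && $response !== null && $response->getStatusCode() >= 500;
};
$noDelay = function ($retries) {
    return 0; // skip the backoff so the example finishes immediately
};

$stack = HandlerStack::create($mock);
$stack->push(Middleware::retry($decider, $noDelay));

$client = new Client(['handler' => $stack]);
$response = $client->get('https://api.example.com/data');
```

Because the retry middleware is pushed after the default middleware, it sits closest to the handler, so the two 503 responses are retried before Guzzle's http_errors middleware ever sees them.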
History Middleware
Track request and response history for debugging:
<?php
use GuzzleHttp\Middleware;
$history = [];
$historyMiddleware = Middleware::history($history);
$stack->push($historyMiddleware);
$client = new Client(['handler' => $stack]);
// Make requests
$client->get('https://api.example.com/data');
$client->post('https://api.example.com/submit', ['json' => ['key' => 'value']]);
// Access history
foreach ($history as $transaction) {
    echo "Request: " . $transaction['request']->getUri() . "\n";
    if (isset($transaction['response'])) {
        echo "Response: " . $transaction['response']->getStatusCode() . "\n";
    }
}
Creating Custom Middleware
Logging Middleware
Create a comprehensive logging middleware for web scraping operations:
<?php
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
function createLoggingMiddleware($logger) {
    return function (callable $handler) use ($logger) {
        return function (RequestInterface $request, array $options) use ($handler, $logger) {
            $logger->info('Sending request', [
                'method' => $request->getMethod(),
                'uri' => (string) $request->getUri(),
                'headers' => $request->getHeaders()
            ]);
            return $handler($request, $options)->then(
                function (ResponseInterface $response) use ($logger, $request) {
                    $logger->info('Received response', [
                        'status' => $response->getStatusCode(),
                        'uri' => (string) $request->getUri(),
                        'size' => $response->getHeaderLine('Content-Length')
                    ]);
                    return $response;
                },
                // Typed as \Throwable so connection errors are logged too, not only RequestException
                function (\Throwable $exception) use ($logger, $request) {
                    $logger->error('Request failed', [
                        'uri' => (string) $request->getUri(),
                        'error' => $exception->getMessage()
                    ]);
                    // Rethrow so the failure still reaches the caller
                    throw $exception;
                }
            );
        };
    };
}
// Usage
$stack->push(createLoggingMiddleware($your_logger));
Rate Limiting Middleware
Implement rate limiting to respect API limits:
<?php
use Psr\Http\Message\RequestInterface;
class RateLimitMiddleware
{
    private $maxRequests;
    private $timeWindow;
    private $requests = [];
    public function __construct($maxRequests = 100, $timeWindow = 60)
    {
        $this->maxRequests = $maxRequests;
        $this->timeWindow = $timeWindow;
    }
    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options) use ($handler) {
            $this->enforceRateLimit();
            return $handler($request, $options);
        };
    }
    private function enforceRateLimit()
    {
        $now = time();
        // Remove old requests outside the time window
        $this->requests = array_filter($this->requests, function ($timestamp) use ($now) {
            return ($now - $timestamp) < $this->timeWindow;
        });
        // Check if we've exceeded the rate limit
        if (count($this->requests) >= $this->maxRequests) {
            // Block until the oldest tracked request leaves the window
            $sleepTime = $this->timeWindow - ($now - min($this->requests));
            sleep($sleepTime);
        }
        // Record the time the request is actually sent (after any sleep)
        $this->requests[] = time();
    }
}
// Usage
$stack->push(new RateLimitMiddleware(50, 60)); // 50 requests per minute
Middleware Order and Execution
The order in which middleware is added to the stack matters. Each call to HandlerStack::push() places the new middleware closer to the underlying handler, so middleware pushed earlier sees the request first and the response last:
<?php
$stack = HandlerStack::create();
// This middleware executes first for requests, last for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 1: Request\n";
    return $request;
}));
// This middleware executes second for requests, second-to-last for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 2: Request\n";
    return $request;
}));
// This middleware executes last for requests, first for responses
$stack->push(Middleware::mapRequest(function ($request) {
    echo "Middleware 3: Request\n";
    return $request;
}));
}));
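Push order is not the only way to position a middleware: HandlerStack also accepts names and offers unshift() and before() for explicit placement. The sketch below assumes Guzzle 7 installed via Composer, uses MockHandler so no request leaves the machine, and records the order in which four request middleware run. The $mark helper and the labels are hypothetical names chosen for this example.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle 7 installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;

$order = [];
// Hypothetical helper: builds a request middleware that records its label when it runs
$mark = function (string $label) use (&$order): callable {
    return Middleware::mapRequest(function (RequestInterface $request) use ($label, &$order) {
        $order[] = $label;
        return $request;
    });
};

$mock = new MockHandler([new Response(200)]);
$stack = new HandlerStack($mock);     // bare stack without Guzzle's default middleware
$stack->push($mark('A'), 'a');
$stack->push($mark('C'), 'c');
$stack->before('c', $mark('B'), 'b'); // insert B immediately before C
$stack->unshift($mark('Z'), 'z');     // Z becomes the outermost layer

$client = new Client(['handler' => $stack]);
$client->get('https://api.example.com/data');
// $order is now ['Z', 'A', 'B', 'C']
```

Naming middleware this way also makes it possible to remove one later with $stack->remove('b') without rebuilding the whole stack.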
Best Practices for Guzzle Middleware
1. Keep Middleware Focused
Each middleware should have a single responsibility. Don't combine logging, authentication, and retry logic in one middleware.
2. Handle Errors Gracefully
Always include error handling in your middleware to prevent breaking the request chain:
<?php
$middleware = function (callable $handler) {
    return function (RequestInterface $request, array $options) use ($handler) {
        try {
            // Your middleware logic here (it should not call $handler itself)
        } catch (\Exception $e) {
            // Log error but don't break the chain
            error_log('Middleware error: ' . $e->getMessage());
        }
        // Forward the request exactly once, whether or not the logic above failed
        return $handler($request, $options);
    };
};
3. Use Appropriate Middleware Types
- Use mapRequest for simple request modifications
- Use mapResponse for simple response processing
- Use full handler middleware only when you need complete control
4. Consider Performance Impact
Middleware adds overhead to each request. Profile your application to ensure middleware doesn't significantly impact performance, especially when processing large volumes of requests.
Integration with Web Scraping
When building web scraping applications, middleware becomes particularly valuable for handling common challenges:
<?php
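use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
class WebScrapingClient
{
    private $client;
    public function __construct()
    {
        $stack = HandlerStack::create();
        // Add user agent rotation
        $stack->push($this->createUserAgentMiddleware());
        // Add retry logic for failed requests
        $stack->push($this->createRetryMiddleware());
        // Add rate limiting (the RateLimitMiddleware class defined earlier)
        $stack->push(new RateLimitMiddleware(30, 60));
        // Add request/response logging
        $stack->push($this->createLoggingMiddleware());
        $this->client = new Client([
            'handler' => $stack,
            'timeout' => 30,
            'verify' => false // Only for development; keep TLS verification on in production
        ]);
    }
    private function createUserAgentMiddleware()
    {
        $userAgents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ];
        return Middleware::mapRequest(function (RequestInterface $request) use ($userAgents) {
            $randomAgent = $userAgents[array_rand($userAgents)];
            return $request->withHeader('User-Agent', $randomAgent);
        });
    }
    private function createRetryMiddleware()
    {
        // Retry up to 3 times on transport errors or 5xx responses (default backoff)
        return Middleware::retry(function ($retries, RequestInterface $request, ?ResponseInterface $response = null, $exception = null) {
            if ($retries >= 3) {
                return false;
            }
            return $exception !== null || ($response && $response->getStatusCode() >= 500);
        });
    }
    private function createLoggingMiddleware()
    {
        // Minimal logging: record the status code of each response
        return Middleware::mapResponse(function (ResponseInterface $response) {
            error_log('Response status: ' . $response->getStatusCode());
            return $response;
        });
    }
    public function scrape($url)
    {
        return $this->client->get($url);
    }
}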
This approach provides a robust foundation for web scraping operations, similar to how you might handle browser sessions in Puppeteer but with the flexibility and performance benefits of HTTP-based scraping.
Conclusion
Guzzle middleware provides a powerful and flexible way to customize HTTP request handling in PHP applications. By understanding how to create and configure middleware, you can build robust web scraping and API integration solutions that handle authentication, rate limiting, error recovery, and logging consistently across your application.
The middleware pattern's composability allows you to mix and match different behaviors as needed, making your HTTP client both powerful and maintainable. Whether you're building a simple API client or a complex web scraping system, mastering Guzzle middleware will significantly improve your application's reliability and functionality.