Guzzle is a PHP HTTP client that simplifies sending HTTP requests and integrating with web services. When scraping websites with Guzzle, you will often encounter redirects. By default, Guzzle follows up to 5 redirects and throws a TooManyRedirectsException if the chain is longer, but you can customize this behavior.
Handling Redirects with Guzzle
To handle redirects with Guzzle, you can use the allow_redirects request option. This option can be set to true to enable redirects with the default settings, false to disable them, or an associative array that specifies additional redirect behavior settings.
Here's an example of how to handle redirects using Guzzle:
use GuzzleHttp\Client;

$client = new Client();

// Disabling redirects
$response = $client->request('GET', 'http://example.com', [
    'allow_redirects' => false
]);

// Enabling redirects with default settings
$response = $client->request('GET', 'http://example.com', [
    'allow_redirects' => true
]);

// Customizing redirect behavior
$response = $client->request('GET', 'http://example.com', [
    'allow_redirects' => [
        'max' => 10, // Maximum number of redirects to allow
        'strict' => true, // Use "strict" RFC compliant redirects
        'referer' => true, // Add a Referer header
        'protocols' => ['https'], // Only allow https redirects
        'track_redirects' => true // Include redirect history in the response
    ]
]);

// Accessing redirect history (if 'track_redirects' is true)
if ($response->hasHeader('X-Guzzle-Redirect-History')) {
    // Retrieve redirect history
    $redirectHistory = $response->getHeader('X-Guzzle-Redirect-History');
    // Retrieve redirect status history
    $redirectStatusHistory = $response->getHeader('X-Guzzle-Redirect-Status-History');
    // Output history
    foreach ($redirectHistory as $key => $url) {
        echo "Redirected to: " . $url . " with status code " . $redirectStatusHistory[$key] . PHP_EOL;
    }
}
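When a redirect chain is longer than the max setting, Guzzle stops following it and throws a TooManyRedirectsException. The sketch below reproduces that situation offline, assuming Guzzle was installed through Composer. It uses Guzzle's MockHandler to queue three canned 302 responses instead of contacting a real site; the helper name tryFetch and the example.com URLs are placeholders chosen for illustration.

```php
<?php
require 'vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Exception\TooManyRedirectsException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// Hypothetical helper: returns the final status code, or null when the
// redirect limit is exceeded.
function tryFetch(Client $client, string $url, int $maxRedirects): ?int
{
    try {
        $response = $client->request('GET', $url, [
            'allow_redirects' => ['max' => $maxRedirects],
        ]);
        return $response->getStatusCode();
    } catch (TooManyRedirectsException $e) {
        return null;
    }
}

// Three canned redirects, so a limit of 2 is exceeded on the third one.
$mock = new MockHandler([
    new Response(302, ['Location' => 'http://example.com/a']),
    new Response(302, ['Location' => 'http://example.com/b']),
    new Response(302, ['Location' => 'http://example.com/c']),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$status = tryFetch($client, 'http://example.com/start', 2);
echo $status === null ? 'Gave up: too many redirects' : "Final status: $status", PHP_EOL;
```

In a scraper, returning null (or logging the URL) lets the crawl continue past pages stuck in a redirect loop instead of aborting on the exception.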
Understanding Redirect Options
max: The maximum number of redirects to follow. Guzzle defaults to 5.
strict: Boolean, whether to use strict RFC-compliant redirects. When true, a redirected POST request is re-sent as a POST. When false (the default), a POST that receives a 301 or 302 is re-sent as a GET, which is what most browsers do.
referer: Whether to add a Referer header when a redirect occurs. Defaults to false.
protocols: An array of protocols that are allowed as redirect targets (e.g., ['http', 'https'], which is the default).
track_redirects: Whether to track the redirect history. If set to true, Guzzle adds X-Guzzle-Redirect-History and X-Guzzle-Redirect-Status-History headers to the response, which can be used to retrieve the URLs and status codes of the redirect chain.
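The effect of the strict option is easiest to see on a POST request. The following is a minimal sketch, assuming Guzzle was installed through Composer: it answers a POST with a mocked 302 and uses Guzzle's history middleware to record which HTTP method is used for the follow-up request. The helper name methodAfterRedirect and the example.com URLs are hypothetical.

```php
<?php
require 'vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;

// Hypothetical helper: POSTs to a mocked endpoint that answers 302 and
// returns the HTTP method Guzzle used for the follow-up request.
function methodAfterRedirect(bool $strict): string
{
    $mock = new MockHandler([
        new Response(302, ['Location' => 'http://example.com/next']),
        new Response(200),
    ]);

    $sent = []; // filled by the history middleware, one entry per request sent
    $stack = HandlerStack::create($mock);
    $stack->push(Middleware::history($sent));

    $client = new Client(['handler' => $stack]);
    $client->request('POST', 'http://example.com/form', [
        'allow_redirects' => ['strict' => $strict],
    ]);

    // $sent[0] is the original POST, $sent[1] is the redirected request
    return $sent[1]['request']->getMethod();
}

echo 'strict = false: ', methodAfterRedirect(false), PHP_EOL; // GET
echo 'strict = true:  ', methodAfterRedirect(true), PHP_EOL;  // POST
```

Settings left out of the array, such as max here, keep their default values.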
Handling Redirects Manually
If you wish to handle redirects manually, you can disable automatic redirects and use the response status code to identify when a redirect has occurred. You can then manually follow the Location header if needed.
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Uri;
use GuzzleHttp\Psr7\UriResolver;

$client = new Client();
$url = 'http://example.com';

// Disabling redirects to handle them manually
$response = $client->request('GET', $url, [
    'allow_redirects' => false
]);

// Check for a redirect response status code (e.g., 301, 302, 303, 307, 308)
if (in_array($response->getStatusCode(), [301, 302, 303, 307, 308]) && $response->hasHeader('Location')) {
    // The Location header may be a relative URL, so resolve it against the original URL
    $redirectUrl = UriResolver::resolve(new Uri($url), new Uri($response->getHeaderLine('Location')));
    // Follow the redirect URL manually
    $response = $client->request('GET', $redirectUrl);
}
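A redirect target can itself redirect, so manual handling usually needs a loop with a hop limit. The sketch below is one possible way to write it, assuming Guzzle was installed through Composer. The helper followManually, its $maxHops parameter, and the mocked example.com responses are illustrative assumptions rather than part of Guzzle's API.

```php
<?php
require 'vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\Psr7\Uri;
use GuzzleHttp\Psr7\UriResolver;

// Hypothetical helper: follows redirects by hand and returns every URL
// visited, starting with the original one.
function followManually(Client $client, string $url, int $maxHops = 5): array
{
    $visited = [$url];
    for ($hops = 0; ; $hops++) {
        $response = $client->request('GET', $url, ['allow_redirects' => false]);
        $isRedirect = in_array($response->getStatusCode(), [301, 302, 303, 307, 308], true);
        if (!$isRedirect || !$response->hasHeader('Location')) {
            return $visited; // final response reached
        }
        if ($hops >= $maxHops) {
            throw new RuntimeException("Gave up after $maxHops redirects");
        }
        // Location may be relative, so resolve it against the current URL
        $location = new Uri($response->getHeaderLine('Location'));
        $url = (string) UriResolver::resolve(new Uri($url), $location);
        $visited[] = $url;
    }
}

// Mocked chain: a relative redirect, then an absolute one, then the page.
$mock = new MockHandler([
    new Response(301, ['Location' => '/a']),
    new Response(302, ['Location' => 'https://example.com/b']),
    new Response(200),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$chain = followManually($client, 'http://example.com/start');
echo implode(' -> ', $chain), PHP_EOL;
```

Doing this by hand gives you a place to inspect or filter each hop, for example to skip redirects that leave the domain being scraped.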
By using these techniques, you can effectively manage and handle redirects while scraping with Guzzle in PHP.