Guzzle is a PHP HTTP client that makes it simple to send HTTP requests and integrate with web services. When scraping, managing cookies is essential for maintaining session state, handling authentication, and dealing with personalized content. Guzzle provides a cookie middleware that manages cookies across multiple requests.
Here's how you can manage cookies in Guzzle:
Using a Cookie Jar
Guzzle uses a cookie jar to hold cookies between requests. You can use the built-in CookieJar class to manage cookies.
- Create a Cookie Jar
use GuzzleHttp\Cookie\CookieJar;
$cookieJar = new CookieJar();
- Send a Request with the Cookie Jar
use GuzzleHttp\Client;
$client = new Client();
$response = $client->request('GET', 'http://example.com', [
    'cookies' => $cookieJar,
]);
The cookies option accepts a cookie jar instance. After the request, any cookies set by the server are stored in the CookieJar object.
- Send Another Request with the Same Cookie Jar
// The same cookie jar is used, so cookies are maintained
$response = $client->request('GET', 'http://example.com/another-page', [
    'cookies' => $cookieJar,
]);
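The round trip described above can be checked without touching the network. The following is a minimal sketch, assuming Guzzle 7 installed through Composer; it swaps in Guzzle's MockHandler with two canned responses (the cookie name and value are made up for illustration) and passes the jar once in the client constructor so it applies to every request.

```php
<?php
// Assumes Guzzle 7 is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// Two canned responses stand in for a real server (illustration only).
$mock = new MockHandler([
    new Response(200, ['Set-Cookie' => 'session=abc123; Path=/']),
    new Response(200),
]);

$cookieJar = new CookieJar();
$client = new Client([
    'handler' => HandlerStack::create($mock),
    'cookies' => $cookieJar, // default jar for every request from this client
]);

// First request: the cookie middleware stores the Set-Cookie value in the jar.
$client->request('GET', 'http://example.com');

// Second request: the middleware adds a Cookie header built from the jar.
$client->request('GET', 'http://example.com/another-page');
$sentCookieHeader = $mock->getLastRequest()->getHeaderLine('Cookie');

echo $sentCookieHeader . PHP_EOL;
```

Setting the jar on the client is a convenience; passing it per request, as in the steps above, behaves the same way for those requests.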
Using a Persistent Cookie Jar
If you want to persist cookies between sessions, you can use a file-based cookie jar.
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\FileCookieJar;
// Create a cookie jar that stores cookies in a file
$cookieFile = 'path/to/cookiejar.json';
// Second argument: also persist session cookies (those without an expiry)
$cookieJar = new FileCookieJar($cookieFile, true);
$client = new Client();
$response = $client->request('GET', 'http://example.com', [
    'cookies' => $cookieJar,
]);
// Cookies are written to the file when the jar is destroyed (or when save() is called)
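When you create a FileCookieJar, you pass the file path and a boolean that controls whether session cookies (cookies with no expiry) are written to the file (true in this case; the default is false). If the file already exists, its cookies are loaded automatically, and the jar is saved back to the file when the object is destroyed.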
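In Guzzle 7 the second constructor argument of FileCookieJar is named $storeSessionCookies. The sketch below, which assumes a Composer install and uses a hypothetical temp file and cookie, saves the same session cookie with the flag off and then on, and reloads the file each time to see what was persisted.

```php
<?php
// Assumes Guzzle 7 is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use GuzzleHttp\Cookie\FileCookieJar;
use GuzzleHttp\Cookie\SetCookie;

// Hypothetical temp file used only for this illustration.
$cookieFile = sys_get_temp_dir() . '/guzzle_cookie_demo.json';
@unlink($cookieFile); // start from a clean slate

// A session cookie: it has no Expires or Max-Age attribute.
$sessionCookie = new SetCookie([
    'Name'   => 'session',
    'Value'  => 'abc123',
    'Domain' => 'example.com',
]);

// Flag off: session cookies are skipped when the jar is saved.
$withoutSession = new FileCookieJar($cookieFile, false);
$withoutSession->setCookie($sessionCookie);
$withoutSession->save($cookieFile);
$reloadedWithout = new FileCookieJar($cookieFile, false); // loads the existing file
$countWithout = count($reloadedWithout);

// Flag on: the session cookie is written to disk and survives a reload.
$withSession = new FileCookieJar($cookieFile, true);
$withSession->setCookie($sessionCookie);
$withSession->save($cookieFile);
$reloadedWith = new FileCookieJar($cookieFile, true);
$countWith = count($reloadedWith);

echo "persisted with flag off: $countWithout, with flag on: $countWith" . PHP_EOL;
```

Calling save() explicitly is optional in normal use, since the jar also writes itself out when it is destroyed; it is done here so the reload happens at a known point.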
Handling Cookies Manually
If you need to handle cookies manually, for example, to set a specific cookie before a request, you can do so like this:
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Cookie\SetCookie;
$cookieJar = new CookieJar();
// Manually create a cookie
$cookie = new SetCookie([
    'Name' => 'test',
    'Value' => 'value',
    'Domain' => 'example.com',
    'Path' => '/',
    'Max-Age' => 1000,
]);
// Add the cookie to the cookie jar
$cookieJar->setCookie($cookie);
$client = new Client();
$response = $client->request('GET', 'http://example.com', [
    'cookies' => $cookieJar,
]);
// Now the request is sent with the manually set cookie
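When only names and values matter, building each SetCookie by hand can be more than is needed. As a rough sketch (assuming Guzzle 7 via Composer; the 'lang' cookie is a made-up extra), CookieJar::fromArray() creates a jar from name/value pairs scoped to a single domain, and getCookieByName() reads one back.

```php
<?php
// Assumes Guzzle 7 is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use GuzzleHttp\Cookie\CookieJar;

// Build a jar from name => value pairs, all scoped to one domain.
$cookieJar = CookieJar::fromArray(
    ['test' => 'value', 'lang' => 'en'], // 'lang' is a hypothetical second cookie
    'example.com'
);

// Look up a single cookie by name; null means it is not in the jar.
$cookie = $cookieJar->getCookieByName('test');
$testValue = $cookie !== null ? $cookie->getValue() : null;

echo $testValue . PHP_EOL;
```

The resulting jar can be passed in the 'cookies' request option exactly like the hand-built one above.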
Extracting Cookies from a Response
You can also inspect the cookies a response set by reading them back from the cookie jar:
$response = $client->request('GET', 'http://example.com', [
    'cookies' => $cookieJar,
]);
// Iterate over every cookie currently in the jar, including any set by this response
foreach ($cookieJar as $cookie) {
    echo $cookie->getName() . ': ' . $cookie->getValue() . PHP_EOL;
}
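The jar contains every cookie it has accumulated, not only those from the latest response. To look at just what one response set, one option is to parse its Set-Cookie headers directly. The sketch below assumes Guzzle 7 via Composer and uses a hand-built PSR-7 response with made-up cookie values in place of a real one.

```php
<?php
// Assumes Guzzle 7 is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use GuzzleHttp\Cookie\SetCookie;
use GuzzleHttp\Psr7\Response;

// A hand-built response stands in for one returned by $client->request().
$response = new Response(200, [
    'Set-Cookie' => [
        'session=abc123; Path=/; HttpOnly',
        'theme=dark; Path=/; Max-Age=3600',
    ],
]);

// Parse each Set-Cookie header line into a SetCookie object.
$parsed = [];
foreach ($response->getHeader('Set-Cookie') as $headerValue) {
    $cookie = SetCookie::fromString($headerValue);
    $parsed[$cookie->getName()] = [
        'value'    => $cookie->getValue(),
        'httpOnly' => $cookie->getHttpOnly(),
        'maxAge'   => $cookie->getMaxAge(),
    ];
}

print_r($parsed);
```

Cookies parsed this way are not added to any jar; they are only inspected.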
Conclusion
Guzzle makes it easy to manage cookies when scraping websites by using cookie jars to store and send cookies with your HTTP requests. You can use the memory-based CookieJar for temporary storage or the FileCookieJar for persistent storage. Guzzle also lets you handle cookies manually if you need finer control over what is sent and received.