
How can I use Guzzle to download files?

Guzzle is a PHP HTTP client that can download files without holding them in memory. Send a GET request to the file URL and set the sink request option to a destination path; Guzzle then writes the response body to that file as it arrives.

Basic File Download

Installation

First, install Guzzle via Composer:

composer require guzzlehttp/guzzle

Simple Download Example

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

$client = new Client();
$url = 'https://example.com/file.pdf';
$saveTo = 'downloads/file.pdf'; // the downloads/ directory must already exist

try {
    $response = $client->request('GET', $url, ['sink' => $saveTo]);
    echo "File downloaded successfully to " . $saveTo . "\n";
    echo "File size: " . filesize($saveTo) . " bytes\n";
} catch (RequestException $e) {
    echo "Download failed: " . $e->getMessage() . "\n";
    if ($e->hasResponse()) {
        echo "HTTP Status: " . $e->getResponse()->getStatusCode() . "\n";
    }
}

Advanced Download Features

Download with Progress Tracking

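The progress option takes a callback that Guzzle invokes repeatedly during the transfer, passing the expected total and the number of bytes transferred so far. The total is 0 while the size is unknown, which is why the example checks it before dividing: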

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

$client = new Client();
$url = 'https://example.com/large-file.zip';
$saveTo = 'downloads/large-file.zip';

try {
    $response = $client->request('GET', $url, [
        'sink' => $saveTo,
        'progress' => function ($downloadTotal, $downloadedBytes, $uploadTotal, $uploadedBytes) {
            if ($downloadTotal > 0) {
                $percent = round(($downloadedBytes / $downloadTotal) * 100, 2);
                echo "\rProgress: {$percent}% ({$downloadedBytes}/{$downloadTotal} bytes)";
            }
        }
    ]);
    echo "\nDownload completed!\n";
} catch (RequestException $e) {
    echo "\nDownload failed: " . $e->getMessage() . "\n";
}
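
Guzzle may invoke the progress callback many times per second, so printing on every call can flood the terminal. Below is a minimal sketch of one way to throttle the output by reporting only when the whole-number percentage changes. The names makeProgressReporter() and $write are hypothetical, introduced for this example, and are not part of Guzzle:

```php
<?php

/**
 * Build a callback for Guzzle's 'progress' option that reports only when
 * the whole-number percentage changes.
 * makeProgressReporter() and $write are hypothetical names for this sketch.
 */
function makeProgressReporter(callable $write): callable
{
    $lastPercent = -1;

    // Guzzle passes four arguments; the two upload values are ignored here.
    return function ($downloadTotal, $downloadedBytes) use (&$lastPercent, $write): void {
        if ($downloadTotal <= 0) {
            return; // total size not known, nothing meaningful to report
        }

        $percent = (int) floor($downloadedBytes / $downloadTotal * 100);
        if ($percent !== $lastPercent) {
            $lastPercent = $percent;
            $write("\rProgress: {$percent}% ({$downloadedBytes}/{$downloadTotal} bytes)");
        }
    };
}

// Possible usage with the client from the example above:
// $client->request('GET', $url, [
//     'sink' => $saveTo,
//     'progress' => makeProgressReporter(function (string $line): void {
//         echo $line;
//     }),
// ]);
```

Passing the output function in as $write keeps the reporter independent of echo, so the same callback can feed a logger instead.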

Download with Custom Headers and Authentication

Download files that require authentication or custom headers:

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

$client = new Client();
$url = 'https://api.example.com/secure/file.pdf';
$saveTo = 'downloads/secure-file.pdf';

try {
    $response = $client->request('GET', $url, [
        'sink' => $saveTo,
        'headers' => [
            'Authorization' => 'Bearer YOUR_API_TOKEN',
            'User-Agent' => 'MyApp/1.0'
        ],
        'timeout' => 30, // abort if the whole request takes longer than 30 seconds
        'verify' => true // verify TLS certificates (this is the default)
    ]);
    echo "Secure file downloaded successfully!\n";
} catch (RequestException $e) {
    echo "Download failed: " . $e->getMessage() . "\n";
}

Download with Directory Creation

Automatically create directories if they don't exist:

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

function downloadFile($url, $saveTo) {
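    // Create the directory if it doesn't exist, and report a failure to do so
    $directory = dirname($saveTo);
    if (!is_dir($directory) && !mkdir($directory, 0755, true) && !is_dir($directory)) {
        return [
            'success' => false,
            'error' => "Could not create directory {$directory}",
            'status_code' => null
        ];
    }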

    $client = new Client();

    try {
        $response = $client->request('GET', $url, [
            'sink' => $saveTo,
            'headers' => [
                'User-Agent' => 'Mozilla/5.0 (compatible; FileDownloader/1.0)'
            ]
        ]);

        return [
            'success' => true,
            'file_size' => filesize($saveTo),
            'content_type' => $response->getHeader('Content-Type')[0] ?? 'unknown'
        ];
    } catch (RequestException $e) {
        return [
            'success' => false,
            'error' => $e->getMessage(),
            'status_code' => $e->hasResponse() ? $e->getResponse()->getStatusCode() : null
        ];
    }
}

// Usage
$result = downloadFile('https://example.com/document.pdf', 'downloads/documents/document.pdf');

if ($result['success']) {
    echo "Downloaded successfully!\n";
    echo "File size: {$result['file_size']} bytes\n";
    echo "Content type: {$result['content_type']}\n";
} else {
    echo "Download failed: {$result['error']}\n";
    if ($result['status_code']) {
        echo "HTTP Status: {$result['status_code']}\n";
    }
}

Important Considerations

Memory Efficiency

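The sink option writes the response body to disk as it arrives, so memory use stays roughly constant regardless of file size. Without sink, Guzzle buffers the body in a php://temp stream, and writing that body out with file_put_contents() converts it to a string, which reads the whole file into memory: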

// Memory-efficient (recommended)
$client->request('GET', $url, ['sink' => $saveTo]);

// Memory-intensive (avoid for large files)
$response = $client->request('GET', $url);
file_put_contents($saveTo, $response->getBody());
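
When each chunk needs processing as it arrives, for example to compute a checksum, the stream request option (set 'stream' => true) lets you read the body piece by piece instead of handing it to sink. Below is a minimal sketch, assuming the Composer autoloader is already loaded. copyBodyWithHash() is a hypothetical helper name, and the SHA-256 step is only an illustration of per-chunk work:

```php
<?php

/**
 * Copy a PSR-7 body to an open file handle in fixed-size chunks and return
 * the SHA-256 of the bytes written.
 * copyBodyWithHash() is a hypothetical helper, not part of Guzzle.
 *
 * @param \Psr\Http\Message\StreamInterface $body   e.g. $response->getBody()
 * @param resource                          $handle writable handle from fopen()
 */
function copyBodyWithHash($body, $handle, int $chunkSize = 8192): string
{
    $hash = hash_init('sha256');

    // Only one chunk is held in memory at a time.
    while (!$body->eof()) {
        $chunk = $body->read($chunkSize);
        hash_update($hash, $chunk);
        fwrite($handle, $chunk);
    }

    return hash_final($hash);
}

// Possible usage with the $client, $url and $saveTo from the snippet above:
// $response = $client->request('GET', $url, ['stream' => true]);
// $handle = fopen($saveTo, 'wb');
// $checksum = copyBodyWithHash($response->getBody(), $handle);
// fclose($handle);
```

For a plain download with no per-chunk work, sink remains the simpler choice.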

File Permissions

The PHP process must be able to write to the destination directory. Check this before starting the download:

$directory = dirname($saveTo);
if (!is_writable($directory)) {
    throw new Exception("Directory {$directory} is not writable");
}
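
The existence check and the writability check can be combined into one step that runs before the request. A minimal sketch follows; prepareDestination() is a hypothetical helper name, and the 0755 mode is an assumption carried over from the directory-creation example:

```php
<?php

/**
 * Make sure the directory that will hold $saveTo exists and is writable.
 * prepareDestination() is a hypothetical helper name for this sketch.
 *
 * @return string the directory that was checked
 */
function prepareDestination(string $saveTo): string
{
    $directory = dirname($saveTo);

    // The second is_dir() covers another process creating the directory first.
    if (!is_dir($directory) && !mkdir($directory, 0755, true) && !is_dir($directory)) {
        throw new RuntimeException("Could not create directory {$directory}");
    }

    if (!is_writable($directory)) {
        throw new RuntimeException("Directory {$directory} is not writable");
    }

    return $directory;
}

// Possible usage before a download:
// prepareDestination($saveTo);
// $client->request('GET', $url, ['sink' => $saveTo]);
```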

Error Handling Best Practices

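Catch the most specific exception types first and fall back to RequestException. In Guzzle 7, ConnectException extends TransferException rather than RequestException, so a catch block for RequestException alone does not handle connection failures: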

use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Exception\ClientException;
use GuzzleHttp\Exception\ServerException;

try {
    $response = $client->request('GET', $url, ['sink' => $saveTo]);
} catch (ConnectException $e) {
    echo "Connection failed: " . $e->getMessage() . "\n";
} catch (ClientException $e) {
    echo "Client error (4xx): " . $e->getResponse()->getStatusCode() . "\n";
} catch (ServerException $e) {
    echo "Server error (5xx): " . $e->getResponse()->getStatusCode() . "\n";
} catch (RequestException $e) {
    echo "Request failed: " . $e->getMessage() . "\n";
}
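
A transfer that fails partway can leave an incomplete file at the sink path. One way to keep such files out of the final location is to download to a temporary name and move the file into place only on success. This is a sketch under stated assumptions rather than a Guzzle feature: downloadAtomically(), the $fetch callable, and the ".part" suffix are all hypothetical choices made for this example:

```php
<?php

/**
 * Run $fetch against a temporary ".part" path and move the result into
 * place only if $fetch returns without throwing.
 * downloadAtomically() and the ".part" suffix are hypothetical choices.
 *
 * @param callable $fetch receives the temporary path and writes the download to it
 */
function downloadAtomically(callable $fetch, string $saveTo): void
{
    $partial = $saveTo . '.part';

    try {
        $fetch($partial);
    } catch (Throwable $e) {
        // Remove whatever was written before the failure, then re-throw
        // so the caller's catch blocks still see the original exception.
        if (is_file($partial)) {
            unlink($partial);
        }
        throw $e;
    }

    if (!rename($partial, $saveTo)) {
        throw new RuntimeException("Could not move {$partial} to {$saveTo}");
    }
}

// Possible usage inside the try block above:
// downloadAtomically(
//     function (string $path) use ($client, $url): void {
//         $client->request('GET', $url, ['sink' => $path]);
//     },
//     $saveTo
// );
```

Because the helper re-throws the original exception, the ConnectException, ClientException, and ServerException handlers shown above keep working unchanged.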

Because sink writes the body to disk as it arrives, Guzzle can download large files with roughly constant memory use, and its request options cover progress reporting, authentication headers, and timeouts.
