
How do I add OAuth authentication to my Guzzle web scraping client?

OAuth authentication is essential for secure API access in web scraping applications. This guide shows you how to implement OAuth 1.0 and OAuth 2.0 authentication with Guzzle, PHP's powerful HTTP client.

Prerequisites

  • PHP 7.4 or higher
  • Composer installed
  • Basic understanding of OAuth concepts
  • Valid API credentials from your target service

OAuth 1.0 Implementation

OAuth 1.0 is commonly used by APIs like Twitter's v1.1 API. Here's how to implement it with Guzzle:

Installation

composer require guzzlehttp/guzzle
composer require guzzlehttp/oauth-subscriber

Basic OAuth 1.0 Setup

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Subscriber\Oauth\Oauth1;

// Create handler stack and add OAuth middleware
$stack = HandlerStack::create();
$oauth = new Oauth1([
    'consumer_key'    => 'your_consumer_key',
    'consumer_secret' => 'your_consumer_secret',
    'token'           => 'your_access_token',
    'token_secret'    => 'your_access_token_secret'
]);
$stack->push($oauth);

// Create client with OAuth handler
$client = new Client([
    'base_uri' => 'https://api.example.com/',
    'handler' => $stack,
    'auth' => 'oauth'
]);

try {
    $response = $client->get('1.1/statuses/user_timeline.json');
    $data = json_decode($response->getBody(), true);
    print_r($data);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

OAuth 1.0 with Custom Parameters

<?php

use GuzzleHttp\Subscriber\Oauth\Oauth1;

// For APIs requiring specific OAuth parameters
$oauth = new Oauth1([
    'consumer_key'     => 'your_consumer_key',
    'consumer_secret'  => 'your_consumer_secret',
    'token'            => 'your_access_token',
    'token_secret'     => 'your_access_token_secret',
    'signature_method' => Oauth1::SIGNATURE_METHOD_HMAC, // HMAC-SHA1
    'realm'            => 'your_realm', // optional
    'version'          => '1.0',
]);

OAuth 2.0 Implementation

OAuth 2.0 is the newer and more widely adopted protocol. The examples below cover its common grant types using the kamermans/guzzle-oauth2-subscriber package:

Installation

composer require kamermans/guzzle-oauth2-subscriber

Client Credentials Grant (Server-to-Server)

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use kamermans\OAuth2\OAuth2Middleware;
use kamermans\OAuth2\GrantType\ClientCredentials;

// Create OAuth2 middleware
$stack = HandlerStack::create();

// The token endpoint is the base_uri of the client that requests tokens
$reauthClient = new Client(['base_uri' => 'https://api.example.com/oauth/token']);

$oauth2 = new OAuth2Middleware(
    new ClientCredentials($reauthClient, [
        'client_id'     => 'your_client_id',
        'client_secret' => 'your_client_secret',
        'scope'         => 'read write',
    ])
);
$stack->push($oauth2);

// Create authenticated client; 'auth' => 'oauth' tells the middleware to sign requests
$client = new Client([
    'base_uri' => 'https://api.example.com/',
    'handler'  => $stack,
    'auth'     => 'oauth',
]);

try {
    $response = $client->get('/api/data');
    $data = json_decode($response->getBody(), true);
    print_r($data);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

Authorization Code Grant (Web Applications)

<?php

use GuzzleHttp\Client;
use kamermans\OAuth2\OAuth2Middleware;
use kamermans\OAuth2\GrantType\AuthorizationCode;

// Step 1: Redirect the user to the authorization server to obtain a code
$authUrl = 'https://api.example.com/oauth/authorize?' . http_build_query([
    'client_id' => 'your_client_id',
    'redirect_uri' => 'https://yourapp.com/callback',
    'response_type' => 'code',
    'scope' => 'read write'
]);
// header('Location: ' . $authUrl);

// Step 2: In the callback handler, exchange the authorization code for an access token
$oauth2 = new OAuth2Middleware(
    new AuthorizationCode(
        // The token endpoint is the base_uri of the client that requests tokens
        new Client(['base_uri' => 'https://api.example.com/oauth/token']),
        [
            'client_id'     => 'your_client_id',
            'client_secret' => 'your_client_secret',
            'redirect_uri'  => 'https://yourapp.com/callback',
            'code'          => $_GET['code'], // From the callback query string
        ]
    )
);
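
The redirect in step 1 is usually paired with a "state" value that is checked again in the callback before the code is exchanged. The following is a minimal sketch of that check, assuming the state is kept in the user's session between the two steps; buildAuthorizationUrl and extractAuthorizationCode are hypothetical helper names, not part of Guzzle or the OAuth2 middleware.

```php
<?php

// Hypothetical helper: builds the authorization URL including a CSRF "state" value.
function buildAuthorizationUrl(
    string $endpoint,
    string $clientId,
    string $redirectUri,
    string $scope,
    string $state
): string {
    return $endpoint . '?' . http_build_query([
        'client_id'     => $clientId,
        'redirect_uri'  => $redirectUri,
        'response_type' => 'code',
        'scope'         => $scope,
        'state'         => $state,
    ], '', '&', PHP_QUERY_RFC3986);
}

// Hypothetical helper: returns the authorization code only if the callback state matches.
function extractAuthorizationCode(array $query, string $expectedState): ?string
{
    $state = $query['state'] ?? '';
    $code  = $query['code'] ?? '';
    if (!is_string($state) || !is_string($code) || $code === '') {
        return null;
    }
    // hash_equals compares in constant time.
    return hash_equals($expectedState, $state) ? $code : null;
}

// Usage: generate the state before redirecting and store it (for example in the session).
$state   = bin2hex(random_bytes(16));
$authUrl = buildAuthorizationUrl(
    'https://api.example.com/oauth/authorize',
    'your_client_id',
    'https://yourapp.com/callback',
    'read write',
    $state
);
```

In the callback, extractAuthorizationCode($_GET, $storedState) would return the code to pass to the grant type, or null when the state is missing or does not match.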

Password Grant (Resource Owner Password Credentials)

<?php

use GuzzleHttp\Client;
use kamermans\OAuth2\OAuth2Middleware;
use kamermans\OAuth2\GrantType\PasswordCredentials;

$oauth2 = new OAuth2Middleware(
    new PasswordCredentials(
        // The token endpoint is the base_uri of the client that requests tokens
        new Client(['base_uri' => 'https://api.example.com/oauth/token']),
        [
            'client_id'     => 'your_client_id',
            'client_secret' => 'your_client_secret',
            'username'      => 'user@example.com',
            'password'      => 'user_password',
            'scope'         => 'read write',
        ]
    )
);

Advanced Configuration

Token Persistence

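<?php

use GuzzleHttp\Client;
use kamermans\OAuth2\OAuth2Middleware;
use kamermans\OAuth2\GrantType\ClientCredentials;
use kamermans\OAuth2\Persistence\FileTokenPersistence;

// Save tokens to file for reuse
$tokenPersistence = new FileTokenPersistence('/path/to/token.json');

$oauth2 = new OAuth2Middleware(
    new ClientCredentials(
        // The token endpoint is the base_uri of the client that requests tokens
        new Client(['base_uri' => 'https://api.example.com/oauth/token']),
        [
            'client_id'     => 'your_client_id',
            'client_secret' => 'your_client_secret',
        ]
    )
);

// The middleware's second constructor argument is a refresh-token grant,
// so persistence is attached through its setter instead
$oauth2->setTokenPersistence($tokenPersistence);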
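
To make the idea of file-based persistence concrete, here is a rough sketch of what storing and reloading a token can look like in plain PHP. It is not the library's implementation; saveToken and loadToken are hypothetical helpers, and the JSON layout (access_token, expires_at) is an assumption chosen for illustration.

```php
<?php

// Hypothetical helper: writes the token and its absolute expiry time to a JSON file.
function saveToken(string $path, string $accessToken, int $expiresIn, ?int $now = null): void
{
    $payload = [
        'access_token' => $accessToken,
        // Store an absolute timestamp so the expiry survives restarts.
        'expires_at'   => ($now ?? time()) + $expiresIn,
    ];
    // LOCK_EX avoids interleaved writes from concurrent scraper processes.
    file_put_contents($path, json_encode($payload), LOCK_EX);
    chmod($path, 0600); // Restrict the file to the owning user
}

// Hypothetical helper: returns the stored token, or null if it is missing or expired.
function loadToken(string $path, ?int $now = null): ?string
{
    if (!is_file($path)) {
        return null;
    }
    $payload = json_decode((string) file_get_contents($path), true);
    if (!is_array($payload) || !isset($payload['access_token'], $payload['expires_at'])) {
        return null;
    }
    return $payload['expires_at'] > ($now ?? time()) ? $payload['access_token'] : null;
}
```

The optional $now parameter exists only so the expiry logic can be exercised without waiting for real time to pass.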

Custom Token Refresh

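<?php

use GuzzleHttp\Client;
use kamermans\OAuth2\OAuth2Middleware;
use kamermans\OAuth2\GrantType\RefreshToken;

// Obtain new access tokens from a stored refresh token
$refreshGrant = new RefreshToken(
    // The token endpoint is the base_uri of the client that requests tokens
    new Client(['base_uri' => 'https://api.example.com/oauth/token']),
    [
        'client_id'     => 'your_client_id',
        'client_secret' => 'your_client_secret',
        'refresh_token' => 'your_refresh_token',
    ]
);

// The refresh grant can also be passed as the second argument alongside another grant type
$oauth2 = new OAuth2Middleware($refreshGrant);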

Error Handling and Retries

<?php

use GuzzleHttp\Exception\ClientException;
use GuzzleHttp\Exception\ServerException;

function makeAuthenticatedRequest($client, $endpoint) {
    $maxRetries = 3;
    $retryDelay = 1; // seconds

    for ($i = 0; $i < $maxRetries; $i++) {
        try {
            $response = $client->get($endpoint);
            return json_decode($response->getBody(), true);

        } catch (ClientException $e) {
            if ($e->getResponse()->getStatusCode() === 401) {
                // Token expired, middleware should handle refresh
                if ($i === $maxRetries - 1) {
                    throw new Exception('Authentication failed after retries');
                }
                sleep($retryDelay);
                continue;
            }
            throw $e;

        } catch (ServerException $e) {
            if ($i === $maxRetries - 1) {
                throw $e;
            }
            sleep($retryDelay * (2 ** $i)); // Exponential backoff: 1s, 2s, ...
        }
    }
}
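
The retry pattern above can also be factored into a reusable helper. This is a minimal sketch, assuming any exception is worth retrying, which is broader than the status-code checks above; backoffDelay and retryWithBackoff are hypothetical names, and the injectable $sleeper exists only to make the delays testable.

```php
<?php

// Hypothetical helper: delay in seconds before retrying after 0-based attempt $attempt.
function backoffDelay(int $attempt, float $baseDelay = 1.0, float $maxDelay = 30.0): float
{
    // Doubles on each attempt and is capped at $maxDelay.
    return min($maxDelay, $baseDelay * (2 ** $attempt));
}

// Hypothetical helper: runs $operation, retrying on exceptions up to $maxAttempts attempts.
function retryWithBackoff(callable $operation, int $maxAttempts = 3, ?callable $sleeper = null)
{
    $sleeper = $sleeper ?? static function (float $seconds): void {
        usleep((int) round($seconds * 1000000));
    };

    for ($attempt = 0; ; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts - 1) {
                throw $e; // Out of attempts: surface the last error
            }
            $sleeper(backoffDelay($attempt));
        }
    }
}
```

A call such as retryWithBackoff(fn () => $client->get('/api/data')) would wrap a single request; narrowing the catch to specific Guzzle exception classes is left to the caller.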

Complete Example: Twitter API with OAuth 1.0

<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Subscriber\Oauth\Oauth1;

class TwitterScraper {
    private $client;

    public function __construct($consumerKey, $consumerSecret, $accessToken, $accessTokenSecret) {
        $stack = HandlerStack::create();
        $oauth = new Oauth1([
            'consumer_key'    => $consumerKey,
            'consumer_secret' => $consumerSecret,
            'token'           => $accessToken,
            'token_secret'    => $accessTokenSecret
        ]);
        $stack->push($oauth);

        $this->client = new Client([
            'base_uri' => 'https://api.twitter.com/',
            'handler' => $stack,
            'auth' => 'oauth'
        ]);
    }

    public function getUserTimeline($username, $count = 20) {
        try {
            $response = $this->client->get('1.1/statuses/user_timeline.json', [
                'query' => [
                    'screen_name' => $username,
                    'count' => $count,
                    'tweet_mode' => 'extended'
                ]
            ]);

            return json_decode($response->getBody(), true);
        } catch (Exception $e) {
            // Keep the original exception as "previous" so the root cause is not lost
            throw new Exception("Failed to fetch timeline: " . $e->getMessage(), 0, $e);
        }
    }
}

// Usage
$scraper = new TwitterScraper(
    'your_consumer_key',
    'your_consumer_secret',
    'your_access_token',
    'your_access_token_secret'
);

$tweets = $scraper->getUserTimeline('username');
foreach ($tweets as $tweet) {
    echo $tweet['full_text'] . "\n\n";
}

Best Practices

  1. Environment Variables: Store credentials in environment variables, not in code
  2. Token Refresh: Implement automatic token refresh for OAuth 2.0
  3. Error Handling: Always handle authentication errors gracefully
  4. Rate Limiting: Respect API rate limits and implement backoff strategies
  5. Logging: Log authentication events for debugging
  6. Secure Storage: Use secure methods to store tokens and credentials
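
As an illustration of the first practice, credentials can be read from the environment at startup and rejected early when one is missing. This is a sketch only; loadCredentials is a hypothetical helper, and the variable names OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET are assumptions rather than names any library expects.

```php
<?php

// Hypothetical helper: reads the named environment variables and fails fast if any is unset.
function loadCredentials(array $names): array
{
    $credentials = [];
    foreach ($names as $name) {
        $value = getenv($name);
        if ($value === false || $value === '') {
            throw new RuntimeException("Missing environment variable: {$name}");
        }
        $credentials[$name] = $value;
    }
    return $credentials;
}

// Usage (assumed variable names):
// $creds = loadCredentials(['OAUTH_CLIENT_ID', 'OAUTH_CLIENT_SECRET']);
// $config = [
//     'client_id'     => $creds['OAUTH_CLIENT_ID'],
//     'client_secret' => $creds['OAUTH_CLIENT_SECRET'],
// ];
```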

Common Issues and Solutions

Token Expiration

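// Guzzle throws ClientException on 4xx responses by default, so this check
// only runs when the request is sent with 'http_errors' => false
$response = $client->get('/api/data', ['http_errors' => false]);
if ($response->getStatusCode() === 401) {
    // Token expired or rejected - OAuth2Middleware should handle this automatically
    // For manual handling, implement token refresh logic here
}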

Invalid Signatures (OAuth 1.0)

// Ensure system time is synchronized
// Verify all OAuth parameters are correctly encoded
// Check that the signature method matches the API requirements
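
When debugging a rejected signature, it can help to rebuild the signature base string by hand and compare it with what the server expects. The sketch below follows the HMAC-SHA1 scheme from RFC 5849 in plain PHP; it is not the code the Guzzle subscriber runs, oauth1BaseString and oauth1HmacSha1Signature are hypothetical helpers, and it assumes each parameter name appears only once.

```php
<?php

// Hypothetical helper: builds the OAuth 1.0 signature base string (RFC 5849, section 3.4.1).
function oauth1BaseString(string $method, string $url, array $params): string
{
    // Percent-encode names and values first, then sort by encoded name.
    $encoded = [];
    foreach ($params as $key => $value) {
        $encoded[rawurlencode((string) $key)] = rawurlencode((string) $value);
    }
    ksort($encoded, SORT_STRING);

    $pairs = [];
    foreach ($encoded as $key => $value) {
        $pairs[] = $key . '=' . $value;
    }

    // METHOD & encoded URL & encoded parameter string
    return strtoupper($method) . '&' . rawurlencode($url) . '&' . rawurlencode(implode('&', $pairs));
}

// Hypothetical helper: signs the base string with HMAC-SHA1 and base64-encodes the digest.
function oauth1HmacSha1Signature(string $baseString, string $consumerSecret, string $tokenSecret): string
{
    $signingKey = rawurlencode($consumerSecret) . '&' . rawurlencode($tokenSecret);
    return base64_encode(hash_hmac('sha1', $baseString, $signingKey, true));
}
```

Values are encoded twice in the base string (once as parameters, once as part of the whole string), which is a common source of mismatches: a space in a parameter value ends up as %2520.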

Legal and Ethical Considerations

When using OAuth for web scraping:

  • Terms of Service: Always comply with API terms of service
  • Rate Limits: Respect imposed rate limits
  • Data Usage: Only collect data you're authorized to access
  • Privacy: Handle user data responsibly and in compliance with privacy laws
  • Authentication: Never share or expose authentication credentials

OAuth authentication provides secure, authorized access to APIs. Use it responsibly and in accordance with the service provider's terms and applicable laws.
