Table of contents

How do I implement retry logic for failed requests in Symfony Panther?

Implementing retry logic for failed requests in Symfony Panther is crucial for building robust web scraping applications that can handle network issues, temporary server problems, and other transient failures. This guide covers various strategies for implementing retry mechanisms with custom conditions, exponential backoff, and comprehensive error handling.

Understanding Retry Logic in Symfony Panther

Symfony Panther, built on top of ChromeDriver and WebDriver, can encounter various types of failures during web scraping operations. These include network timeouts, server errors, element not found exceptions, and browser crashes. Implementing proper retry logic helps ensure your scraping operations are resilient and reliable.

Basic Retry Implementation

Here's a fundamental retry wrapper for Symfony Panther operations:

<?php

use Symfony\Component\Panther\Client;
use Symfony\Component\Panther\DomCrawler\Crawler;

class PantherRetryHelper
{
    private $maxRetries;
    private $baseDelay;

    public function __construct(int $maxRetries = 3, int $baseDelay = 1000)
    {
        $this->maxRetries = $maxRetries;
        $this->baseDelay = $baseDelay; // milliseconds
    }

    public function executeWithRetry(callable $operation, array $retryableExceptions = [])
    {
        $attempt = 0;
        $lastException = null;

        while ($attempt <= $this->maxRetries) {
            try {
                return $operation();
            } catch (\Exception $e) {
                $lastException = $e;

                if (!$this->isRetryableException($e, $retryableExceptions)) {
                    throw $e;
                }

                if ($attempt < $this->maxRetries) {
                    $delay = $this->calculateDelay($attempt);
                    usleep($delay * 1000); // Convert to microseconds
                    $attempt++;
                } else {
                    break;
                }
            }
        }

        throw new \RuntimeException(
            "Operation failed after {$this->maxRetries} retries. Last error: " . 
            $lastException->getMessage(), 
            0, 
            $lastException
        );
    }

    private function isRetryableException(\Exception $e, array $retryableExceptions): bool
    {
        if (empty($retryableExceptions)) {
            // Default retryable exceptions
            $retryableExceptions = [
                \Facebook\WebDriver\Exception\TimeoutException::class,
                \Facebook\WebDriver\Exception\NoSuchElementException::class,
                \Facebook\WebDriver\Exception\StaleElementReferenceException::class,
                \Facebook\WebDriver\Exception\WebDriverCurlException::class,
            ];
        }

        foreach ($retryableExceptions as $exceptionClass) {
            if ($e instanceof $exceptionClass) {
                return true;
            }
        }

        return false;
    }

    private function calculateDelay(int $attempt): int
    {
        // Exponential backoff with jitter
        $exponentialDelay = $this->baseDelay * pow(2, $attempt);
        $jitter = rand(0, $exponentialDelay * 0.1); // 10% jitter

        return $exponentialDelay + $jitter;
    }
}

Advanced Retry Strategies

Exponential Backoff with Circuit Breaker

For production environments, implement a more sophisticated retry mechanism with circuit breaker pattern:

<?php

class AdvancedPantherRetry
{
    private $maxRetries;
    private $circuitBreakerThreshold;
    private $circuitBreakerTimeout;
    private $failureCount = 0;
    private $lastFailureTime = null;
    private $isCircuitOpen = false;

    public function __construct(
        int $maxRetries = 3,
        int $circuitBreakerThreshold = 5,
        int $circuitBreakerTimeout = 60000 // 1 minute
    ) {
        $this->maxRetries = $maxRetries;
        $this->circuitBreakerThreshold = $circuitBreakerThreshold;
        $this->circuitBreakerTimeout = $circuitBreakerTimeout;
    }

    public function executeWithAdvancedRetry(callable $operation, array $options = [])
    {
        if ($this->isCircuitOpen()) {
            throw new \RuntimeException('Circuit breaker is open. Service temporarily unavailable.');
        }

        $retryCondition = $options['retryCondition'] ?? null;
        $maxRetries = $options['maxRetries'] ?? $this->maxRetries;
        $customBackoff = $options['backoffStrategy'] ?? null;

        for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
            try {
                $result = $operation();
                $this->onSuccess();
                return $result;
            } catch (\Exception $e) {
                $this->onFailure();

                if ($retryCondition && !$retryCondition($e, $attempt)) {
                    throw $e;
                }

                if ($attempt < $maxRetries) {
                    $delay = $customBackoff 
                        ? $customBackoff($attempt) 
                        : $this->getBackoffDelay($attempt);
                    usleep($delay * 1000);
                } else {
                    throw new \RuntimeException(
                        "Operation failed after {$maxRetries} retries: " . $e->getMessage(),
                        0,
                        $e
                    );
                }
            }
        }
    }

    private function isCircuitOpen(): bool
    {
        if (!$this->isCircuitOpen) {
            return false;
        }

        $timeSinceLastFailure = microtime(true) * 1000 - $this->lastFailureTime;
        if ($timeSinceLastFailure > $this->circuitBreakerTimeout) {
            $this->isCircuitOpen = false;
            $this->failureCount = 0;
        }

        return $this->isCircuitOpen;
    }

    private function onSuccess(): void
    {
        $this->failureCount = 0;
        $this->isCircuitOpen = false;
    }

    private function onFailure(): void
    {
        $this->failureCount++;
        $this->lastFailureTime = microtime(true) * 1000;

        if ($this->failureCount >= $this->circuitBreakerThreshold) {
            $this->isCircuitOpen = true;
        }
    }

    private function getBackoffDelay(int $attempt): int
    {
        return min(1000 * pow(2, $attempt), 30000); // Cap at 30 seconds
    }
}

Practical Implementation Examples

Retrying Page Navigation

<?php

use Symfony\Component\Panther\Client;

class PantherNavigationWithRetry
{
    private $client;
    private $retryHelper;

    public function __construct()
    {
        $this->client = Client::createChromeClient();
        $this->retryHelper = new PantherRetryHelper(maxRetries: 3, baseDelay: 2000);
    }

    public function navigateToPage(string $url): Crawler
    {
        return $this->retryHelper->executeWithRetry(function() use ($url) {
            $crawler = $this->client->request('GET', $url);

            // Verify page loaded successfully
            if ($this->client->getWebDriver()->getCurrentURL() !== $url) {
                throw new \RuntimeException('Page navigation failed');
            }

            return $crawler;
        });
    }

    public function findElementWithRetry(string $selector): Crawler
    {
        return $this->retryHelper->executeWithRetry(function() use ($selector) {
            $element = $this->client->getCrawler()->filter($selector);

            if ($element->count() === 0) {
                throw new \Facebook\WebDriver\Exception\NoSuchElementException(
                    "Element not found: {$selector}"
                );
            }

            return $element;
        });
    }

    public function waitAndClick(string $selector): void
    {
        $this->retryHelper->executeWithRetry(function() use ($selector) {
            $element = $this->client->getCrawler()->filter($selector);

            if ($element->count() === 0) {
                throw new \Facebook\WebDriver\Exception\NoSuchElementException(
                    "Clickable element not found: {$selector}"
                );
            }

            $element->click();

            // Wait for any potential page changes
            $this->client->waitFor('.loading-indicator', 5, 100); // Wait for loading to disappear
        });
    }
}

Form Submission with Retry Logic

<?php

class FormSubmissionWithRetry
{
    private $client;
    private $advancedRetry;

    public function __construct()
    {
        $this->client = Client::createChromeClient();
        $this->advancedRetry = new AdvancedPantherRetry();
    }

    public function submitFormWithRetry(array $formData, string $submitSelector): bool
    {
        return $this->advancedRetry->executeWithAdvancedRetry(
            function() use ($formData, $submitSelector) {
                $crawler = $this->client->getCrawler();

                // Fill form fields
                foreach ($formData as $fieldName => $value) {
                    $field = $crawler->filter("input[name='{$fieldName}'], select[name='{$fieldName}'], textarea[name='{$fieldName}']");

                    if ($field->count() === 0) {
                        throw new \InvalidArgumentException("Form field not found: {$fieldName}");
                    }

                    $field->clear()->sendKeys($value);
                }

                // Submit form
                $submitButton = $crawler->filter($submitSelector);
                if ($submitButton->count() === 0) {
                    throw new \Facebook\WebDriver\Exception\NoSuchElementException(
                        "Submit button not found: {$submitSelector}"
                    );
                }

                $submitButton->click();

                // Wait for submission response
                $this->client->waitFor('.success-message, .error-message', 10);

                // Check for success indicators
                $successElements = $this->client->getCrawler()->filter('.success-message');
                return $successElements->count() > 0;
            },
            [
                'maxRetries' => 5,
                'retryCondition' => function(\Exception $e, int $attempt) {
                    // Don't retry on validation errors
                    if (strpos($e->getMessage(), 'validation') !== false) {
                        return false;
                    }
                    return true;
                }
            ]
        );
    }
}

Custom Retry Conditions

Implement specific retry conditions based on your application needs:

<?php

class CustomRetryConditions
{
    public static function networkErrorCondition(): callable
    {
        return function(\Exception $e, int $attempt) {
            $networkErrors = [
                'Connection refused',
                'Connection timed out',
                'Network is unreachable',
                'No route to host'
            ];

            foreach ($networkErrors as $errorPattern) {
                if (strpos($e->getMessage(), $errorPattern) !== false) {
                    return true;
                }
            }

            return false;
        };
    }

    public static function httpStatusCondition(array $retryableStatuses = [500, 502, 503, 504]): callable
    {
        return function(\Exception $e, int $attempt) use ($retryableStatuses) {
            if (preg_match('/HTTP (\d+)/', $e->getMessage(), $matches)) {
                $statusCode = (int)$matches[1];
                return in_array($statusCode, $retryableStatuses);
            }

            return false;
        };
    }

    public static function elementNotFoundCondition(int $maxAttempts = 3): callable
    {
        return function(\Exception $e, int $attempt) use ($maxAttempts) {
            return $e instanceof \Facebook\WebDriver\Exception\NoSuchElementException 
                && $attempt < $maxAttempts;
        };
    }
}

Integration with Logging and Monitoring

<?php

use Psr\Log\LoggerInterface;

class MonitoredPantherRetry
{
    private $logger;
    private $retryHelper;

    public function __construct(LoggerInterface $logger)
    {
        $this->logger = $logger;
        $this->retryHelper = new PantherRetryHelper();
    }

    public function executeWithLogging(callable $operation, string $operationName): mixed
    {
        $startTime = microtime(true);

        try {
            $result = $this->retryHelper->executeWithRetry($operation);

            $duration = microtime(true) - $startTime;
            $this->logger->info("Operation '{$operationName}' succeeded", [
                'duration' => $duration,
                'attempts' => 1
            ]);

            return $result;
        } catch (\Exception $e) {
            $duration = microtime(true) - $startTime;
            $this->logger->error("Operation '{$operationName}' failed after retries", [
                'duration' => $duration,
                'error' => $e->getMessage(),
                'maxAttempts' => $this->retryHelper->getMaxRetries() + 1
            ]);

            throw $e;
        }
    }
}

Timeout Configuration

Configure appropriate timeouts to work alongside your retry logic:

<?php

use Symfony\Component\Panther\Client;

class TimeoutAwareRetry
{
    private $client;
    private $retryHelper;

    public function __construct()
    {
        $options = [
            '--window-size=1920,1080',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-gpu'
        ];

        $this->client = Client::createChromeClient(null, $options);

        // Configure timeouts
        $this->client->getWebDriver()->manage()->timeouts()->implicitlyWait(10);
        $this->client->getWebDriver()->manage()->timeouts()->pageLoadTimeout(30);
        $this->client->getWebDriver()->manage()->timeouts()->scriptTimeout(30);

        $this->retryHelper = new PantherRetryHelper(maxRetries: 3, baseDelay: 1000);
    }

    public function scrapeWithTimeouts(string $url, string $dataSelector): array
    {
        return $this->retryHelper->executeWithRetry(function() use ($url, $dataSelector) {
            $crawler = $this->client->request('GET', $url);

            // Wait for specific elements with custom timeout
            $this->client->waitFor($dataSelector, 15); // Wait up to 15 seconds

            $elements = $crawler->filter($dataSelector);
            $data = [];

            $elements->each(function(Crawler $element) use (&$data) {
                $data[] = [
                    'text' => $element->text(),
                    'html' => $element->html()
                ];
            });

            if (empty($data)) {
                throw new \RuntimeException('No data extracted from page');
            }

            return $data;
        });
    }
}

Best Practices for Retry Logic

  1. Choose Appropriate Retry Strategies: Use exponential backoff for network-related errors and immediate retry for transient element issues.

  2. Set Reasonable Limits: Avoid infinite retry loops by setting maximum retry counts and timeout limits.

  3. Log Retry Attempts: Implement comprehensive logging to monitor retry patterns and identify systemic issues.

  4. Handle Different Error Types: Distinguish between retryable and non-retryable errors to avoid unnecessary retry attempts.

  5. Consider Resource Management: Ensure proper cleanup of browser resources even when retries fail.

  6. Monitor Performance: Track retry rates and adjust strategies based on actual failure patterns.

  7. Test Edge Cases: Verify your retry logic handles various failure scenarios correctly.

Similar to how to handle timeouts in Puppeteer, implementing robust retry logic in Symfony Panther requires careful consideration of various failure scenarios. Additionally, when dealing with complex web applications, you might need to combine retry logic with proper error handling strategies to create truly resilient scraping solutions.

Conclusion

Implementing retry logic for failed requests in Symfony Panther is essential for building robust web scraping applications. By combining exponential backoff strategies, custom retry conditions, and proper error handling, you can create resilient systems that gracefully handle temporary failures while avoiding unnecessary resource consumption. Remember to monitor your retry patterns and adjust your strategies based on the specific characteristics of the websites you're scraping.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Related Questions

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon