How do I implement retry logic for failed requests in Symfony Panther?
Implementing retry logic for failed requests in Symfony Panther is crucial for building robust web scraping applications that can handle network issues, temporary server problems, and other transient failures. This guide covers various strategies for implementing retry mechanisms with custom conditions, exponential backoff, and comprehensive error handling.
Understanding Retry Logic in Symfony Panther
Symfony Panther drives a real browser through the php-webdriver library and a driver binary such as ChromeDriver, so scraping operations can fail in several ways: network timeouts, server errors, elements that are not yet present, and browser crashes. Proper retry logic keeps these transient failures from aborting the whole run.
Basic Retry Implementation
Here's a fundamental retry wrapper for Symfony Panther operations:
<?php
use Symfony\Component\Panther\Client;
use Symfony\Component\Panther\DomCrawler\Crawler;
class PantherRetryHelper
{
private $maxRetries;
private $baseDelay;
public function __construct(int $maxRetries = 3, int $baseDelay = 1000)
{
$this->maxRetries = $maxRetries;
$this->baseDelay = $baseDelay; // milliseconds
}
public function executeWithRetry(callable $operation, array $retryableExceptions = [])
{
$attempt = 0;
$lastException = null;
while ($attempt <= $this->maxRetries) {
try {
return $operation();
} catch (\Exception $e) {
$lastException = $e;
if (!$this->isRetryableException($e, $retryableExceptions)) {
throw $e;
}
if ($attempt < $this->maxRetries) {
$delay = $this->calculateDelay($attempt);
usleep($delay * 1000); // Convert to microseconds
$attempt++;
} else {
break;
}
}
}
throw new \RuntimeException(
"Operation failed after {$this->maxRetries} retries. Last error: " .
$lastException->getMessage(),
0,
$lastException
);
}
private function isRetryableException(\Exception $e, array $retryableExceptions): bool
{
if (empty($retryableExceptions)) {
// Default retryable exceptions
$retryableExceptions = [
\Facebook\WebDriver\Exception\TimeoutException::class,
\Facebook\WebDriver\Exception\NoSuchElementException::class,
\Facebook\WebDriver\Exception\StaleElementReferenceException::class,
\Facebook\WebDriver\Exception\WebDriverCurlException::class,
];
}
foreach ($retryableExceptions as $exceptionClass) {
if ($e instanceof $exceptionClass) {
return true;
}
}
return false;
}
private function calculateDelay(int $attempt): int
{
// Exponential backoff with jitter
$exponentialDelay = $this->baseDelay * pow(2, $attempt);
$jitter = random_int(0, intdiv($exponentialDelay, 10)); // up to 10% jitter; integer bounds stay valid for any base delay
return $exponentialDelay + $jitter;
}
}
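To make the backoff schedule concrete, the sketch below reproduces the delay formula from calculateDelay() as a standalone function and prints the range each retry could sleep for. The function name backoffBounds is hypothetical and exists only for this illustration; it assumes the 1000 ms default base delay and the 10% jitter ceiling used above.

```php
<?php

/**
 * Hypothetical helper (not part of Panther): returns the [min, max] delay in
 * milliseconds for a zero-based attempt, mirroring calculateDelay().
 */
function backoffBounds(int $baseDelayMs, int $attempt): array
{
    $exponential = $baseDelayMs * (2 ** $attempt); // 1x, 2x, 4x, ...
    $maxJitter = intdiv($exponential, 10);         // jitter adds at most 10%

    return [$exponential, $exponential + $maxJitter];
}

foreach ([0, 1, 2] as $attempt) {
    [$min, $max] = backoffBounds(1000, $attempt);
    echo 'retry ' . ($attempt + 1) . ": {$min}-{$max} ms\n";
}
// retry 1: 1000-1100 ms
// retry 2: 2000-2200 ms
// retry 3: 4000-4400 ms
```

With the default of three retries, the helper therefore sleeps between 7 and 7.7 seconds in total before giving up.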
Advanced Retry Strategies
Exponential Backoff with Circuit Breaker
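For production environments, consider a more sophisticated retry mechanism that adds the circuit breaker pattern: after a threshold of consecutive failures, further calls fail fast until a cooldown period has elapsed.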
<?php
class AdvancedPantherRetry
{
private $maxRetries;
private $circuitBreakerThreshold;
private $circuitBreakerTimeout;
private $failureCount = 0;
private $lastFailureTime = null;
private $isCircuitOpen = false;
public function __construct(
int $maxRetries = 3,
int $circuitBreakerThreshold = 5,
int $circuitBreakerTimeout = 60000 // 1 minute
) {
$this->maxRetries = $maxRetries;
$this->circuitBreakerThreshold = $circuitBreakerThreshold;
$this->circuitBreakerTimeout = $circuitBreakerTimeout;
}
public function executeWithAdvancedRetry(callable $operation, array $options = [])
{
if ($this->isCircuitOpen()) {
throw new \RuntimeException('Circuit breaker is open. Service temporarily unavailable.');
}
$retryCondition = $options['retryCondition'] ?? null;
$maxRetries = $options['maxRetries'] ?? $this->maxRetries;
$customBackoff = $options['backoffStrategy'] ?? null;
for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
try {
$result = $operation();
$this->onSuccess();
return $result;
} catch (\Exception $e) {
$this->onFailure();
if ($retryCondition && !$retryCondition($e, $attempt)) {
throw $e;
}
if ($attempt < $maxRetries) {
$delay = $customBackoff
? $customBackoff($attempt)
: $this->getBackoffDelay($attempt);
usleep($delay * 1000);
} else {
throw new \RuntimeException(
"Operation failed after {$maxRetries} retries: " . $e->getMessage(),
0,
$e
);
}
}
}
}
private function isCircuitOpen(): bool
{
if (!$this->isCircuitOpen) {
return false;
}
$timeSinceLastFailure = microtime(true) * 1000 - $this->lastFailureTime;
if ($timeSinceLastFailure > $this->circuitBreakerTimeout) {
$this->isCircuitOpen = false;
$this->failureCount = 0;
}
return $this->isCircuitOpen;
}
private function onSuccess(): void
{
$this->failureCount = 0;
$this->isCircuitOpen = false;
}
private function onFailure(): void
{
$this->failureCount++;
$this->lastFailureTime = microtime(true) * 1000;
if ($this->failureCount >= $this->circuitBreakerThreshold) {
$this->isCircuitOpen = true;
}
}
private function getBackoffDelay(int $attempt): int
{
return min(1000 * pow(2, $attempt), 30000); // Cap at 30 seconds
}
}
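The backoffStrategy option accepts any callable that maps the zero-based attempt number to a delay in milliseconds. As a rough sketch, here are two strategies one might pass in; the factory names linearBackoff and fullJitterBackoff are made up for this example and are not part of Panther.

```php
<?php

/** Hypothetical factory: the delay grows by a fixed step, capped at $capMs. */
function linearBackoff(int $stepMs, int $capMs): callable
{
    return function (int $attempt) use ($stepMs, $capMs): int {
        return min($stepMs * ($attempt + 1), $capMs);
    };
}

/** Hypothetical factory: a random delay between 0 and the capped exponential value. */
function fullJitterBackoff(int $baseMs, int $capMs): callable
{
    return function (int $attempt) use ($baseMs, $capMs): int {
        $ceiling = min($baseMs * (2 ** $attempt), $capMs);

        return random_int(0, $ceiling);
    };
}

$linear = linearBackoff(500, 2000);
echo implode(', ', array_map($linear, [0, 1, 2, 3, 4])), "\n";
// 500, 1000, 1500, 2000, 2000
```

Either callable could then be supplied as ['backoffStrategy' => linearBackoff(500, 2000)] in the options array of executeWithAdvancedRetry(). Randomized delays are mainly useful when many workers retry against the same site at once, since they spread the retries out.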
Practical Implementation Examples
Retrying Page Navigation
<?php
use Symfony\Component\Panther\Client;
use Symfony\Component\Panther\DomCrawler\Crawler;
class PantherNavigationWithRetry
{
private $client;
private $retryHelper;
public function __construct()
{
$this->client = Client::createChromeClient();
$this->retryHelper = new PantherRetryHelper(maxRetries: 3, baseDelay: 2000);
}
public function navigateToPage(string $url): Crawler
{
return $this->retryHelper->executeWithRetry(function() use ($url) {
$crawler = $this->client->request('GET', $url);
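// Verify the browser ended up on the requested URL. This strict comparison
// also fails on redirects or trailing-slash normalization; relax it if needed.
if ($this->client->getWebDriver()->getCurrentURL() !== $url) {
throw new \RuntimeException('Page navigation failed');
}
return $crawler;
}, [
// RuntimeException is not in the helper's default list, so name it explicitly
\RuntimeException::class,
\Facebook\WebDriver\Exception\TimeoutException::class,
\Facebook\WebDriver\Exception\WebDriverCurlException::class,
]);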
}
public function findElementWithRetry(string $selector): Crawler
{
return $this->retryHelper->executeWithRetry(function() use ($selector) {
$element = $this->client->getCrawler()->filter($selector);
if ($element->count() === 0) {
throw new \Facebook\WebDriver\Exception\NoSuchElementException(
"Element not found: {$selector}"
);
}
return $element;
});
}
public function waitAndClick(string $selector): void
{
$this->retryHelper->executeWithRetry(function() use ($selector) {
$element = $this->client->getCrawler()->filter($selector);
if ($element->count() === 0) {
throw new \Facebook\WebDriver\Exception\NoSuchElementException(
"Clickable element not found: {$selector}"
);
}
$element->click();
// Wait for the loading indicator to disappear (waitFor() would instead wait for it to appear)
$this->client->waitForInvisibility('.loading-indicator', 5, 100);
});
}
}
Form Submission with Retry Logic
<?php
use Symfony\Component\Panther\Client;
class FormSubmissionWithRetry
{
private $client;
private $advancedRetry;
public function __construct()
{
$this->client = Client::createChromeClient();
$this->advancedRetry = new AdvancedPantherRetry();
}
public function submitFormWithRetry(array $formData, string $submitSelector): bool
{
return $this->advancedRetry->executeWithAdvancedRetry(
function() use ($formData, $submitSelector) {
$crawler = $this->client->getCrawler();
// Fill form fields
foreach ($formData as $fieldName => $value) {
$field = $crawler->filter("input[name='{$fieldName}'], select[name='{$fieldName}'], textarea[name='{$fieldName}']");
if ($field->count() === 0) {
throw new \InvalidArgumentException("Form field not found: {$fieldName}");
}
$field->clear()->sendKeys($value);
}
// Submit form
$submitButton = $crawler->filter($submitSelector);
if ($submitButton->count() === 0) {
throw new \Facebook\WebDriver\Exception\NoSuchElementException(
"Submit button not found: {$submitSelector}"
);
}
$submitButton->click();
// Wait for submission response
$this->client->waitFor('.success-message, .error-message', 10);
// Check for success indicators
$successElements = $this->client->getCrawler()->filter('.success-message');
return $successElements->count() > 0;
},
[
'maxRetries' => 5,
'retryCondition' => function(\Exception $e, int $attempt) {
// Don't retry on validation errors
if (strpos($e->getMessage(), 'validation') !== false) {
return false;
}
return true;
}
]
);
}
}
Custom Retry Conditions
Implement specific retry conditions based on your application needs:
<?php
class CustomRetryConditions
{
public static function networkErrorCondition(): callable
{
return function(\Exception $e, int $attempt) {
$networkErrors = [
'Connection refused',
'Connection timed out',
'Network is unreachable',
'No route to host'
];
foreach ($networkErrors as $errorPattern) {
if (strpos($e->getMessage(), $errorPattern) !== false) {
return true;
}
}
return false;
};
}
public static function httpStatusCondition(array $retryableStatuses = [500, 502, 503, 504]): callable
{
return function(\Exception $e, int $attempt) use ($retryableStatuses) {
if (preg_match('/HTTP (\d+)/', $e->getMessage(), $matches)) {
$statusCode = (int)$matches[1];
return in_array($statusCode, $retryableStatuses);
}
return false;
};
}
public static function elementNotFoundCondition(int $maxAttempts = 3): callable
{
return function(\Exception $e, int $attempt) use ($maxAttempts) {
return $e instanceof \Facebook\WebDriver\Exception\NoSuchElementException
&& $attempt < $maxAttempts;
};
}
}
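Because every condition shares the signature (\Exception $e, int $attempt): bool, conditions can be composed. The following is a minimal sketch of an "any of" combinator; the name anyCondition is hypothetical, and the two stand-in conditions exist only so the example runs without Panther installed.

```php
<?php

/**
 * Hypothetical combinator: retry when at least one of the given conditions
 * returns true. Each condition takes (\Exception $e, int $attempt) and returns bool.
 */
function anyCondition(callable ...$conditions): callable
{
    return function (\Exception $e, int $attempt) use ($conditions): bool {
        foreach ($conditions as $condition) {
            if ($condition($e, $attempt)) {
                return true;
            }
        }

        return false;
    };
}

// Stand-in conditions that inspect only the exception message.
$isTimeout = fn (\Exception $e, int $attempt): bool => stripos($e->getMessage(), 'timed out') !== false;
$isServerError = fn (\Exception $e, int $attempt): bool => (bool) preg_match('/HTTP 5\d\d/', $e->getMessage());

$shouldRetry = anyCondition($isTimeout, $isServerError);

var_dump($shouldRetry(new \RuntimeException('Connection timed out'), 0)); // bool(true)
var_dump($shouldRetry(new \RuntimeException('HTTP 404 Not Found'), 0));   // bool(false)
```

In the same way, anyCondition(CustomRetryConditions::networkErrorCondition(), CustomRetryConditions::httpStatusCondition()) could be passed as the retryCondition option.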
Integration with Logging and Monitoring
<?php
use Psr\Log\LoggerInterface;
class MonitoredPantherRetry
{
private $logger;
private $retryHelper;
private $maxRetries;
public function __construct(LoggerInterface $logger, int $maxRetries = 3)
{
$this->logger = $logger;
$this->maxRetries = $maxRetries;
$this->retryHelper = new PantherRetryHelper($maxRetries);
}
public function executeWithLogging(callable $operation, string $operationName): mixed
{
$startTime = microtime(true);
$attempts = 0;
// Count real attempts; PantherRetryHelper does not expose its own counter
$countedOperation = function () use ($operation, &$attempts) {
$attempts++;
return $operation();
};
try {
$result = $this->retryHelper->executeWithRetry($countedOperation);
$duration = microtime(true) - $startTime;
$this->logger->info("Operation '{$operationName}' succeeded", [
'duration' => $duration,
'attempts' => $attempts
]);
return $result;
} catch (\Exception $e) {
$duration = microtime(true) - $startTime;
$this->logger->error("Operation '{$operationName}' failed", [
'duration' => $duration,
'error' => $e->getMessage(),
'attempts' => $attempts,
'maxAttempts' => $this->maxRetries + 1
]);
throw $e;
}
}
}
Timeout Configuration
Configure appropriate timeouts to work alongside your retry logic:
<?php
use Symfony\Component\Panther\Client;
use Symfony\Component\Panther\DomCrawler\Crawler;
class TimeoutAwareRetry
{
private $client;
private $retryHelper;
public function __construct()
{
$options = [
'--window-size=1920,1080',
'--disable-dev-shm-usage',
'--no-sandbox',
'--disable-gpu'
];
$this->client = Client::createChromeClient(null, $options);
// Configure timeouts
$this->client->getWebDriver()->manage()->timeouts()->implicitlyWait(10);
$this->client->getWebDriver()->manage()->timeouts()->pageLoadTimeout(30);
$this->client->getWebDriver()->manage()->timeouts()->setScriptTimeout(30);
$this->retryHelper = new PantherRetryHelper(maxRetries: 3, baseDelay: 1000);
}
public function scrapeWithTimeouts(string $url, string $dataSelector): array
{
return $this->retryHelper->executeWithRetry(function() use ($url, $dataSelector) {
$crawler = $this->client->request('GET', $url);
// Wait for specific elements with custom timeout
$this->client->waitFor($dataSelector, 15); // Wait up to 15 seconds
$elements = $crawler->filter($dataSelector);
$data = [];
$elements->each(function(Crawler $element) use (&$data) {
$data[] = [
'text' => $element->text(),
'html' => $element->html()
];
});
if (empty($data)) {
throw new \RuntimeException('No data extracted from page');
}
return $data;
});
}
}
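Timeouts and retries multiply, so it helps to estimate the worst case before choosing values. The sketch below is a back-of-the-envelope calculation, not something Panther provides: worstCaseSeconds is a hypothetical function that assumes every attempt runs into its timeout and every retry sleeps for the maximum backoff (exponential delay plus 10% jitter).

```php
<?php

/**
 * Hypothetical estimate of the worst-case wall-clock time, in seconds, for one
 * retried operation.
 */
function worstCaseSeconds(int $maxRetries, int $baseDelayMs, int $perAttemptTimeoutSec): float
{
    $attempts = $maxRetries + 1; // the initial try plus each retry
    $sleepMs = 0;
    for ($retry = 0; $retry < $maxRetries; $retry++) {
        $exponential = $baseDelayMs * (2 ** $retry);
        $sleepMs += $exponential + intdiv($exponential, 10);
    }

    return $attempts * $perAttemptTimeoutSec + $sleepMs / 1000;
}

// Assumed per-attempt budget: 30 s page load + 15 s waitFor = 45 s,
// with 3 retries and a 1000 ms base delay as configured above.
echo worstCaseSeconds(3, 1000, 45), "\n"; // 187.7
```

Under these assumptions a single scrape can block for more than three minutes, which is worth keeping in mind when scheduling many URLs.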
Best Practices for Retry Logic
Choose Appropriate Retry Strategies: Use exponential backoff for network-related errors and immediate retry for transient element issues.
Set Reasonable Limits: Avoid infinite retry loops by setting maximum retry counts and timeout limits.
Log Retry Attempts: Implement comprehensive logging to monitor retry patterns and identify systemic issues.
Handle Different Error Types: Distinguish between retryable and non-retryable errors to avoid unnecessary retry attempts.
Consider Resource Management: Ensure proper cleanup of browser resources even when retries fail.
Monitor Performance: Track retry rates and adjust strategies based on actual failure patterns.
Test Edge Cases: Verify your retry logic handles various failure scenarios correctly.
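The resource-management point can be sketched with try/finally, which runs cleanup whether the retried operation returns or throws. The wrapper name runWithCleanup is hypothetical, and the closures are stand-ins so the example runs without a browser; with Panther, the cleanup callable would typically call $client->quit().

```php
<?php

/**
 * Hypothetical wrapper: runs $operation and always runs $cleanup afterwards,
 * whether the operation returned normally or threw.
 */
function runWithCleanup(callable $operation, callable $cleanup)
{
    try {
        return $operation();
    } finally {
        $cleanup();
    }
}

$log = [];
try {
    runWithCleanup(
        function () {
            // Stand-in for a retried operation that finally gave up
            throw new \RuntimeException('Operation failed after 3 retries');
        },
        function () use (&$log) {
            // Stand-in for $client->quit()
            $log[] = 'browser closed';
        }
    );
} catch (\RuntimeException $e) {
    $log[] = 'caught: ' . $e->getMessage();
}

print_r($log);
```

The cleanup entry is recorded before the exception reaches the caller, so the browser is released even when every retry has failed.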
As with handling timeouts in Puppeteer, robust retry logic in Symfony Panther requires thinking through each failure scenario. For complex web applications, combine retry logic with proper error handling so that non-retryable failures surface immediately instead of being retried.
Conclusion
Implementing retry logic for failed requests in Symfony Panther is essential for building robust web scraping applications. By combining exponential backoff strategies, custom retry conditions, and proper error handling, you can create resilient systems that gracefully handle temporary failures while avoiding unnecessary resource consumption. Remember to monitor your retry patterns and adjust your strategies based on the specific characteristics of the websites you're scraping.