
What Error Handling Strategies Work Best with Symfony Panther?

Symfony Panther is a browser testing and web scraping library that drives a real browser (Chrome/Chromium or Firefox, usually headless) through the WebDriver protocol while exposing Symfony's familiar BrowserKit and DomCrawler APIs. Working with a real browser introduces failure modes that a plain HTTP client does not have, so robust error handling matters. This guide covers the most effective ways to handle errors when using Symfony Panther for web scraping and testing.

Understanding Common Symfony Panther Errors

Before diving into error handling strategies, it's important to understand the types of errors you'll encounter:

  • Timeout errors: When pages or elements take too long to load
  • Element not found errors: When selectors don't match any elements
  • Network errors: Connection issues or failed HTTP requests
  • JavaScript errors: Issues with dynamic content loading
  • Browser crashes: Unexpected browser termination

1. Implementing Timeout Management

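Timeouts are among the most common failures in browser automation. Panther exposes the underlying WebDriver timeouts (page load and implicit wait) and adds explicit waits such as waitFor(), which throws a TimeoutException when the deadline passes: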

<?php

use Symfony\Component\Panther\Client;

// PantherTestCase is a PHPUnit base class, so a scraping service
// creates its own client instead of extending it.
class WebScrapingService
{
    private Client $client;

    public function __construct()
    {
        // An explicit argument list is used in place of Panther's default
        // Chrome arguments, so --headless is included here.
        $this->client = Client::createChromeClient(null, [
            '--headless',
            '--no-sandbox',
            '--disable-dev-shm-usage',
            '--disable-gpu',
        ]);

        // Maximum time (in seconds) to wait for a page load
        $this->client->getWebDriver()->manage()->timeouts()->pageLoadTimeout(30);

        // Implicit wait (in seconds) applied to every element lookup
        $this->client->getWebDriver()->manage()->timeouts()->implicitlyWait(10);
    }

    public function scrapeWithTimeoutHandling(string $url): array
    {
        try {
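            // Navigate; the page load timeout configured on the driver applies here
            $this->client->request('GET', $url);

            // Wait up to 15 seconds for a specific element to appear
            $crawler = $this->client->waitFor('.content', 15);

            // Extract the text of every matching element
            return $crawler->filter('.content')->each(
                fn ($node) => $node->text()
            );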

        } catch (\Facebook\WebDriver\Exception\TimeoutException $e) {
            $this->handleTimeoutError($e, $url);
            return [];
        }
    }

    private function handleTimeoutError(\Facebook\WebDriver\Exception\TimeoutException $e, string $url): void
    {
        error_log("Timeout error on URL: {$url}. Message: " . $e->getMessage());

        // Attempt recovery by refreshing the page
        try {
            $this->client->reload();
            sleep(2); // Give page time to load
        } catch (\Exception $recoveryException) {
            error_log("Recovery attempt failed: " . $recoveryException->getMessage());
        }
    }
}
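
Explicit waits boil down to polling a condition until a deadline passes. The sketch below illustrates that idea in plain PHP so the timeout behaviour can be reasoned about and tested without a browser. It is not Panther's implementation; `waitUntil` and its parameters are hypothetical names chosen for this example.

```php
<?php

declare(strict_types=1);

/**
 * Poll $condition until it returns a truthy value or the deadline passes.
 * Hypothetical helper for illustration; not part of Panther.
 *
 * @throws \RuntimeException if the condition is not met in time
 */
function waitUntil(callable $condition, float $timeoutSeconds = 5.0, int $intervalMs = 250): mixed
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        $value = $condition();
        if ($value) {
            return $value; // Condition met: hand back whatever it produced
        }
        usleep($intervalMs * 1000); // Pause before polling again
    } while (microtime(true) < $deadline);

    throw new \RuntimeException(sprintf('Condition not met within %.2f seconds', $timeoutSeconds));
}

// Example: the condition becomes true on the third poll.
$polls = 0;
$result = waitUntil(function () use (&$polls) {
    return ++$polls >= 3 ? "ready after {$polls} polls" : false;
}, 2.0, 10);

echo $result, PHP_EOL; // prints "ready after 3 polls"
```

A short polling interval finds the element sooner at the cost of more round trips to the browser; the timeout only bounds the worst case.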

2. Graceful Element Detection and Fallback

One of the most critical aspects of error handling is dealing with missing elements. Instead of letting your script crash, implement fallback mechanisms:

<?php

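use Symfony\Component\Panther\Client;

class ElementHandler
{
    // The client is injected so the typed property is always initialized
    public function __construct(private Client $client)
    {
    }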

    public function findElementSafely(string $selector, int $timeout = 10): ?\Facebook\WebDriver\WebDriverElement
    {
        try {
            return $this->client->waitFor($selector, $timeout);
        } catch (\Facebook\WebDriver\Exception\NoSuchElementException $e) {
            error_log("Element not found: {$selector}");
            return null;
        } catch (\Facebook\WebDriver\Exception\TimeoutException $e) {
            error_log("Timeout waiting for element: {$selector}");
            return null;
        }
    }

    public function extractTextWithFallback(array $selectors): ?string
    {
        foreach ($selectors as $selector) {
            $element = $this->findElementSafely($selector);
            if ($element !== null) {
                return $element->getText();
            }
        }

        error_log("None of the fallback selectors found content");
        return null;
    }

    public function scrapeProductData(string $url): array
    {
        $crawler = $this->client->request('GET', $url);

        // Try multiple selectors for the same data
        $title = $this->extractTextWithFallback([
            'h1.product-title',
            '.product-name',
            '[data-testid="product-title"]',
            'h1'
        ]);

        $price = $this->extractTextWithFallback([
            '.price-current',
            '.product-price',
            '[data-price]',
            '.price'
        ]);

        return [
            'title' => $title ?: 'Title not found',
            'price' => $price ?: 'Price not available',
            'url' => $url
        ];
    }
}
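
One cost of the approach above is that every selector that misses waits for its full timeout, so four fallbacks at 10 seconds each can add up to 40 seconds per field. A possible refinement is to wait once for the page to be ready and then try the selectors without waiting. The sketch below separates the fallback loop from the browser: `firstMatch` and the `$lookup` callable are hypothetical, and in a real scraper the lookup might check `$crawler->filter($selector)->count()` before reading the text.

```php
<?php

declare(strict_types=1);

/**
 * Return the first non-empty text produced by $lookup for the given selectors.
 * $lookup is a hypothetical callable: fn(string $selector): ?string
 *
 * @return array{selector: ?string, text: ?string}
 */
function firstMatch(array $selectors, callable $lookup): array
{
    foreach ($selectors as $selector) {
        $text = $lookup($selector);
        if ($text !== null && trim($text) !== '') {
            return ['selector' => $selector, 'text' => trim($text)];
        }
    }

    return ['selector' => null, 'text' => null];
}

// Stand-in for a rendered page: selector => text (made-up sample data).
$page = ['.product-name' => '  Blue Widget ', '.price' => '$19.99'];
$lookup = fn (string $selector): ?string => $page[$selector] ?? null;

$title = firstMatch(['h1.product-title', '.product-name', 'h1'], $lookup);
echo $title['selector'], ' => ', $title['text'], PHP_EOL; // prints ".product-name => Blue Widget"
```

Recording which selector matched makes it easy to notice when the primary selector has stopped working and the scraper is quietly relying on a fallback.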

3. Network Error Handling and Retry Logic

Network issues are common when scraping multiple pages. Implement exponential backoff and retry mechanisms:

<?php

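use Symfony\Component\Panther\Client;

class NetworkErrorHandler
{
    private int $maxRetries = 3;
    private int $baseDelay = 1; // seconds

    // The client is injected so the typed property is always initialized
    public function __construct(private Client $client)
    {
    }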

    public function requestWithRetry(string $method, string $url, int $attempt = 1): ?\Symfony\Component\DomCrawler\Crawler
    {
        try {
            return $this->client->request($method, $url);

        } catch (\Facebook\WebDriver\Exception\WebDriverCurlException $e) {
            return $this->handleNetworkError($e, $method, $url, $attempt);

        } catch (\Facebook\WebDriver\Exception\UnknownServerException $e) {
            return $this->handleServerError($e, $method, $url, $attempt);

        } catch (\Exception $e) {
            error_log("Unexpected error: " . $e->getMessage());
            return null;
        }
    }

    private function handleNetworkError(\Exception $e, string $method, string $url, int $attempt): ?\Symfony\Component\DomCrawler\Crawler
    {
        if ($attempt >= $this->maxRetries) {
            error_log("Max retries exceeded for URL: {$url}");
            return null;
        }

        $delay = $this->baseDelay * pow(2, $attempt - 1); // Exponential backoff
        error_log("Network error (attempt {$attempt}): {$e->getMessage()}. Retrying in {$delay} seconds...");

        sleep($delay);
        return $this->requestWithRetry($method, $url, $attempt + 1);
    }

    private function handleServerError(\Exception $e, string $method, string $url, int $attempt): ?\Symfony\Component\DomCrawler\Crawler
    {
        if ($attempt >= $this->maxRetries) {
            error_log("Server error - max retries exceeded for URL: {$url}");
            return null;
        }

        error_log("Server error (attempt {$attempt}): {$e->getMessage()}");
        sleep(2); // Fixed delay for server errors

        return $this->requestWithRetry($method, $url, $attempt + 1);
    }
}
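
The retry logic can also be written as a loop with the backoff schedule in its own function, which keeps the delays easy to verify. This is a minimal sketch, not a replacement for the class above: `backoffDelay`, `retry`, the 30-second cap, and the injectable `$sleep` callable are all assumptions made for the example.

```php
<?php

declare(strict_types=1);

/** Delay in seconds before retry number $attempt (1-based): doubled each time, then capped. */
function backoffDelay(int $attempt, int $baseDelay = 1, int $maxDelay = 30): int
{
    return min($maxDelay, $baseDelay * 2 ** ($attempt - 1));
}

/**
 * Run $operation up to $maxAttempts times, pausing between failures.
 * $sleep is injectable so the schedule can be tested without waiting.
 */
function retry(callable $operation, int $maxAttempts = 3, ?callable $sleep = null): mixed
{
    $sleep ??= 'sleep';

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (\Exception $e) { // Narrow this to the exceptions you consider transient
            if ($attempt >= $maxAttempts) {
                throw $e; // Out of attempts: surface the last error to the caller
            }
            $sleep(backoffDelay($attempt));
        }
    }
}

// Example: fail twice, succeed on the third attempt; record delays instead of sleeping.
$delays = [];
$result = retry(
    function (int $attempt): string {
        if ($attempt < 3) {
            throw new \RuntimeException("simulated network error #{$attempt}");
        }
        return "succeeded on attempt {$attempt}";
    },
    5,
    function (int $seconds) use (&$delays): void {
        $delays[] = $seconds;
    }
);

echo $result, ' after delays of ', implode(', ', $delays), ' seconds', PHP_EOL;
// prints "succeeded on attempt 3 after delays of 1, 2 seconds"
```

Capping the delay keeps a long retry sequence from stalling a scraping run, and rethrowing the last exception lets the caller decide whether to skip the URL or abort.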

4. JavaScript Error Detection and Handling

When working with dynamic content, JavaScript errors can break your scraping logic. Here's how to detect and handle them:

<?php

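use Symfony\Component\Panther\Client;

class JavaScriptErrorHandler
{
    // The client is injected so the typed property is always initialized
    public function __construct(private Client $client)
    {
    }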

    public function checkForJavaScriptErrors(): array
    {
        $logs = $this->client->getWebDriver()->manage()->getLog('browser');
        $errors = [];

        // Each log entry is an associative array, not an object
        foreach ($logs as $log) {
            if ($log['level'] === 'SEVERE') {
                $errors[] = [
                    'message' => $log['message'],
                    'timestamp' => $log['timestamp'],
                    'level' => $log['level']
                ];
            }
        }

        return $errors;
    }

    public function waitForDynamicContent(string $selector, int $timeout = 15): bool
    {
        try {
            // Wait for the element to appear
            $this->client->waitFor($selector, $timeout);

            // Check for JavaScript errors after content loads
            $errors = $this->checkForJavaScriptErrors();
            if (!empty($errors)) {
                error_log("JavaScript errors detected: " . json_encode($errors));
            }

            return true;

        } catch (\Facebook\WebDriver\Exception\TimeoutException $e) {
            error_log("Timeout waiting for dynamic content: {$selector}");

            // Check if JavaScript errors caused the timeout
            $errors = $this->checkForJavaScriptErrors();
            if (!empty($errors)) {
                error_log("JavaScript errors may have caused timeout: " . json_encode($errors));
            }

            return false;
        }
    }

    public function executeJavaScriptSafely(string $script): mixed
    {
        try {
            $result = $this->client->executeScript($script);

            // Check for errors after script execution
            $errors = $this->checkForJavaScriptErrors();
            if (!empty($errors)) {
                error_log("JavaScript errors after script execution: " . json_encode($errors));
            }

            return $result;

        } catch (\Exception $e) {
            error_log("Error executing JavaScript: " . $e->getMessage());
            return null;
        }
    }
}
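
Browser logs often contain SEVERE entries that are irrelevant to the scrape, such as a missing favicon. The sketch below filters entries in plain PHP, assuming each entry is an associative array with `level` and `message` keys. The `filterSevere` helper, the ignore patterns, and the sample entries are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Keep SEVERE entries whose message matches none of the ignore patterns.
 * Entries are assumed to be arrays with 'level' and 'message' keys.
 */
function filterSevere(array $entries, array $ignorePatterns = []): array
{
    return array_values(array_filter($entries, function (array $entry) use ($ignorePatterns): bool {
        if (($entry['level'] ?? '') !== 'SEVERE') {
            return false;
        }
        foreach ($ignorePatterns as $pattern) {
            if (preg_match($pattern, $entry['message'] ?? '') === 1) {
                return false; // Known noise: skip it
            }
        }
        return true;
    }));
}

// Made-up sample entries standing in for a browser log.
$entries = [
    ['level' => 'INFO', 'message' => 'console.log: app started', 'timestamp' => 1700000000000],
    ['level' => 'SEVERE', 'message' => 'https://example.com/favicon.ico - Failed to load resource: 404', 'timestamp' => 1700000000100],
    ['level' => 'SEVERE', 'message' => 'Uncaught TypeError: items is undefined', 'timestamp' => 1700000000200],
];

$errors = filterSevere($entries, ['/favicon\.ico/']);
echo count($errors), ' relevant error(s): ', $errors[0]['message'], PHP_EOL;
// prints "1 relevant error(s): Uncaught TypeError: items is undefined"
```

Keeping the filter separate from the log retrieval means the ignore list can be unit tested and tuned per site.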

5. Browser State Management and Recovery

Sometimes the browser gets into an inconsistent state. Implement recovery mechanisms:

<?php

use Symfony\Component\Panther\Client;

class BrowserStateManager
{
    private Client $client;
    private array $recoveryStrategies = [];

    public function __construct(Client $client)
    {
        $this->client = $client;
        $this->setupRecoveryStrategies();
    }

    private function setupRecoveryStrategies(): void
    {
        $this->recoveryStrategies = [
            'refresh_page' => function() {
                $this->client->reload();
                sleep(2);
            },
            'clear_cache' => function() {
                $this->client->executeScript('window.localStorage.clear(); window.sessionStorage.clear();');
            },
            'close_dialogs' => function() {
                try {
                    $alert = $this->client->getWebDriver()->switchTo()->alert();
                    $alert->dismiss();
                } catch (\Exception $e) {
                    // No alert present
                }
            },
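            'restart_browser' => function() {
                $this->client->quit();
                // Start a fresh browser. Callers that still hold a reference to
                // the old client will not see this new instance.
                $this->client = \Symfony\Component\Panther\Client::createChromeClient();
            }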
        ];
    }

    public function executeWithRecovery(callable $operation, array $recoveryOptions = ['refresh_page']): mixed
    {
        try {
            return $operation();

        } catch (\Exception $e) {
            error_log("Operation failed: " . $e->getMessage());

            foreach ($recoveryOptions as $strategy) {
                if (isset($this->recoveryStrategies[$strategy])) {
                    error_log("Attempting recovery strategy: {$strategy}");

                    try {
                        $this->recoveryStrategies[$strategy]();

                        // Retry operation after recovery
                        return $operation();

                    } catch (\Exception $recoveryException) {
                        error_log("Recovery strategy '{$strategy}' failed: " . $recoveryException->getMessage());
                        continue;
                    }
                }
            }

            // Chain the original exception so its stack trace is preserved
            throw new \RuntimeException("All recovery strategies failed. Original error: " . $e->getMessage(), 0, $e);
        }
    }
}

6. Comprehensive Error Logging and Monitoring

Effective error handling requires proper logging and monitoring:

<?php

class ErrorLogger
{
    private string $logFile;

    public function __construct(string $logFile = 'panther_errors.log')
    {
        $this->logFile = $logFile;
    }

    public function logError(\Exception $e, array $context = []): void
    {
        $errorData = [
            'timestamp' => date('Y-m-d H:i:s'),
            'type' => get_class($e),
            'message' => $e->getMessage(),
            'file' => $e->getFile(),
            'line' => $e->getLine(),
            'trace' => $e->getTraceAsString(),
            'context' => $context
        ];

        $logMessage = json_encode($errorData, JSON_PRETTY_PRINT) . "\n";
        file_put_contents($this->logFile, $logMessage, FILE_APPEND | LOCK_EX);
    }

    public function logScreenshot(\Symfony\Component\Panther\Client $client, string $errorContext): string
    {
        $screenshotPath = 'screenshots/error_' . date('Y-m-d_H-i-s') . '.png';

        try {
            $client->takeScreenshot($screenshotPath);
            error_log("Screenshot saved: {$screenshotPath} - Context: {$errorContext}");
            return $screenshotPath;
        } catch (\Exception $e) {
            error_log("Failed to take screenshot: " . $e->getMessage());
            return '';
        }
    }
}
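
Pretty-printed JSON spreads each entry over many lines, which makes the log awkward to parse later. One alternative is the JSON Lines layout, with one compact object per line. The sketch below assumes hypothetical helpers `appendJsonLine` and `readJsonLines` and made-up records, and shows how such a log can be read back to count failures per exception type.

```php
<?php

declare(strict_types=1);

/** Append one compact JSON object per line (JSON Lines). Hypothetical helper. */
function appendJsonLine(string $file, array $record): void
{
    $line = json_encode($record, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR) . "\n";
    file_put_contents($file, $line, FILE_APPEND | LOCK_EX);
}

/** Read the log back as a list of arrays, skipping blank lines. */
function readJsonLines(string $file): array
{
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];

    return array_map(
        fn (string $line): array => json_decode($line, true, 512, JSON_THROW_ON_ERROR),
        $lines
    );
}

// Example with made-up records, written to a temporary file.
$logFile = tempnam(sys_get_temp_dir(), 'panther_log_');
appendJsonLine($logFile, ['type' => 'TimeoutException', 'url' => 'https://example.com/a']);
appendJsonLine($logFile, ['type' => 'NoSuchElementException', 'url' => 'https://example.com/b']);
appendJsonLine($logFile, ['type' => 'TimeoutException', 'url' => 'https://example.com/c']);

// Count failures per exception type to spot patterns.
$countsByType = array_count_values(array_column(readJsonLines($logFile), 'type'));
print_r($countsByType); // TimeoutException => 2, NoSuchElementException => 1

unlink($logFile);
```

Because every line is a complete JSON document, the file can also be tailed or filtered with line-oriented tools while the scraper is still running.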

7. Integration with Testing Frameworks

When using Symfony Panther in tests, implement proper teardown and error reporting:

<?php

use Symfony\Component\Panther\PantherTestCase;

class PantherWebScrapingTest extends PantherTestCase
{
    private ErrorLogger $errorLogger;

    protected function setUp(): void
    {
        parent::setUp();
        $this->errorLogger = new ErrorLogger();
    }

    protected function tearDown(): void
    {
        // Capture any JavaScript errors before closing.
        // PantherTestCase keeps the client in a static property, and each
        // log entry is an associative array.
        if (static::$pantherClient !== null) {
            $entries = static::$pantherClient->getWebDriver()->manage()->getLog('browser');
            foreach ($entries as $entry) {
                if ($entry['level'] === 'SEVERE') {
                    error_log("JavaScript error in test: " . $entry['message']);
                }
            }
        }

        parent::tearDown();
    }

    public function testScrapingWithErrorHandling(): void
    {
        try {
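            // Create the client first; external_base_uri targets a remote site
            // instead of starting a local web server
            $client = static::createPantherClient(['external_base_uri' => 'https://example.com']);
            $crawler = $client->request('GET', '/');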

            // Your scraping logic here
            $title = $crawler->filter('h1')->text();

            $this->assertNotEmpty($title);

        } catch (\Exception $e) {
            $this->errorLogger->logError($e, ['test' => __METHOD__]);
            $this->errorLogger->logScreenshot(static::$pantherClient, 'Test failure');

            throw $e; // Re-throw to fail the test
        }
    }
}

8. Advanced Error Handling with Circuit Breaker Pattern

For high-volume scraping operations, implement a circuit breaker pattern to prevent cascading failures:

<?php

class CircuitBreaker
{
    private int $failureThreshold;
    private int $resetTimeout;
    private int $failureCount = 0;
    private string $state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    private int $lastFailureTime = 0;

    public function __construct(int $failureThreshold = 5, int $resetTimeout = 60)
    {
        $this->failureThreshold = $failureThreshold;
        $this->resetTimeout = $resetTimeout;
    }

    public function call(callable $operation): mixed
    {
        if ($this->state === 'OPEN') {
            if (time() - $this->lastFailureTime >= $this->resetTimeout) {
                $this->state = 'HALF_OPEN';
            } else {
                throw new \Exception('Circuit breaker is OPEN - operation not allowed');
            }
        }

        try {
            $result = $operation();
            $this->onSuccess();
            return $result;

        } catch (\Exception $e) {
            $this->onFailure();
            throw $e;
        }
    }

    private function onSuccess(): void
    {
        $this->failureCount = 0;
        $this->state = 'CLOSED';
    }

    private function onFailure(): void
    {
        $this->failureCount++;
        $this->lastFailureTime = time();

        if ($this->failureCount >= $this->failureThreshold) {
            $this->state = 'OPEN';
        }
    }
}
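
Time-based state transitions are hard to test when the breaker calls time() directly. The sketch below is a compact variant with an injectable clock, so the CLOSED, OPEN, and HALF_OPEN transitions can be exercised without waiting. `TestableBreaker` and its constructor parameters are hypothetical names for this example, not a drop-in replacement for the class above.

```php
<?php

declare(strict_types=1);

/** Compact circuit breaker with an injectable clock; a sketch for illustration. */
final class TestableBreaker
{
    private int $failures = 0;
    private ?int $openedAt = null;

    /** $clock must return the current time in seconds. */
    public function __construct(
        private int $threshold,
        private int $resetTimeout,
        private \Closure $clock
    ) {
    }

    public function state(): string
    {
        if ($this->openedAt === null) {
            return 'CLOSED';
        }

        return ($this->clock)() - $this->openedAt >= $this->resetTimeout ? 'HALF_OPEN' : 'OPEN';
    }

    public function call(callable $operation): mixed
    {
        if ($this->state() === 'OPEN') {
            throw new \RuntimeException('Circuit breaker is OPEN');
        }

        try {
            $result = $operation();
        } catch (\Exception $e) {
            if (++$this->failures >= $this->threshold) {
                $this->openedAt = ($this->clock)(); // Open (or re-open) the circuit
            }
            throw $e;
        }

        // Success closes the circuit and clears the failure count
        $this->failures = 0;
        $this->openedAt = null;

        return $result;
    }
}

// Simulated timeline: two failures open the breaker, then 60 "seconds" pass.
$now = 1000;
$breaker = new TestableBreaker(2, 60, function () use (&$now): int {
    return $now;
});
$fail = function () {
    throw new \Exception('simulated failure');
};

$states = [$breaker->state()];
for ($i = 0; $i < 2; $i++) {
    try {
        $breaker->call($fail);
    } catch (\Exception $e) {
        // Expected: the simulated operation always fails
    }
}
$states[] = $breaker->state();
$now += 60;
$states[] = $breaker->state();
$breaker->call(fn () => 'ok');
$states[] = $breaker->state();

echo implode(' -> ', $states), PHP_EOL; // prints "CLOSED -> OPEN -> HALF_OPEN -> CLOSED"
```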

Best Practices for Symfony Panther Error Handling

  1. Set appropriate timeouts: Balance between performance and reliability
  2. Use multiple fallback selectors: Web pages change frequently
  3. Implement exponential backoff: For network-related retries
  4. Log errors with context: Include URLs, selectors, and browser state
  5. Take screenshots on errors: Visual debugging is invaluable
  6. Monitor JavaScript errors: They often indicate dynamic content issues
  7. Clean up resources: Always close browsers and clean up temporary files
  8. Use circuit breakers: For high-volume operations to prevent cascading failures

Similar to how you would handle errors in Puppeteer, Symfony Panther requires careful attention to timeout management and graceful degradation. Additionally, understanding how to handle timeouts in Puppeteer can provide valuable insights that apply to Symfony Panther as well.

Conclusion

Effective error handling in Symfony Panther requires a multi-layered approach that addresses timeouts, element detection failures, network issues, JavaScript errors, and browser state management. By implementing these strategies, you'll create more robust and reliable web scraping applications that can handle the unpredictable nature of modern web applications.

Remember that error handling is not just about preventing crashes—it's about creating graceful degradation paths that allow your application to continue functioning even when individual operations fail. Regular monitoring and logging will help you identify patterns in failures and continuously improve your error handling strategies.
