How do I integrate Symfony Panther with existing Symfony applications?

Symfony Panther is a powerful web browser automation library that brings the convenience of browser testing and web scraping to Symfony applications. It provides a PHP interface to control headless Chrome or Firefox browsers, making it perfect for end-to-end testing, web scraping, and JavaScript-heavy application testing.

What is Symfony Panther?

Symfony Panther is built on top of the W3C WebDriver protocol and ChromeDriver, via the php-webdriver library (originally created by Facebook), offering seamless integration with Symfony's testing framework. Unlike traditional HTTP clients, Panther can execute JavaScript, handle dynamic content, and interact with modern web applications just like a real user would.

Installation and Setup

Prerequisites

Before integrating Symfony Panther, ensure your system meets these requirements:

  • PHP 7.2 or higher
  • Symfony 4.4+ or Symfony 5.x/6.x
  • Chrome or Chromium browser installed
  • ChromeDriver (automatically managed by Panther)

Installing Symfony Panther

Install Symfony Panther using Composer in your existing Symfony project:

composer require --dev symfony/panther

For production use (web scraping applications), you can install it without the --dev flag:

composer require symfony/panther

Basic Configuration

Panther does not use a dedicated bundle configuration file; it is configured through environment variables and through options passed when creating a client. First, make sure the framework's test support is enabled:

# config/packages/test/framework.yaml
framework:
    test: ~

Browser behavior is then controlled with Panther's environment variables, for example in .env.test:

# Chrome binary path (optional, auto-detected by default)
PANTHER_CHROME_BINARY='/usr/bin/google-chrome'

# Extra Chrome arguments (space-separated)
PANTHER_CHROME_ARGUMENTS='--no-sandbox --disable-dev-shm-usage --disable-gpu'

# Panther runs headless by default; set this to watch the browser
# PANTHER_NO_HEADLESS=1

To run against a Selenium Grid instead of a local driver, create the client explicitly with Client::createSeleniumClient('http://127.0.0.1:4444/wd/hub').
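For test runs, Panther's environment variables (for example PANTHER_CHROME_ARGUMENTS and PANTHER_NO_HEADLESS) can also be set in phpunit.xml.dist so they apply without touching the shell environment. A minimal sketch; the variable names are Panther's documented ones, and the surrounding file is abbreviated:

```xml
<!-- phpunit.xml.dist (abbreviated) -->
<phpunit>
    <php>
        <!-- Extra Chrome flags, space-separated -->
        <server name="PANTHER_CHROME_ARGUMENTS" value="--no-sandbox --disable-dev-shm-usage"/>
        <!-- Uncomment to watch the browser while debugging tests -->
        <!-- <server name="PANTHER_NO_HEADLESS" value="1"/> -->
    </php>
</phpunit>
```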

Integration Patterns

1. Testing Integration

The most common integration pattern is using Panther for functional testing. Create a test class that extends PantherTestCase:

<?php
// tests/Controller/HomePageTest.php

namespace App\Tests\Controller;

use Symfony\Component\Panther\PantherTestCase;

class HomePageTest extends PantherTestCase
{
    public function testHomePage(): void
    {
        $client = static::createPantherClient();
        $crawler = $client->request('GET', '/');

        $this->assertSelectorTextContains('h1', 'Welcome');
        $this->assertPageTitleContains('My Application');

        // Test JavaScript functionality
        $client->executeScript('document.querySelector("#toggle-button").click()');
        $this->assertSelectorIsVisible('#hidden-content');
    }

    public function testFormSubmission(): void
    {
        $client = static::createPantherClient();
        $crawler = $client->request('GET', '/contact');

        // Fill and submit form
        $form = $crawler->selectButton('Submit')->form([
            'contact[name]' => 'John Doe',
            'contact[email]' => 'john@example.com',
            'contact[message]' => 'Test message'
        ]);

        $client->submit($form);
        $this->assertSelectorTextContains('.alert-success', 'Message sent successfully');
    }
}

2. Service Integration for Web Scraping

Create a dedicated service for web scraping tasks:

<?php
// src/Service/WebScrapingService.php

namespace App\Service;

use Symfony\Component\Panther\Client;

class WebScrapingService
{
    private Client $client;

    public function __construct()
    {
        $this->client = Client::createChromeClient();
    }

    public function scrapeProductData(string $url): array
    {
        $crawler = $this->client->request('GET', $url);

        // Wait for dynamic content to load
        $this->client->waitFor('.product-title');

        return [
            'title' => $crawler->filter('.product-title')->text(),
            'price' => $crawler->filter('.price')->text(),
            'description' => $crawler->filter('.description')->text(),
            'images' => $crawler->filter('.product-images img')->each(function ($node) {
                return $node->attr('src');
            })
        ];
    }

    public function scrapeWithPagination(string $baseUrl): array
    {
        $results = [];
        $page = 1;

        do {
            $crawler = $this->client->request('GET', $baseUrl . '?page=' . $page);

            // Extract data from current page
            $pageData = $crawler->filter('.item')->each(function ($node) {
                return [
                    'title' => $node->filter('.title')->text(),
                    'link' => $node->filter('a')->attr('href')
                ];
            });

            $results = array_merge($results, $pageData);

            // Check if next page exists
            $hasNextPage = $crawler->filter('.pagination .next')->count() > 0;
            $page++;

        } while ($hasNextPage && $page <= 10); // Limit to prevent infinite loops

        return $results;
    }

    public function __destruct()
    {
        $this->client->quit();
    }
}
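scrapeWithPagination() above concatenates pages as-is, so an item repeated across a page boundary appears twice in the results. A small pure-PHP dedup pass on the link field keeps the results unique; the helper below is illustrative, not part of Panther:

```php
<?php

// Deduplicate scraped items by their 'link' field, keeping the first occurrence.
function dedupeByLink(array $items): array
{
    $seen = [];
    $unique = [];

    foreach ($items as $item) {
        if (!isset($seen[$item['link']])) {
            $seen[$item['link']] = true;
            $unique[] = $item;
        }
    }

    return $unique;
}
```

Calling $results = dedupeByLink($results); just before the return in scrapeWithPagination() is enough.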

3. Command Integration

Create console commands for automated scraping tasks:

<?php
// src/Command/ScrapeCommand.php

namespace App\Command;

use App\Service\WebScrapingService;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

class ScrapeCommand extends Command
{
    protected static $defaultName = 'app:scrape';

    private WebScrapingService $scrapingService;

    public function __construct(WebScrapingService $scrapingService)
    {
        $this->scrapingService = $scrapingService;
        parent::__construct();
    }

    protected function configure(): void
    {
        $this
            ->setDescription('Scrape data from a website')
            ->addArgument('url', InputArgument::REQUIRED, 'URL to scrape');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);
        $url = $input->getArgument('url');

        $io->title('Starting web scraping...');

        try {
            $data = $this->scrapingService->scrapeProductData($url);

            $io->table(['Field', 'Value'], [
                ['Title', $data['title']],
                ['Price', $data['price']],
                ['Description', substr($data['description'], 0, 100) . '...']
            ]);

            $io->success('Scraping completed successfully!');
            return Command::SUCCESS;

        } catch (\Exception $e) {
            $io->error('Scraping failed: ' . $e->getMessage());
            return Command::FAILURE;
        }
    }
}

Advanced Configuration Options

Custom Browser Options

Configure Panther with custom browser options for specific use cases:

# config/services.yaml

services:
    app.panther.client:
        class: Symfony\Component\Panther\Client
        factory: ['Symfony\Component\Panther\Client', 'createChromeClient']
        # createChromeClient(?string $chromeDriverBinary, ?array $arguments, array $options, ?string $baseUri)
        arguments:
            - null  # chromedriver binary path (auto-detected)
            -
                - '--no-sandbox'
                - '--disable-dev-shm-usage'
                - '--disable-gpu'
                - '--headless'
                - '--window-size=1920,1080'
                - '--user-agent=Mozilla/5.0 (compatible; MyBot/1.0)'

Environment-Specific Configuration

Set up different configurations for various environments:

# .env.dev — keep the browser visible for debugging
PANTHER_NO_HEADLESS=1
PANTHER_CHROME_ARGUMENTS='--no-sandbox --disable-dev-shm-usage'

# .env.prod — fully headless, minimal footprint
PANTHER_CHROME_ARGUMENTS='--no-sandbox --disable-dev-shm-usage --disable-gpu --headless --disable-extensions'

Handling Dynamic Content and AJAX

When working with JavaScript-heavy applications, you'll often need to wait for content to load. Similar to how you handle AJAX requests using Puppeteer, Panther provides several waiting mechanisms:

<?php

use Symfony\Component\Panther\Client;

class DynamicContentService
{
    private Client $client;

    public function scrapeAjaxContent(string $url): array
    {
        $crawler = $this->client->request('GET', $url);

        // Wait for specific element to appear
        $this->client->waitFor('#dynamic-content', 10);

        // Wait for AJAX request to complete
        $this->client->waitForVisibility('.loading-spinner', 2);
        $this->client->waitForInvisibility('.loading-spinner', 10);

        // Extract data after AJAX loads
        return $crawler->filter('.ajax-content .item')->each(function ($node) {
            return $node->text();
        });
    }

    public function handleInfiniteScroll(string $url): array
    {
        $crawler = $this->client->request('GET', $url);

        do {
            // Panther's crawler queries the live browser DOM, so count()
            // reflects items added by JavaScript since the last check
            $previousCount = $crawler->filter('.item')->count();

            // Scroll to the bottom to trigger loading of more content
            $this->client->executeScript('window.scrollTo(0, document.body.scrollHeight)');

            // Wait up to 5s for at least one new item; stop when none appears
            try {
                $this->client->waitFor('.item:nth-child(' . ($previousCount + 1) . ')', 5);
            } catch (\Exception $e) {
                break;
            }
        } while ($crawler->filter('.item')->count() > $previousCount);

        // Collect every item exactly once, after all content has loaded
        return $crawler->filter('.item')->each(function ($node) {
            return $node->text();
        });
    }
}
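The waitFor() call in handleInfiniteScroll() targets the first item beyond the current count: that element only exists in the DOM once new content has loaded. The selector construction can be isolated as a pure function (the helper name is illustrative):

```php
<?php

// Build a CSS selector matching the first item *beyond* the current count;
// its appearance in the DOM signals that new content has loaded.
function nextItemSelector(string $itemSelector, int $currentCount): string
{
    return sprintf('%s:nth-child(%d)', $itemSelector, $currentCount + 1);
}
```

Note that :nth-child indexes all sibling elements, not just those matching the class, so this approach assumes the items are the only children of their container.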

Error Handling and Debugging

Implement robust error handling for your Panther integration:

<?php

use Psr\Log\LoggerInterface;
use Symfony\Component\Panther\Client;

class RobustScrapingService
{
    private Client $client;
    private LoggerInterface $logger;

    public function __construct(Client $client, LoggerInterface $logger)
    {
        $this->client = $client;
        $this->logger = $logger;
    }

    public function scrapeWithRetry(string $url, int $maxRetries = 3): array
    {
        $attempt = 0;

        while ($attempt < $maxRetries) {
            try {
                $crawler = $this->client->request('GET', $url);

                // Take screenshot for debugging
                if ($_ENV['APP_ENV'] === 'dev') {
                    $this->client->takeScreenshot('debug_' . time() . '.png');
                }

                return $this->extractData($crawler);

            } catch (\Exception $e) {
                $attempt++;
                $this->logger->warning('Scraping attempt failed', [
                    'url' => $url,
                    'attempt' => $attempt,
                    'error' => $e->getMessage()
                ]);

                if ($attempt >= $maxRetries) {
                    throw new \RuntimeException(
                        sprintf('Failed to scrape %s after %d attempts', $url, $maxRetries),
                        0,
                        $e
                    );
                }

                // Wait before retry
                sleep(2 ** $attempt); // Exponential backoff
            }
        }
    }
}
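The sleep(2 ** $attempt) call above produces an exponential backoff: 2 seconds after the first failure, 4 after the second, and so on, with no sleep after the final attempt because the exception is rethrown first. The resulting schedule can be sketched as a pure function (the helper name is illustrative):

```php
<?php

// Delays (in seconds) slept between attempts when every attempt fails:
// the n-th failure is followed by a 2**n second pause, and the final
// failure throws instead of sleeping.
function backoffDelays(int $maxRetries): array
{
    $delays = [];
    for ($attempt = 1; $attempt < $maxRetries; $attempt++) {
        $delays[] = 2 ** $attempt;
    }

    return $delays;
}
```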

Performance Optimization

Connection Reuse

Optimize performance by reusing browser instances:

<?php

use Symfony\Component\Panther\Client;

class OptimizedScrapingService
{
    private static ?Client $sharedClient = null;

    public static function getClient(): Client
    {
        if (self::$sharedClient === null) {
            // The second argument of createChromeClient() is the list of Chrome flags
            self::$sharedClient = Client::createChromeClient(null, [
                '--no-sandbox',
                '--disable-dev-shm-usage',
                '--headless',
                // Skip image loading for faster scraping
                '--blink-settings=imagesEnabled=false',
            ]);
        }

        return self::$sharedClient;
    }

    public function batchScrape(array $urls): array
    {
        $client = self::getClient();
        $results = [];

        foreach ($urls as $url) {
            try {
                $crawler = $client->request('GET', $url);
                $results[$url] = $this->extractData($crawler);
            } catch (\Exception $e) {
                $results[$url] = ['error' => $e->getMessage()];
            }
        }

        return $results;
    }
}

Handling Browser Sessions and Authentication

For scenarios requiring authentication, you can manage sessions similar to how you handle authentication in Puppeteer:

<?php

use Symfony\Component\Panther\Client;

class AuthenticatedScrapingService
{
    private Client $client;

    public function loginAndScrape(string $loginUrl, string $username, string $password, string $targetUrl): array
    {
        // Navigate to login page
        $crawler = $this->client->request('GET', $loginUrl);

        // Fill login form
        $form = $crawler->selectButton('Login')->form([
            'username' => $username,
            'password' => $password
        ]);

        // Submit login form
        $this->client->submit($form);

        // Wait for redirect after login
        $this->client->waitFor('.user-dashboard');

        // Now scrape protected content
        $crawler = $this->client->request('GET', $targetUrl);

        return $this->extractProtectedData($crawler);
    }
}

Testing Best Practices

Create comprehensive tests that cover both positive and negative scenarios:

<?php

use Symfony\Component\Panther\PantherTestCase;

class PantherIntegrationTest extends PantherTestCase
{
    public function testCompleteUserFlow(): void
    {
        $client = static::createPantherClient();

        // Test navigation through multiple pages
        $crawler = $client->request('GET', '/');
        $link = $crawler->selectLink('Products')->link();
        $crawler = $client->click($link);

        // Test search functionality
        $form = $crawler->selectButton('Search')->form();
        $form['query'] = 'test product';
        $crawler = $client->submit($form);

        // Verify search results on the page returned by submit()
        $this->assertSelectorExists('.search-results');
        $this->assertGreaterThan(0, $crawler->filter('.product-item')->count());
    }

    public function testErrorHandling(): void
    {
        $client = static::createPantherClient();

        // Test 404 page
        $client->request('GET', '/non-existent-page');
        $this->assertSelectorTextContains('h1', '404');

        // Test form validation
        $crawler = $client->request('GET', '/contact');
        $form = $crawler->selectButton('Submit')->form();
        $client->submit($form); // Submit empty form

        $this->assertSelectorExists('.error-message');
    }
}

Production Deployment Considerations

When deploying Panther-based applications to production:

# Dockerfile example for production deployment
FROM php:8.1-fpm

# Install Chrome dependencies
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    unzip \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable

# Configure Chrome for the container environment (flags are space-separated)
ENV PANTHER_CHROME_ARGUMENTS="--no-sandbox --disable-dev-shm-usage --disable-gpu --headless"

Conclusion

Integrating Symfony Panther with existing Symfony applications provides powerful capabilities for browser automation, testing, and web scraping. Whether you're building comprehensive test suites, creating automated data collection systems, or developing monitoring tools, Panther's seamless integration with Symfony's ecosystem makes it an excellent choice.

The key to successful integration lies in understanding your specific use case, implementing proper error handling, and optimizing for performance. With the examples and patterns provided above, you can build robust, maintainable solutions that leverage the full power of browser automation within your Symfony applications.

Remember to always respect robots.txt files and website terms of service when implementing web scraping functionality, and consider implementing rate limiting and respectful crawling practices to avoid overwhelming target servers.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl -G "https://api.webscraping.ai/ai/question" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "question=What is the main topic?" \
  --data-urlencode "api_key=YOUR_API_KEY"

Extract structured data:

curl -G "https://api.webscraping.ai/ai/fields" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "fields[title]=Page title" \
  --data-urlencode "fields[price]=Product price" \
  --data-urlencode "api_key=YOUR_API_KEY"
