Table of contents

How do I optimize PHP web scraping for mobile-responsive websites?

Scraping mobile-responsive websites with PHP requires specialized techniques to handle dynamic layouts, device-specific content, and adaptive loading behaviors. This comprehensive guide covers advanced optimization strategies for efficiently extracting data from mobile-first websites.

Understanding Mobile-Responsive Challenges

Mobile-responsive websites present unique scraping challenges:

  • Dynamic CSS breakpoints that change layout based on viewport size
  • Lazy loading of content below the fold
  • Touch-optimized elements with different selectors
  • Reduced content on mobile versions
  • Progressive enhancement that loads content incrementally

Core PHP Setup for Mobile Scraping

Basic cURL Configuration

<?php
class MobileWebScraper {
    private $ch;
    private $userAgents;

    public function __construct() {
        $this->ch = curl_init();
        $this->userAgents = [
            'mobile' => [
                'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15',
                'Mozilla/5.0 (Android 12; Mobile; rv:104.0) Gecko/104.0 Firefox/104.0',
                'Mozilla/5.0 (Linux; Android 12; SM-G991B) AppleWebKit/537.36'
            ],
            'tablet' => [
                'Mozilla/5.0 (iPad; CPU OS 16_0 like Mac OS X) AppleWebKit/605.1.15',
                'Mozilla/5.0 (Android 12; Tablet; rv:104.0) Gecko/104.0 Firefox/104.0'
            ]
        ];

        $this->setupCurl();
    }

    private function setupCurl() {
        curl_setopt_array($this->ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_MAXREDIRS => 10,
            CURLOPT_TIMEOUT => 30,
            CURLOPT_SSL_VERIFYPEER => false,
            CURLOPT_ENCODING => '', // Accept all encodings
            CURLOPT_COOKIEJAR => tempnam(sys_get_temp_dir(), 'cookies'),
            CURLOPT_COOKIEFILE => tempnam(sys_get_temp_dir(), 'cookies')
        ]);
    }
}
?>

Dynamic User Agent Rotation

public function setMobileUserAgent($deviceType = 'mobile') {
    $agents = $this->userAgents[$deviceType] ?? $this->userAgents['mobile'];
    $randomAgent = $agents[array_rand($agents)];

    curl_setopt($this->ch, CURLOPT_USERAGENT, $randomAgent);

    // Set mobile-specific headers
    $mobileHeaders = [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
        'Accept-Encoding: gzip, deflate, br',
        'DNT: 1',
        'Connection: keep-alive',
        'Upgrade-Insecure-Requests: 1'
    ];

    if ($deviceType === 'mobile') {
        $mobileHeaders[] = 'Viewport-Width: 375';
        $mobileHeaders[] = 'Width: 375';
    }

    curl_setopt($this->ch, CURLOPT_HTTPHEADER, $mobileHeaders);
}

Advanced Mobile Scraping Techniques

Viewport Simulation and Content Adaptation

class ResponsiveContentExtractor {
    private $scraper;
    private $viewports;

    public function __construct($scraper) {
        $this->scraper = $scraper;
        $this->viewports = [
            'mobile' => ['width' => 375, 'height' => 667],
            'tablet' => ['width' => 768, 'height' => 1024],
            'desktop' => ['width' => 1920, 'height' => 1080]
        ];
    }

    public function extractAdaptiveContent($url, $selectors) {
        $results = [];

        foreach ($this->viewports as $device => $viewport) {
            $this->scraper->setMobileUserAgent($device);
            $html = $this->scraper->fetchPage($url);

            if ($html) {
                $dom = new DOMDocument();
                libxml_use_internal_errors(true);
                $dom->loadHTML($html);
                $xpath = new DOMXPath($dom);

                $deviceData = [];
                foreach ($selectors as $key => $selector) {
                    $elements = $xpath->query($selector);
                    $deviceData[$key] = $this->extractElementData($elements);
                }

                $results[$device] = $deviceData;
            }

            // Add delay between requests
            usleep(500000); // 0.5 second delay
        }

        return $this->mergeResponsiveData($results);
    }

    private function mergeResponsiveData($results) {
        $merged = [];

        // Prioritize mobile data, fallback to tablet, then desktop
        $priority = ['mobile', 'tablet', 'desktop'];

        foreach ($priority as $device) {
            if (isset($results[$device])) {
                foreach ($results[$device] as $key => $value) {
                    if (!isset($merged[$key]) && !empty($value)) {
                        $merged[$key] = $value;
                        $merged[$key . '_source'] = $device;
                    }
                }
            }
        }

        return $merged;
    }
}

Handling Lazy Loading and Progressive Enhancement

class LazyContentHandler {
    private $scraper;

    public function __construct($scraper) {
        $this->scraper = $scraper;
    }

    public function handleLazyLoading($url, $lazySelectors = []) {
        // First pass: Get initial content
        $html = $this->scraper->fetchPage($url);
        $initialDom = new DOMDocument();
        libxml_use_internal_errors(true);
        $initialDom->loadHTML($html);

        // Look for lazy loading indicators
        $lazyImages = $this->findLazyImages($initialDom);
        $ajaxEndpoints = $this->findAjaxEndpoints($html);

        // Simulate scroll events by fetching additional content
        $additionalContent = [];
        foreach ($ajaxEndpoints as $endpoint) {
            $additionalData = $this->fetchAjaxContent($endpoint);
            if ($additionalData) {
                $additionalContent[] = $additionalData;
            }
        }

        return [
            'initial_content' => $html,
            'lazy_images' => $lazyImages,
            'additional_content' => $additionalContent
        ];
    }

    private function findLazyImages($dom) {
        $xpath = new DOMXPath($dom);
        $lazyImages = [];

        // Common lazy loading attributes
        $lazySelectors = [
            "//img[@data-src]",
            "//img[@data-lazy]",
            "//img[@loading='lazy']",
            "//img[contains(@class, 'lazy')]"
        ];

        foreach ($lazySelectors as $selector) {
            $elements = $xpath->query($selector);
            foreach ($elements as $img) {
                $src = $img->getAttribute('data-src') ?: 
                       $img->getAttribute('data-lazy') ?: 
                       $img->getAttribute('src');
                if ($src) {
                    $lazyImages[] = $src;
                }
            }
        }

        return array_unique($lazyImages);
    }

    private function findAjaxEndpoints($html) {
        $endpoints = [];

        // Extract AJAX endpoints from JavaScript
        preg_match_all('/fetch\([\'"]([^\'"]+)[\'"]/', $html, $fetchMatches);
        preg_match_all('/\$\.get\([\'"]([^\'"]+)[\'"]/', $html, $jqueryMatches);
        preg_match_all('/XMLHttpRequest.*open\([\'"]GET[\'"],\s*[\'"]([^\'"]+)/', $html, $xhrMatches);

        $endpoints = array_merge(
            $fetchMatches[1] ?? [],
            $jqueryMatches[1] ?? [],
            $xhrMatches[1] ?? []
        );

        return array_unique($endpoints);
    }
}

Performance Optimization Strategies

Intelligent Caching System

class MobileCacheManager {
    private $cacheDir;
    private $ttl;

    public function __construct($cacheDir = 'cache/', $ttl = 3600) {
        $this->cacheDir = rtrim($cacheDir, '/') . '/';
        $this->ttl = $ttl;

        if (!is_dir($this->cacheDir)) {
            mkdir($this->cacheDir, 0755, true);
        }
    }

    public function getCachedContent($url, $device, $selectors) {
        $cacheKey = md5($url . $device . serialize($selectors));
        $cacheFile = $this->cacheDir . $cacheKey . '.cache';

        if (file_exists($cacheFile) && 
            (time() - filemtime($cacheFile)) < $this->ttl) {
            return unserialize(file_get_contents($cacheFile));
        }

        return null;
    }

    public function setCachedContent($url, $device, $selectors, $content) {
        $cacheKey = md5($url . $device . serialize($selectors));
        $cacheFile = $this->cacheDir . $cacheKey . '.cache';

        file_put_contents($cacheFile, serialize($content));
    }

    public function clearExpiredCache() {
        $files = glob($this->cacheDir . '*.cache');
        $deleted = 0;

        foreach ($files as $file) {
            if ((time() - filemtime($file)) > $this->ttl) {
                unlink($file);
                $deleted++;
            }
        }

        return $deleted;
    }
}

Concurrent Request Processing

class ConcurrentMobileScraper {
    private $multiHandle;
    private $curlHandles = [];

    public function __construct() {
        $this->multiHandle = curl_multi_init();
    }

    public function addRequest($url, $deviceType, $options = []) {
        $ch = curl_init();
        $requestId = uniqid();

        curl_setopt_array($ch, [
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => 30,
            CURLOPT_USERAGENT => $this->getMobileUserAgent($deviceType)
        ] + $options);

        curl_multi_add_handle($this->multiHandle, $ch);
        $this->curlHandles[$requestId] = $ch;

        return $requestId;
    }

    public function executeAll() {
        $running = null;

        do {
            curl_multi_exec($this->multiHandle, $running);
            curl_multi_select($this->multiHandle);
        } while ($running > 0);

        $results = [];
        foreach ($this->curlHandles as $id => $ch) {
            $results[$id] = [
                'content' => curl_multi_getcontent($ch),
                'info' => curl_getinfo($ch),
                'error' => curl_error($ch)
            ];

            curl_multi_remove_handle($this->multiHandle, $ch);
            curl_close($ch);
        }

        return $results;
    }

    private function getMobileUserAgent($deviceType) {
        $agents = [
            'mobile' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15',
            'tablet' => 'Mozilla/5.0 (iPad; CPU OS 16_0 like Mac OS X) AppleWebKit/605.1.15',
            'android' => 'Mozilla/5.0 (Linux; Android 12; SM-G991B) AppleWebKit/537.36'
        ];

        return $agents[$deviceType] ?? $agents['mobile'];
    }
}

Integration with Headless Browsers

For JavaScript-heavy mobile sites, consider integrating PHP with headless browsers. While PHP excels at static content extraction, combining it with tools like Puppeteer can handle complex mobile interactions and dynamic content that loads after page navigation.

PHP-Puppeteer Bridge

class PHPPuppeteerBridge {
    private $nodePath;
    private $scriptPath;

    public function __construct($nodePath = 'node', $scriptPath = 'puppeteer_mobile.js') {
        $this->nodePath = $nodePath;
        $this->scriptPath = $scriptPath;
    }

    public function scrapeMobilePage($url, $viewport = ['width' => 375, 'height' => 667]) {
        $command = sprintf(
            '%s %s "%s" %d %d',
            escapeshellcmd($this->nodePath),
            escapeshellarg($this->scriptPath),
            escapeshellarg($url),
            $viewport['width'],
            $viewport['height']
        );

        $output = shell_exec($command);
        return json_decode($output, true);
    }
}

Error Handling and Resilience

Robust Error Management

class MobileScrapingErrorHandler {
    private $maxRetries;
    private $retryDelay;

    public function __construct($maxRetries = 3, $retryDelay = 2) {
        $this->maxRetries = $maxRetries;
        $this->retryDelay = $retryDelay;
    }

    public function executeWithRetry($callback, $params = []) {
        $attempt = 0;
        $lastError = null;

        while ($attempt < $this->maxRetries) {
            try {
                return call_user_func_array($callback, $params);
            } catch (Exception $e) {
                $lastError = $e;
                $attempt++;

                if ($attempt < $this->maxRetries) {
                    sleep($this->retryDelay * $attempt); // Exponential backoff
                }
            }
        }

        throw new Exception("Failed after {$this->maxRetries} attempts: " . $lastError->getMessage());
    }

    public function handleHttpError($httpCode, $url) {
        switch ($httpCode) {
            case 429:
                // Rate limiting - implement longer delay
                sleep(10);
                return true; // Retry

            case 403:
                // Forbidden - might need different user agent
                return true; // Retry with different strategy

            case 404:
                // Not found - don't retry
                return false;

            default:
                return $httpCode >= 500; // Retry on server errors
        }
    }
}

Best Practices for Mobile Scraping

1. Respect Rate Limits

class RateLimiter {
    private $requests = [];
    private $maxRequests;
    private $timeWindow;

    public function __construct($maxRequests = 10, $timeWindow = 60) {
        $this->maxRequests = $maxRequests;
        $this->timeWindow = $timeWindow;
    }

    public function throttle() {
        $now = time();

        // Remove old requests outside the time window
        $this->requests = array_filter($this->requests, function($timestamp) use ($now) {
            return ($now - $timestamp) < $this->timeWindow;
        });

        if (count($this->requests) >= $this->maxRequests) {
            $oldestRequest = min($this->requests);
            $sleepTime = $this->timeWindow - ($now - $oldestRequest) + 1;
            sleep($sleepTime);
        }

        $this->requests[] = $now;
    }
}

2. Monitor and Adapt

class MobileScrapingMonitor {
    private $metrics = [];

    public function trackRequest($url, $device, $responseTime, $success) {
        $this->metrics[] = [
            'url' => $url,
            'device' => $device,
            'response_time' => $responseTime,
            'success' => $success,
            'timestamp' => time()
        ];
    }

    public function getPerformanceMetrics($device = null) {
        $filtered = $device ? 
            array_filter($this->metrics, fn($m) => $m['device'] === $device) : 
            $this->metrics;

        if (empty($filtered)) return null;

        $responseTimes = array_column($filtered, 'response_time');
        $successRate = array_sum(array_column($filtered, 'success')) / count($filtered);

        return [
            'avg_response_time' => array_sum($responseTimes) / count($responseTimes),
            'success_rate' => $successRate,
            'total_requests' => count($filtered)
        ];
    }
}

Optimizing for Different Mobile Frameworks

Many mobile-responsive sites use frameworks that require specific handling approaches. When working with single-page applications, you might need to handle AJAX requests and dynamic content loading more effectively.

Framework-Specific Optimizations

class FrameworkOptimizer {
    public function optimizeForFramework($html, $framework) {
        switch (strtolower($framework)) {
            case 'bootstrap':
                return $this->optimizeBootstrap($html);
            case 'foundation':
                return $this->optimizeFoundation($html);
            case 'materialize':
                return $this->optimizeMaterialize($html);
            default:
                return $this->genericOptimization($html);
        }
    }

    private function optimizeBootstrap($html) {
        // Bootstrap-specific mobile optimizations
        $dom = new DOMDocument();
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        $xpath = new DOMXPath($dom);

        // Find Bootstrap mobile-specific classes
        $mobileElements = $xpath->query("//*[contains(@class, 'col-xs-') or contains(@class, 'd-block d-sm-none')]");

        return $this->extractElementsContent($mobileElements);
    }
}

Advanced Mobile Testing Strategies

Device-Specific Testing

class MobileTestSuite {
    private $scraper;
    private $testDevices;

    public function __construct($scraper) {
        $this->scraper = $scraper;
        $this->testDevices = [
            'iphone_13' => [
                'width' => 390,
                'height' => 844,
                'user_agent' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15'
            ],
            'samsung_galaxy' => [
                'width' => 360,
                'height' => 740,
                'user_agent' => 'Mozilla/5.0 (Linux; Android 11; SM-G991B) AppleWebKit/537.36'
            ],
            'ipad_pro' => [
                'width' => 1024,
                'height' => 1366,
                'user_agent' => 'Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) AppleWebKit/605.1.15'
            ]
        ];
    }

    public function runCrossDeviceTest($url, $selectors) {
        $results = [];

        foreach ($this->testDevices as $deviceName => $config) {
            $this->scraper->setUserAgent($config['user_agent']);
            $this->scraper->setViewport($config['width'], $config['height']);

            $startTime = microtime(true);
            $content = $this->scraper->fetchPage($url);
            $responseTime = microtime(true) - $startTime;

            if ($content) {
                $extractedData = $this->extractData($content, $selectors);
                $results[$deviceName] = [
                    'success' => true,
                    'response_time' => $responseTime,
                    'data' => $extractedData,
                    'content_length' => strlen($content)
                ];
            } else {
                $results[$deviceName] = [
                    'success' => false,
                    'response_time' => $responseTime,
                    'error' => 'Failed to fetch content'
                ];
            }
        }

        return $this->analyzeResults($results);
    }
}

Conclusion

Optimizing PHP web scraping for mobile-responsive websites requires a multi-faceted approach combining proper user agent management, viewport simulation, lazy loading handling, and performance optimization. By implementing these techniques and maintaining adaptive strategies, you can effectively extract data from modern mobile-first websites while respecting their architecture and rate limits.

Key takeaways: - Use device-specific user agents and headers - Implement intelligent caching and concurrent processing
- Handle lazy loading and progressive enhancement - Monitor performance and adapt strategies - Consider headless browser integration for complex sites

Remember to always respect robots.txt files, implement appropriate delays, and monitor your scraping activities to ensure sustainable and ethical data extraction practices.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon