How do I handle mobile-specific content and responsive layouts in Colly?

Handling mobile-specific content and responsive layouts in Colly requires understanding how websites serve different content to mobile devices and implementing strategies to simulate mobile browsing behavior. This guide covers the essential techniques for scraping mobile-optimized websites using Colly.

Understanding Mobile-Specific Content

Websites often serve different content or layouts depending on the visitor's device. They typically rely on one or more of the following techniques (a quick way to probe which ones a site uses is sketched after the list):

  • User-Agent detection to identify mobile browsers
  • CSS media queries for responsive design
  • JavaScript-based device detection
  • Separate mobile domains (e.g., m.example.com)
  • Progressive Web App (PWA) features
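
Before choosing a scraping strategy, it can help to check which of these mechanisms a target site actually uses. Below is a minimal probe sketch (not part of the original examples); it assumes fmt, strings, and github.com/gocolly/colly/v2 are imported and only looks at a few rough signals:

func probeServingStrategy(url string) {
    c := colly.NewCollector()
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1"

    // A "Vary: User-Agent" response header suggests server-side device detection
    c.OnResponse(func(r *colly.Response) {
        if strings.Contains(r.Headers.Get("Vary"), "User-Agent") {
            fmt.Println("Response varies by User-Agent")
        }
    })

    // An alternate link pointing at an m. host suggests a separate mobile domain
    c.OnHTML("link[rel='alternate']", func(e *colly.HTMLElement) {
        if strings.Contains(e.Attr("href"), "//m.") {
            fmt.Printf("Separate mobile site advertised: %s\n", e.Attr("href"))
        }
    })

    // A viewport meta tag usually indicates responsive design via CSS media queries
    c.OnHTML("meta[name='viewport']", func(e *colly.HTMLElement) {
        fmt.Printf("Responsive viewport meta: %s\n", e.Attr("content"))
    })

    c.Visit(url)
}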

Setting Mobile User Agents

The most fundamental approach is setting a mobile User-Agent string to make your Colly scraper appear as a mobile browser:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/debug"
)

func main() {
    c := colly.NewCollector(
        colly.Debugger(&debug.LogDebugger{}),
    )

    // Set a mobile User-Agent for iPhone (UserAgent is a plain string field in Colly v2)
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Alternative: Android User-Agent
    // c.UserAgent = "Mozilla/5.0 (Linux; Android 11; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.120 Mobile Safari/537.36"

    c.OnHTML("title", func(e *colly.HTMLElement) {
        fmt.Printf("Page title: %s\n", e.Text)
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Printf("Visiting: %s\n", r.URL.String())
    })

    if err := c.Visit("https://example.com"); err != nil {
        log.Fatal(err)
    }
}

Common Mobile User Agents

Here's a collection of popular mobile User-Agent strings:

var mobileUserAgents = []string{
    // iPhone
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1",

    // Android Chrome
    "Mozilla/5.0 (Linux; Android 12; SM-G998B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36",

    // Samsung Internet
    "Mozilla/5.0 (Linux; Android 11; SAMSUNG SM-G973U) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/14.2 Chrome/87.0.4280.141 Mobile Safari/537.36",

    // iPad
    "Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1",
}

// getRandomMobileUserAgent picks one of the strings above at random
// (requires the math/rand package)
func getRandomMobileUserAgent() string {
    return mobileUserAgents[rand.Intn(len(mobileUserAgents))]
}
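
To rotate these per request rather than once per collector, you can override the User-Agent header inside an OnRequest callback. A short usage sketch with the helper above:

c := colly.NewCollector()

// Pick a fresh mobile User-Agent for every outgoing request
c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("User-Agent", getRandomMobileUserAgent())
})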

Handling Mobile-Specific Headers

Many websites inspect headers beyond the User-Agent when deciding which variant to serve:

func setupMobileHeaders(c *colly.Collector) {
    c.OnRequest(func(r *colly.Request) {
        // Set mobile-specific headers
        r.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        r.Headers.Set("Accept-Language", "en-US,en;q=0.5")
        r.Headers.Set("Accept-Encoding", "gzip, deflate, br")
        r.Headers.Set("DNT", "1")
        r.Headers.Set("Connection", "keep-alive")
        r.Headers.Set("Upgrade-Insecure-Requests", "1")

        // Fetch metadata headers sent by modern browsers (not mobile-specific,
        // but their absence can make an automated client stand out)
        r.Headers.Set("Sec-Fetch-Dest", "document")
        r.Headers.Set("Sec-Fetch-Mode", "navigate")
        r.Headers.Set("Sec-Fetch-Site", "none")
        r.Headers.Set("Sec-Fetch-User", "?1")

        // Note: there is no standard "Touch" request header; sites infer touch
        // support from the User-Agent string or from client hints
    })
}

Scraping Mobile-Only Content

Some websites serve completely different content structures for mobile devices. Here's how to handle this:

func scrapeMobileContent() {
    c := colly.NewCollector()

    // Set an iPhone User-Agent
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Handle the mobile navigation menu (class names here are illustrative; adjust to the target site)
    c.OnHTML(".mobile-menu", func(e *colly.HTMLElement) {
        e.ForEach("a", func(i int, link *colly.HTMLElement) {
            href := link.Attr("href")
            text := link.Text
            fmt.Printf("Mobile menu item: %s -> %s\n", text, href)
        })
    })

    // Handle mobile-specific product cards
    c.OnHTML(".mobile-product-card", func(e *colly.HTMLElement) {
        name := e.ChildText(".product-name")
        price := e.ChildText(".product-price")
        image := e.ChildAttr("img", "src")

        fmt.Printf("Mobile Product: %s - %s (Image: %s)\n", name, price, image)
    })

    // Handle responsive images with srcset
    c.OnHTML("img[srcset]", func(e *colly.HTMLElement) {
        srcset := e.Attr("srcset")
        alt := e.Attr("alt")
        fmt.Printf("Responsive image: %s (srcset: %s)\n", alt, srcset)
    })

    c.Visit("https://example.com")
}
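
The srcset attribute printed above still needs parsing if you want the URL that a roughly 375px-wide phone would actually download. A minimal parsing sketch (it only handles width descriptors such as 480w, skips density descriptors like 2x, and assumes strings, strconv, and sort are imported):

// pickSrcForWidth returns the smallest srcset candidate that is at least
// targetWidth pixels wide, or the widest one if none is large enough
func pickSrcForWidth(srcset string, targetWidth int) string {
    type candidate struct {
        url   string
        width int
    }
    var candidates []candidate
    for _, part := range strings.Split(srcset, ",") {
        fields := strings.Fields(strings.TrimSpace(part))
        if len(fields) != 2 || !strings.HasSuffix(fields[1], "w") {
            continue // skip "url 2x" style density descriptors
        }
        w, err := strconv.Atoi(strings.TrimSuffix(fields[1], "w"))
        if err != nil {
            continue
        }
        candidates = append(candidates, candidate{fields[0], w})
    }
    if len(candidates) == 0 {
        return ""
    }
    sort.Slice(candidates, func(i, j int) bool { return candidates[i].width < candidates[j].width })
    for _, c := range candidates {
        if c.width >= targetWidth {
            return c.url
        }
    }
    return candidates[len(candidates)-1].url
}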

Handling AMP (Accelerated Mobile Pages)

Many websites serve AMP versions for mobile users. Here's how to detect and scrape AMP content:

func scrapeAMPContent() {
    c := colly.NewCollector()

    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Detect AMP pages (AMP documents mark <html> with an amp or ⚡ attribute)
    c.OnHTML("html[amp], html[⚡]", func(e *colly.HTMLElement) {
        fmt.Println("This is an AMP page")

        // Extract AMP-specific components
        e.ForEach("amp-img", func(i int, img *colly.HTMLElement) {
            src := img.Attr("src")
            width := img.Attr("width")
            height := img.Attr("height")
            fmt.Printf("AMP Image: %s (%sx%s)\n", src, width, height)
        })

        // Handle AMP carousel
        e.ForEach("amp-carousel", func(i int, carousel *colly.HTMLElement) {
            carousel.ForEach("amp-img", func(j int, slide *colly.HTMLElement) {
                fmt.Printf("Carousel slide %d: %s\n", j, slide.Attr("src"))
            })
        })
    })

    // Look for the AMP link advertised by the desktop version and follow it
    c.OnHTML("link[rel='amphtml']", func(e *colly.HTMLElement) {
        ampURL := e.Request.AbsoluteURL(e.Attr("href")) // resolve relative URLs
        fmt.Printf("Found AMP version: %s\n", ampURL)
        c.Visit(ampURL)
    })

    c.Visit("https://example.com/article")
}

Rotating Between Mobile and Desktop

For comprehensive scraping, you might want to compare mobile and desktop content:

type ScrapingResult struct {
    URL     string
    Mobile  map[string]string
    Desktop map[string]string
}

func compareMobileDesktop(url string) *ScrapingResult {
    result := &ScrapingResult{
        URL:     url,
        Mobile:  make(map[string]string),
        Desktop: make(map[string]string),
    }

    // Mobile scraper
    mobileCollector := colly.NewCollector()
    mobileCollector.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    mobileCollector.OnHTML("title", func(e *colly.HTMLElement) {
        result.Mobile["title"] = e.Text
    })

    mobileCollector.OnHTML(".mobile-content", func(e *colly.HTMLElement) {
        result.Mobile["content"] = e.Text
    })

    // Desktop scraper
    desktopCollector := colly.NewCollector()
    desktopCollector.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"

    desktopCollector.OnHTML("title", func(e *colly.HTMLElement) {
        result.Desktop["title"] = e.Text
    })

    desktopCollector.OnHTML(".desktop-content", func(e *colly.HTMLElement) {
        result.Desktop["content"] = e.Text
    })

    // Visit with both collectors (collectors are synchronous by default,
    // so both maps are populated before we return)
    mobileCollector.Visit(url)
    desktopCollector.Visit(url)

    return result
}
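
Calling the helper and diffing the two maps might look like this (the selectors inside compareMobileDesktop are illustrative, so the content keys may stay empty on real sites):

result := compareMobileDesktop("https://example.com")
if result.Mobile["title"] != result.Desktop["title"] {
    fmt.Printf("Title differs: mobile=%q desktop=%q\n",
        result.Mobile["title"], result.Desktop["title"])
}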

Handling Mobile Redirects

Mobile websites often redirect to dedicated mobile domains:

func handleMobileRedirects() {
    c := colly.NewCollector(
        colly.AllowURLRevisit(),
    )

    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Track redirects (note: Colly follows redirects automatically, so 3xx
    // responses only reach OnResponse if redirect following is customized,
    // e.g. with the collector's RedirectHandler)
    c.OnResponse(func(r *colly.Response) {
        if r.StatusCode >= 300 && r.StatusCode < 400 {
            location := r.Headers.Get("Location")
            fmt.Printf("Redirect from %s to %s\n", r.Request.URL, location)
        }
    })

    // Follow advertised mobile alternates (e.g. m. subdomains)
    c.OnHTML("link[rel='alternate'][media*='handheld'], link[rel='alternate'][href*='m.']", func(e *colly.HTMLElement) {
        mobileURL := e.Request.AbsoluteURL(e.Attr("href")) // resolve relative URLs
        fmt.Printf("Found mobile version: %s\n", mobileURL)
        c.Visit(mobileURL)
    })

    c.Visit("https://example.com")
}
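
If a site is known to use a dedicated mobile host, you can also construct the mobile URL directly rather than waiting for a redirect. A small sketch (the assumption here is that the site really serves an m.<host> variant; many do not):

// toMobileHost rewrites a desktop URL to the conventional m. subdomain
func toMobileHost(rawURL string) (string, error) {
    u, err := url.Parse(rawURL) // net/url
    if err != nil {
        return "", err
    }
    u.Host = "m." + strings.TrimPrefix(u.Host, "www.")
    return u.String(), nil
}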

Advanced Mobile Simulation

For more thorough mobile simulation you can send viewport-related client hint headers alongside the mobile User-Agent. Keep in mind that Colly only fetches and parses HTML: it cannot execute JavaScript or simulate touch events, and servers only honor client hints they explicitly opt into:

func advancedMobileSimulation() {
    c := colly.NewCollector()

    // Set comprehensive mobile headers
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1")
        r.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
        r.Headers.Set("Accept-Encoding", "gzip, deflate, br")

        // Legacy client hints describing the viewport (servers must opt in,
        // typically via Accept-CH; many simply ignore these)
        r.Headers.Set("Viewport-Width", "375")
        r.Headers.Set("Width", "375")
        r.Headers.Set("DPR", "2")

        // Modern client hint that Chromium-based mobile browsers send to flag a
        // mobile device (Safari does not; there is no standard "Touch" header)
        r.Headers.Set("Sec-CH-UA-Mobile", "?1")
    })

    // Handle responsive images
    c.OnHTML("picture source[media]", func(e *colly.HTMLElement) {
        media := e.Attr("media")
        srcset := e.Attr("srcset")
        fmt.Printf("Responsive source: %s -> %s\n", media, srcset)
    })

    c.Visit("https://example.com")
}

Best Practices and Considerations

Performance Optimization

When scraping mobile content, consider these performance tips:

func optimizedMobileScraper() {
    c := colly.NewCollector()

    // Limit concurrent requests for mobile
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 2, // Lower for mobile to avoid overwhelming servers
        Delay:       2 * time.Second,
    })

    // Allow a generous timeout in case the mobile variant responds slowly
    c.SetRequestTimeout(30 * time.Second)

    // Back off when rate limited (HTTP 429)
    c.OnError(func(r *colly.Response, err error) {
        if r.StatusCode == 429 {
            fmt.Println("Rate limited - backing off")
            time.Sleep(10 * time.Second)
        }
    })
}
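
Rather than a fixed sleep, you can retry the throttled request with a growing delay. This is a minimal sketch that attaches to the collector above, not built-in Colly behavior: the retry counter kept in the request context and the limit of three attempts are assumptions of the sketch.

// Retry rate-limited requests with a simple linear backoff
c.OnError(func(r *colly.Response, err error) {
    if r.StatusCode != 429 {
        return
    }
    retries, _ := r.Request.Ctx.GetAny("retries").(int)
    if retries >= 3 {
        fmt.Printf("Giving up on %s after %d retries\n", r.Request.URL, retries)
        return
    }
    r.Request.Ctx.Put("retries", retries+1)
    time.Sleep(time.Duration((retries+1)*5) * time.Second) // 5s, 10s, 15s
    r.Request.Retry()
})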

Testing Mobile Scraping

Always test your mobile scraping setup:

# Test with curl to verify mobile response
curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
     "https://example.com"

# Compare mobile vs desktop response sizes
curl -s -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)" "https://example.com" | wc -c
curl -s -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "https://example.com" | wc -c

Limitations and Alternatives

While Colly is excellent for scraping server-rendered mobile content, it cannot execute JavaScript, so it struggles with JavaScript-heavy mobile sites and PWAs. For those, consider browser automation tools such as chromedp (Go) or Puppeteer, which can emulate mobile devices, viewports, and touch input more faithfully.
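
One way to spot such pages with Colly alone is a rough heuristic: if the fetched HTML contains very little visible text but many script tags, the mobile content is probably rendered client-side. A minimal sketch that hooks into an existing collector c (the thresholds are arbitrary assumptions; fmt and strings are assumed imported):

// Flag pages that look JavaScript-rendered so you know to switch tools
c.OnHTML("html", func(e *colly.HTMLElement) {
    scripts := 0
    e.ForEach("script", func(_ int, _ *colly.HTMLElement) { scripts++ })
    textLen := len(strings.TrimSpace(e.ChildText("body")))
    if textLen < 500 && scripts > 10 {
        fmt.Printf("%s looks JavaScript-rendered (%d bytes of text, %d script tags); consider a headless browser\n",
            e.Request.URL, textLen, scripts)
    }
})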

Conclusion

Handling mobile-specific content in Colly requires careful attention to User-Agent strings, headers, and content structure differences. By implementing proper mobile simulation techniques and understanding how responsive websites work, you can effectively scrape mobile-optimized content. Remember to always respect robots.txt files and implement appropriate delays to avoid overwhelming mobile-optimized servers.

For JavaScript-heavy mobile applications, consider combining Colly with headless browsers or exploring specialized mobile scraping solutions that can handle complex mobile interactions.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
