How do I handle mobile-specific content and responsive layouts in Colly?

Handling mobile-specific content and responsive layouts in Colly requires understanding how websites serve different content to mobile devices and implementing strategies to simulate mobile browsing behavior. This guide covers the essential techniques for scraping mobile-optimized websites using Colly.

Understanding Mobile-Specific Content

Websites often serve different content or layouts depending on the visitor's device. They typically rely on one or more of the following techniques (a quick way to probe which ones a site uses is sketched after the list):

  • User-Agent detection to identify mobile browsers
  • CSS media queries for responsive design
  • JavaScript-based device detection
  • Separate mobile domains (e.g., m.example.com)
  • Progressive Web App (PWA) features
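
Before choosing a scraping strategy, it can help to check which of these mechanisms a target site actually uses. Below is a minimal probe sketch (not part of the original examples); it assumes fmt, strings, and github.com/gocolly/colly/v2 are imported and only looks at a few rough signals:

func probeServingStrategy(url string) {
    c := colly.NewCollector()
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1"

    // A "Vary: User-Agent" response header suggests server-side device detection
    c.OnResponse(func(r *colly.Response) {
        if strings.Contains(r.Headers.Get("Vary"), "User-Agent") {
            fmt.Println("Response varies by User-Agent")
        }
    })

    // An alternate link pointing at an m. host suggests a separate mobile domain
    c.OnHTML("link[rel='alternate']", func(e *colly.HTMLElement) {
        if strings.Contains(e.Attr("href"), "//m.") {
            fmt.Printf("Separate mobile site advertised: %s\n", e.Attr("href"))
        }
    })

    // A viewport meta tag usually indicates responsive design via CSS media queries
    c.OnHTML("meta[name='viewport']", func(e *colly.HTMLElement) {
        fmt.Printf("Responsive viewport meta: %s\n", e.Attr("content"))
    })

    c.Visit(url)
}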

Setting Mobile User Agents

The most fundamental approach is setting a mobile User-Agent string to make your Colly scraper appear as a mobile browser:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/debug"
)

func main() {
    c := colly.NewCollector(
        colly.Debugger(&debug.LogDebugger{}),
    )

    // Set a mobile User-Agent for iPhone (UserAgent is a plain string field in Colly v2)
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Alternative: Android User-Agent
    // c.UserAgent = "Mozilla/5.0 (Linux; Android 11; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.120 Mobile Safari/537.36"

    c.OnHTML("title", func(e *colly.HTMLElement) {
        fmt.Printf("Page title: %s\n", e.Text)
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Printf("Visiting: %s\n", r.URL.String())
    })

    if err := c.Visit("https://example.com"); err != nil {
        log.Fatal(err)
    }
}

Common Mobile User Agents

Here's a collection of popular mobile User-Agent strings:

var mobileUserAgents = []string{
    // iPhone
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1",

    // Android Chrome
    "Mozilla/5.0 (Linux; Android 12; SM-G998B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36",

    // Samsung Internet
    "Mozilla/5.0 (Linux; Android 11; SAMSUNG SM-G973U) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/14.2 Chrome/87.0.4280.141 Mobile Safari/537.36",

    // iPad
    "Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1",
}

// getRandomMobileUserAgent picks one of the strings above at random
// (requires the math/rand package)
func getRandomMobileUserAgent() string {
    return mobileUserAgents[rand.Intn(len(mobileUserAgents))]
}
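
To rotate these per request rather than once per collector, you can override the User-Agent header inside an OnRequest callback. A short usage sketch with the helper above:

c := colly.NewCollector()

// Pick a fresh mobile User-Agent for every outgoing request
c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("User-Agent", getRandomMobileUserAgent())
})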

Handling Mobile-Specific Headers

Many websites inspect headers beyond the User-Agent when deciding which variant to serve:

func setupMobileHeaders(c *colly.Collector) {
    c.OnRequest(func(r *colly.Request) {
        // Set mobile-specific headers
        r.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        r.Headers.Set("Accept-Language", "en-US,en;q=0.5")
        r.Headers.Set("Accept-Encoding", "gzip, deflate, br")
        r.Headers.Set("DNT", "1")
        r.Headers.Set("Connection", "keep-alive")
        r.Headers.Set("Upgrade-Insecure-Requests", "1")

        // Fetch metadata headers sent by modern browsers (not mobile-specific,
        // but their absence can make an automated client stand out)
        r.Headers.Set("Sec-Fetch-Dest", "document")
        r.Headers.Set("Sec-Fetch-Mode", "navigate")
        r.Headers.Set("Sec-Fetch-Site", "none")
        r.Headers.Set("Sec-Fetch-User", "?1")

        // Note: there is no standard "Touch" request header; sites infer touch
        // support from the User-Agent string or from client hints
    })
}

Scraping Mobile-Only Content

Some websites serve completely different content structures for mobile devices. Here's how to handle this:

func scrapeMobileContent() {
    c := colly.NewCollector()

    // Set an iPhone User-Agent
    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Handle the mobile navigation menu (class names here are illustrative; adjust to the target site)
    c.OnHTML(".mobile-menu", func(e *colly.HTMLElement) {
        e.ForEach("a", func(i int, link *colly.HTMLElement) {
            href := link.Attr("href")
            text := link.Text
            fmt.Printf("Mobile menu item: %s -> %s\n", text, href)
        })
    })

    // Handle mobile-specific product cards
    c.OnHTML(".mobile-product-card", func(e *colly.HTMLElement) {
        name := e.ChildText(".product-name")
        price := e.ChildText(".product-price")
        image := e.ChildAttr("img", "src")

        fmt.Printf("Mobile Product: %s - %s (Image: %s)\n", name, price, image)
    })

    // Handle responsive images with srcset
    c.OnHTML("img[srcset]", func(e *colly.HTMLElement) {
        srcset := e.Attr("srcset")
        alt := e.Attr("alt")
        fmt.Printf("Responsive image: %s (srcset: %s)\n", alt, srcset)
    })

    c.Visit("https://example.com")
}
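
The srcset attribute printed above still needs parsing if you want the URL that a roughly 375px-wide phone would actually download. A minimal parsing sketch (it only handles width descriptors such as 480w, skips density descriptors like 2x, and assumes strings, strconv, and sort are imported):

// pickSrcForWidth returns the smallest srcset candidate that is at least
// targetWidth pixels wide, or the widest one if none is large enough
func pickSrcForWidth(srcset string, targetWidth int) string {
    type candidate struct {
        url   string
        width int
    }
    var candidates []candidate
    for _, part := range strings.Split(srcset, ",") {
        fields := strings.Fields(strings.TrimSpace(part))
        if len(fields) != 2 || !strings.HasSuffix(fields[1], "w") {
            continue // skip "url 2x" style density descriptors
        }
        w, err := strconv.Atoi(strings.TrimSuffix(fields[1], "w"))
        if err != nil {
            continue
        }
        candidates = append(candidates, candidate{fields[0], w})
    }
    if len(candidates) == 0 {
        return ""
    }
    sort.Slice(candidates, func(i, j int) bool { return candidates[i].width < candidates[j].width })
    for _, c := range candidates {
        if c.width >= targetWidth {
            return c.url
        }
    }
    return candidates[len(candidates)-1].url
}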

Handling AMP (Accelerated Mobile Pages)

Many websites serve AMP versions for mobile users. Here's how to detect and scrape AMP content:

func scrapeAMPContent() {
    c := colly.NewCollector()

    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Detect AMP pages (AMP documents mark <html> with an amp or ⚡ attribute)
    c.OnHTML("html[amp], html[⚡]", func(e *colly.HTMLElement) {
        fmt.Println("This is an AMP page")

        // Extract AMP-specific components
        e.ForEach("amp-img", func(i int, img *colly.HTMLElement) {
            src := img.Attr("src")
            width := img.Attr("width")
            height := img.Attr("height")
            fmt.Printf("AMP Image: %s (%sx%s)\n", src, width, height)
        })

        // Handle AMP carousel
        e.ForEach("amp-carousel", func(i int, carousel *colly.HTMLElement) {
            carousel.ForEach("amp-img", func(j int, slide *colly.HTMLElement) {
                fmt.Printf("Carousel slide %d: %s\n", j, slide.Attr("src"))
            })
        })
    })

    // Look for the AMP link advertised by the desktop version and follow it
    c.OnHTML("link[rel='amphtml']", func(e *colly.HTMLElement) {
        ampURL := e.Request.AbsoluteURL(e.Attr("href")) // resolve relative URLs
        fmt.Printf("Found AMP version: %s\n", ampURL)
        c.Visit(ampURL)
    })

    c.Visit("https://example.com/article")
}

Rotating Between Mobile and Desktop

For comprehensive scraping, you might want to compare mobile and desktop content:

type ScrapingResult struct {
    URL     string
    Mobile  map[string]string
    Desktop map[string]string
}

func compareMobileDesktop(url string) *ScrapingResult {
    result := &ScrapingResult{
        URL:     url,
        Mobile:  make(map[string]string),
        Desktop: make(map[string]string),
    }

    // Mobile scraper
    mobileCollector := colly.NewCollector()
    mobileCollector.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    mobileCollector.OnHTML("title", func(e *colly.HTMLElement) {
        result.Mobile["title"] = e.Text
    })

    mobileCollector.OnHTML(".mobile-content", func(e *colly.HTMLElement) {
        result.Mobile["content"] = e.Text
    })

    // Desktop scraper
    desktopCollector := colly.NewCollector()
    desktopCollector.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"

    desktopCollector.OnHTML("title", func(e *colly.HTMLElement) {
        result.Desktop["title"] = e.Text
    })

    desktopCollector.OnHTML(".desktop-content", func(e *colly.HTMLElement) {
        result.Desktop["content"] = e.Text
    })

    // Visit with both collectors (collectors are synchronous by default,
    // so both maps are populated before we return)
    mobileCollector.Visit(url)
    desktopCollector.Visit(url)

    return result
}
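
Calling the helper and diffing the two maps might look like this (the selectors inside compareMobileDesktop are illustrative, so the content keys may stay empty on real sites):

result := compareMobileDesktop("https://example.com")
if result.Mobile["title"] != result.Desktop["title"] {
    fmt.Printf("Title differs: mobile=%q desktop=%q\n",
        result.Mobile["title"], result.Desktop["title"])
}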

Handling Mobile Redirects

Mobile websites often redirect to dedicated mobile domains:

func handleMobileRedirects() {
    c := colly.NewCollector(
        colly.AllowURLRevisit(),
    )

    c.UserAgent = "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

    // Track redirects (note: Colly follows redirects automatically, so 3xx
    // responses only reach OnResponse if redirect following is customized,
    // e.g. with the collector's RedirectHandler)
    c.OnResponse(func(r *colly.Response) {
        if r.StatusCode >= 300 && r.StatusCode < 400 {
            location := r.Headers.Get("Location")
            fmt.Printf("Redirect from %s to %s\n", r.Request.URL, location)
        }
    })

    // Follow advertised mobile alternates (e.g. m. subdomains)
    c.OnHTML("link[rel='alternate'][media*='handheld'], link[rel='alternate'][href*='m.']", func(e *colly.HTMLElement) {
        mobileURL := e.Request.AbsoluteURL(e.Attr("href")) // resolve relative URLs
        fmt.Printf("Found mobile version: %s\n", mobileURL)
        c.Visit(mobileURL)
    })

    c.Visit("https://example.com")
}
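
If a site is known to use a dedicated mobile host, you can also construct the mobile URL directly rather than waiting for a redirect. A small sketch (the assumption here is that the site really serves an m.<host> variant; many do not):

// toMobileHost rewrites a desktop URL to the conventional m. subdomain
func toMobileHost(rawURL string) (string, error) {
    u, err := url.Parse(rawURL) // net/url
    if err != nil {
        return "", err
    }
    u.Host = "m." + strings.TrimPrefix(u.Host, "www.")
    return u.String(), nil
}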

Advanced Mobile Simulation

For more thorough mobile simulation you can send viewport-related client hint headers alongside the mobile User-Agent. Keep in mind that Colly only fetches and parses HTML: it cannot execute JavaScript or simulate touch events, and servers only honor client hints they explicitly opt into:

func advancedMobileSimulation() {
    c := colly.NewCollector()

    // Set comprehensive mobile headers
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1")
        r.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
        r.Headers.Set("Accept-Encoding", "gzip, deflate, br")

        // Legacy client hints describing the viewport (servers must opt in,
        // typically via Accept-CH; many simply ignore these)
        r.Headers.Set("Viewport-Width", "375")
        r.Headers.Set("Width", "375")
        r.Headers.Set("DPR", "2")

        // Modern client hint that Chromium-based mobile browsers send to flag a
        // mobile device (Safari does not; there is no standard "Touch" header)
        r.Headers.Set("Sec-CH-UA-Mobile", "?1")
    })

    // Handle responsive images
    c.OnHTML("picture source[media]", func(e *colly.HTMLElement) {
        media := e.Attr("media")
        srcset := e.Attr("srcset")
        fmt.Printf("Responsive source: %s -> %s\n", media, srcset)
    })

    c.Visit("https://example.com")
}

Best Practices and Considerations

Performance Optimization

When scraping mobile content, consider these performance tips:

func optimizedMobileScraper() {
    c := colly.NewCollector()

    // Limit concurrent requests for mobile
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 2, // Lower for mobile to avoid overwhelming servers
        Delay:       2 * time.Second,
    })

    // Allow a generous timeout in case the mobile variant responds slowly
    c.SetRequestTimeout(30 * time.Second)

    // Back off when rate limited (HTTP 429)
    c.OnError(func(r *colly.Response, err error) {
        if r.StatusCode == 429 {
            fmt.Println("Rate limited - backing off")
            time.Sleep(10 * time.Second)
        }
    })
}
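
Rather than a fixed sleep, you can retry the throttled request with a growing delay. This is a minimal sketch that attaches to the collector above, not built-in Colly behavior: the retry counter kept in the request context and the limit of three attempts are assumptions of the sketch.

// Retry rate-limited requests with a simple linear backoff
c.OnError(func(r *colly.Response, err error) {
    if r.StatusCode != 429 {
        return
    }
    retries, _ := r.Request.Ctx.GetAny("retries").(int)
    if retries >= 3 {
        fmt.Printf("Giving up on %s after %d retries\n", r.Request.URL, retries)
        return
    }
    r.Request.Ctx.Put("retries", retries+1)
    time.Sleep(time.Duration((retries+1)*5) * time.Second) // 5s, 10s, 15s
    r.Request.Retry()
})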

Testing Mobile Scraping

Always test your mobile scraping setup:

# Test with curl to verify mobile response
curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
     "https://example.com"

# Compare mobile vs desktop response sizes
curl -s -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)" "https://example.com" | wc -c
curl -s -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "https://example.com" | wc -c

Limitations and Alternatives

While Colly is excellent for scraping server-rendered mobile content, it cannot execute JavaScript, so it struggles with JavaScript-heavy mobile sites and PWAs. For those, consider browser automation tools such as chromedp (Go) or Puppeteer, which can emulate mobile devices, viewports, and touch input more faithfully.
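
One way to spot such pages with Colly alone is a rough heuristic: if the fetched HTML contains very little visible text but many script tags, the mobile content is probably rendered client-side. A minimal sketch that hooks into an existing collector c (the thresholds are arbitrary assumptions; fmt and strings are assumed imported):

// Flag pages that look JavaScript-rendered so you know to switch tools
c.OnHTML("html", func(e *colly.HTMLElement) {
    scripts := 0
    e.ForEach("script", func(_ int, _ *colly.HTMLElement) { scripts++ })
    textLen := len(strings.TrimSpace(e.ChildText("body")))
    if textLen < 500 && scripts > 10 {
        fmt.Printf("%s looks JavaScript-rendered (%d bytes of text, %d script tags); consider a headless browser\n",
            e.Request.URL, textLen, scripts)
    }
})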

Conclusion

Handling mobile-specific content in Colly requires careful attention to User-Agent strings, headers, and content structure differences. By implementing proper mobile simulation techniques and understanding how responsive websites work, you can effectively scrape mobile-optimized content. Remember to always respect robots.txt files and implement appropriate delays to avoid overwhelming mobile-optimized servers.

For JavaScript-heavy mobile applications, consider combining Colly with headless browsers or exploring specialized mobile scraping solutions that can handle complex mobile interactions.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
