How do I handle dynamic content loading in Go scraping?

Dynamic content loading is one of the most challenging aspects of web scraping, especially when dealing with modern web applications that heavily rely on JavaScript, AJAX requests, and single-page application (SPA) architectures. Unlike static HTML content that's immediately available in the page source, dynamic content is loaded asynchronously after the initial page load, making it invisible to traditional HTTP-based scrapers.

In Go, handling dynamic content requires using headless browsers or specialized tools that can execute JavaScript and wait for content to load. This article covers comprehensive techniques for scraping dynamic content effectively using Go.

Understanding Dynamic Content Loading

Dynamic content loading occurs when:

  • JavaScript modifies the DOM after page load
  • AJAX requests fetch data from APIs
  • Content loads based on user interactions (scrolling, clicking)
  • Single-page applications render content client-side
  • Infinite scroll or pagination loads content progressively

Traditional Go HTTP clients like net/http cannot handle this dynamic content because they only retrieve the initial HTML without executing JavaScript.
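
To see the limitation concretely, here is a minimal sketch: fetching a JavaScript-driven page with net/http returns only the initial HTML shell, so a container like #dynamic-content is still empty. The URL and element ID are assumptions for illustration.

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "strings"
)

func main() {
    // Plain HTTP request: no JavaScript is executed
    resp, err := http.Get("https://example.com/dynamic-page") // hypothetical URL
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    html, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    // The empty placeholder is in the source, but the JS-rendered
    // content that would normally fill it is not.
    fmt.Println(strings.Contains(string(html), `id="dynamic-content"`))
}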

Method 1: Using ChromeDP for Headless Browser Automation

ChromeDP is the most popular Go library for controlling Chrome/Chromium browsers programmatically. It provides full browser automation capabilities, including JavaScript execution and dynamic content handling.

Installation and Basic Setup

go mod init dynamic-scraper
go get github.com/chromedp/chromedp

Basic Dynamic Content Scraping

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {
    // Create context
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Set timeout
    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var content string

    err := chromedp.Run(ctx,
        // Navigate to page
        chromedp.Navigate("https://example.com/dynamic-page"),

        // Wait for dynamic content to load
        chromedp.WaitVisible("#dynamic-content", chromedp.ByID),

        // Extract the content
        chromedp.Text("#dynamic-content", &content, chromedp.ByID),
    )

    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Dynamic content: %s\n", content)
}

Advanced Waiting Strategies

Different types of dynamic content require different waiting strategies:

package main

import (
    "context"
    "time"

    "github.com/chromedp/chromedp"
)

func scrapeWithMultipleWaitStrategies(url string) ([]string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    ctx, cancel = context.WithTimeout(ctx, 60*time.Second)
    defer cancel()

    var results []string

    err := chromedp.Run(ctx,
        chromedp.Navigate(url),

        // Strategy 1: Wait for a specific element to become visible
        chromedp.WaitVisible("#ajax-content", chromedp.ByID),

        // Strategy 2: Poll until enough items have been rendered
        // (here, at least 10 elements matching .item)
        chromedp.Poll(`document.querySelectorAll('.item').length >= 10`, nil,
            chromedp.WithPollingInterval(500*time.Millisecond)),

        // Strategy 3: Fixed delay as a crude stand-in for network idle;
        // chromedp has no built-in network-idle action
        chromedp.Sleep(2*time.Second),

        // Strategy 4: Poll a custom JavaScript readiness flag set by the page
        chromedp.Poll(`window.dataLoaded === true`, nil),

        // Extract all results
        chromedp.Evaluate(`Array.from(document.querySelectorAll('.item')).map(el => el.textContent)`, &results),
    )

    return results, err
}

Method 2: Handling AJAX Requests and API Calls

Sometimes it's more efficient to intercept and replicate the AJAX requests that load dynamic content:

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/chromedp/cdproto/cdp"
    "github.com/chromedp/cdproto/network"
    "github.com/chromedp/chromedp"
)

type APIResponse struct {
    Data []struct {
        ID      int    `json:"id"`
        Title   string `json:"title"`
        Content string `json:"content"`
    } `json:"data"`
}

func interceptAJAXRequests(url string) error {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Listen for network events on the page's target
    chromedp.ListenTarget(ctx, func(ev interface{}) {
        if resp, ok := ev.(*network.EventResponseReceived); ok {
            if resp.Response.URL == "https://api.example.com/data" {
                fmt.Printf("Intercepted API call: %s\n", resp.Response.URL)

                requestID := resp.RequestID

                // Fetch the response body in a goroutine; commands issued
                // from inside a listener need an executor on the context
                go func() {
                    c := chromedp.FromContext(ctx)
                    body, err := network.GetResponseBody(requestID).Do(cdp.WithExecutor(ctx, c.Target))
                    if err != nil {
                        log.Printf("Error getting response body: %v", err)
                        return
                    }

                    var apiResp APIResponse
                    if err := json.Unmarshal(body, &apiResp); err != nil {
                        log.Printf("Error parsing JSON: %v", err)
                        return
                    }

                    fmt.Printf("Got %d items from API\n", len(apiResp.Data))
                }()
            }
        }
    })

    return chromedp.Run(ctx,
        network.Enable(),
        chromedp.Navigate(url),
        chromedp.Sleep(5*time.Second), // Give the page time to fire its AJAX calls
    )
}

// Alternative: Direct API scraping
func scrapeAPIDirectly() (*APIResponse, error) {
    client := &http.Client{Timeout: 30 * time.Second}

    req, err := http.NewRequest("GET", "https://api.example.com/data", nil)
    if err != nil {
        return nil, err
    }

    // Add necessary headers
    req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
    req.Header.Set("Accept", "application/json")
    req.Header.Set("Referer", "https://example.com")

    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var apiResp APIResponse
    if err := json.NewDecoder(resp.Body).Decode(&apiResp); err != nil {
        return nil, err
    }

    return &apiResp, nil
}
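
A short usage sketch for the direct-API approach (the response shape matches the hypothetical APIResponse struct above):

func main() {
    apiResp, err := scrapeAPIDirectly()
    if err != nil {
        log.Fatal(err)
    }

    for _, item := range apiResp.Data {
        fmt.Printf("%d: %s\n", item.ID, item.Title)
    }
}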

Method 3: Handling Infinite Scroll and Pagination

Many modern websites use infinite scroll to load content progressively. Here's how to handle it:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func scrapeInfiniteScroll(url string, maxItems int) ([]string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    ctx, cancel = context.WithTimeout(ctx, 5*time.Minute)
    defer cancel()

    var items []string
    lastCount := 0

    err := chromedp.Run(ctx,
        chromedp.Navigate(url),
        chromedp.WaitVisible(".item", chromedp.ByQuery),

        // Scroll and wait loop
        chromedp.ActionFunc(func(ctx context.Context) error {
            for {
                // Get current item count
                var currentCount int
                err := chromedp.Evaluate(`document.querySelectorAll('.item').length`, &currentCount).Do(ctx)
                if err != nil {
                    return err
                }

                fmt.Printf("Current items: %d\n", currentCount)

                // Check if we have enough items or no new items loaded
                if currentCount >= maxItems || (currentCount == lastCount && currentCount > 0) {
                    break
                }

                // Scroll to bottom
                err = chromedp.Evaluate(`window.scrollTo(0, document.body.scrollHeight)`, nil).Do(ctx)
                if err != nil {
                    return err
                }

                // Wait for new content to load
                time.Sleep(2 * time.Second)

                // Wait for the loading indicator to disappear (returns
                // immediately if no .loading element is present)
                if err := chromedp.WaitNotPresent(".loading", chromedp.ByQuery).Do(ctx); err != nil {
                    return err
                }

                lastCount = currentCount
            }
            return nil
        }),

        // Extract all items
        chromedp.Evaluate(`Array.from(document.querySelectorAll('.item')).map(el => el.textContent.trim())`, &items),
    )

    return items, err
}
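
A usage sketch (the feed URL and the .item selector are assumptions for illustration):

func main() {
    items, err := scrapeInfiniteScroll("https://example.com/feed", 50)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Collected %d items\n", len(items))
}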

Method 4: Using Rod (Alternative to ChromeDP)

Rod is another excellent Go library for browser automation with a more intuitive API:

package main

import (
    "fmt"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
)

func scrapeWithRod(url string) error {
    // Launch a headless browser
    l := launcher.New().Headless(true)
    defer l.Cleanup()

    browser := rod.New().ControlURL(l.MustLaunch()).MustConnect()
    defer browser.MustClose()

    page := browser.MustPage(url)

    // Wait for the initial page load
    page.MustWaitLoad()

    // Wait for a specific element
    element := page.MustElement("#dynamic-content")

    // Wait until the element has text content; inside the JS function,
    // "this" is bound to the element
    element.MustWait(`() => this.textContent.length > 0`)

    // Extract content
    content := element.MustText()
    fmt.Printf("Content: %s\n", content)

    // Handle multiple elements
    items := page.MustElements(".item")
    for i, item := range items {
        // Wait for each item to become visible
        item.MustWaitVisible()

        text := item.MustText()
        fmt.Printf("Item %d: %s\n", i+1, text)
    }

    return nil
}

Best Practices and Performance Optimization

1. Implement Proper Error Handling and Retries

func scrapeWithRetry(url string, maxRetries int) error {
    for attempt := 0; attempt < maxRetries; attempt++ {
        ctx, cancel := chromedp.NewContext(context.Background())

        err := chromedp.Run(ctx,
            chromedp.Navigate(url),
            chromedp.WaitVisible("#content", chromedp.ByID),
        )

        cancel()

        if err == nil {
            return nil
        }

        log.Printf("Attempt %d failed: %v", attempt+1, err)
        if attempt < maxRetries-1 {
            time.Sleep(time.Duration(attempt+1) * time.Second)
        }
    }

    return fmt.Errorf("failed after %d attempts", maxRetries)
}

2. Resource Management and Cleanup

func scrapeWithProperCleanup(urls []string) error {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    for _, url := range urls {
        // Create new tab for each URL
        newCtx, newCancel := chromedp.NewContext(ctx)

        err := chromedp.Run(newCtx,
            chromedp.Navigate(url),
            chromedp.WaitVisible("#content", chromedp.ByID),
            // Extract data...
        )

        newCancel() // Clean up tab

        if err != nil {
            log.Printf("Error scraping %s: %v", url, err)
            continue
        }
    }

    return nil
}

3. Performance Optimizations

func optimizedScraping() {
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.DisableGPU,
        chromedp.NoDefaultBrowserCheck,
        chromedp.Flag("disable-background-timer-throttling", true),
        chromedp.Flag("disable-backgrounding-occluded-windows", true),
        chromedp.Flag("disable-renderer-backgrounding", true),
        chromedp.Flag("disable-extensions", true),
        chromedp.Flag("disable-plugins", true),
        chromedp.Flag("disable-default-apps", true),
        chromedp.Flag("disable-dev-shm-usage", true),
        chromedp.Flag("no-sandbox", true),
    )

    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    ctx, cancel := chromedp.NewContext(allocCtx)
    defer cancel()

    // Your scraping code here...
}

Debugging Dynamic Content Issues

When dynamic content doesn't load as expected, use these debugging techniques:

func debugDynamicContent(url string) error {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    var before, after []byte

    err := chromedp.Run(ctx,
        chromedp.Navigate(url),

        // Take a screenshot before waiting
        chromedp.FullScreenshot(&before, 90),

        // Wait, then take another screenshot to compare
        chromedp.Sleep(5*time.Second),
        chromedp.FullScreenshot(&after, 90),

        // Check how many elements exist after the wait; to capture the
        // page's own console output, listen for runtime.EventConsoleAPICalled
        // (shown in the WebSocket section below)
        chromedp.ActionFunc(func(ctx context.Context) error {
            var elementCount int
            if err := chromedp.Evaluate(`document.querySelectorAll('*').length`, &elementCount).Do(ctx); err != nil {
                return err
            }
            fmt.Printf("Total elements: %d\n", elementCount)
            return nil
        }),
    )
    if err != nil {
        return err
    }

    // Write both screenshots to disk for visual comparison
    if err := os.WriteFile("before.png", before, 0o644); err != nil {
        return err
    }
    return os.WriteFile("after.png", after, 0o644)
}

Working with WebSocket Connections

Some dynamic content loads through WebSocket connections. Here's how to handle them:

package main

import (
    "context"
    "log"
    "time"

    "github.com/chromedp/cdproto/page"
    "github.com/chromedp/cdproto/runtime"
    "github.com/chromedp/chromedp"
)

func handleWebSocketContent(url string) error {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Listen for console messages (the hook below logs WebSocket data there)
    chromedp.ListenTarget(ctx, func(ev interface{}) {
        if e, ok := ev.(*runtime.EventConsoleAPICalled); ok && len(e.Args) > 0 {
            log.Printf("Console: %s", e.Args[0].Value)
        }
    })

    // JavaScript hook that wraps the WebSocket constructor and records messages
    const wsHook = `
        const OriginalWebSocket = window.WebSocket;
        window.WebSocket = function(url, protocols) {
            const ws = new OriginalWebSocket(url, protocols);
            ws.addEventListener('message', function(event) {
                console.log('WebSocket message:', event.data);
                window.wsData = event.data;
            });
            return ws;
        };
    `

    return chromedp.Run(ctx,
        runtime.Enable(),

        // Install the hook before any page script runs, so WebSockets
        // created during page load are captured too
        chromedp.ActionFunc(func(ctx context.Context) error {
            _, err := page.AddScriptToEvaluateOnNewDocument(wsHook).Do(ctx)
            return err
        }),

        chromedp.Navigate(url),

        // Poll until the hook has captured at least one message
        chromedp.Poll(`window.wsData !== undefined`, nil,
            chromedp.WithPollingInterval(500*time.Millisecond),
            chromedp.WithPollingTimeout(30*time.Second)),

        // Extract the captured WebSocket data
        chromedp.ActionFunc(func(ctx context.Context) error {
            var wsData string
            if err := chromedp.Evaluate(`window.wsData`, &wsData).Do(ctx); err != nil {
                return err
            }
            log.Printf("WebSocket data: %s", wsData)
            return nil
        }),
    )
}

Handling Single Page Applications (SPAs)

SPAs require special consideration because they often load content after route changes:

func scrapeSPA(baseURL string, routes []string) error {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    return chromedp.Run(ctx,
        chromedp.Navigate(baseURL),
        chromedp.WaitVisible("body", chromedp.ByQuery),

        chromedp.ActionFunc(func(ctx context.Context) error {
            for _, route := range routes {
                log.Printf("Navigating to route: %s", route)

                // Trigger client-side navigation without a full page load
                err := chromedp.Evaluate(fmt.Sprintf(`
                    history.pushState({}, '', %q);
                    window.dispatchEvent(new PopStateEvent('popstate'));
                `, route), nil).Do(ctx)
                if err != nil {
                    return err
                }

                // Wait until the route's loading indicator is gone
                err = chromedp.Poll(`document.querySelector('.loading') === null`, nil,
                    chromedp.WithPollingInterval(500*time.Millisecond)).Do(ctx)
                if err != nil {
                    return err
                }

                // Extract content for this route
                var content string
                if err := chromedp.Evaluate(`document.body.innerText`, &content).Do(ctx); err != nil {
                    return err
                }
                if len(content) > 100 {
                    content = content[:100] + "..."
                }
                log.Printf("Content for %s: %s", route, content)
            }
            return nil
        }),
    )
}

Integration with Go Concurrency

Leverage Go's concurrency features for efficient dynamic content scraping:

package main

import (
    "context"
    "log"
    "sync"
    "time"

    "github.com/chromedp/chromedp"
)

type ScrapeResult struct {
    URL     string
    Content string
    Error   error
}

func concurrentDynamicScraping(urls []string, workers int) []ScrapeResult {
    urlChan := make(chan string, len(urls))
    resultChan := make(chan ScrapeResult, len(urls))

    var wg sync.WaitGroup

    // Start workers
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()

            // Create browser context for this worker
            ctx, cancel := chromedp.NewContext(context.Background())
            defer cancel()

            for url := range urlChan {
                result := ScrapeResult{URL: url}

                // Bound each page load with its own timeout
                tctx, tcancel := context.WithTimeout(ctx, 30*time.Second)
                err := chromedp.Run(tctx,
                    chromedp.Navigate(url),
                    chromedp.WaitVisible("#content", chromedp.ByID),
                    chromedp.Text("#content", &result.Content, chromedp.ByID),
                )
                tcancel()

                result.Error = err
                resultChan <- result
            }
        }()
    }

    // Send URLs to workers
    for _, url := range urls {
        urlChan <- url
    }
    close(urlChan)

    // Close result channel when all workers done
    go func() {
        wg.Wait()
        close(resultChan)
    }()

    // Collect results
    var results []ScrapeResult
    for result := range resultChan {
        results = append(results, result)
        if result.Error != nil {
            log.Printf("Error scraping %s: %v", result.URL, result.Error)
        }
    }

    return results
}
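
A usage sketch with hypothetical URLs:

func main() {
    urls := []string{
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    }

    results := concurrentDynamicScraping(urls, 2)
    for _, r := range results {
        if r.Error == nil {
            log.Printf("%s: scraped %d characters", r.URL, len(r.Content))
        }
    }
}

Note that each worker above calls chromedp.NewContext with a fresh background context, which launches a separate browser instance per worker; for large jobs, consider sharing a single exec allocator and opening tabs from it instead.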

Conclusion

Handling dynamic content in Go scraping requires understanding the nature of the content loading mechanism and choosing the appropriate strategy. While headless browsers like ChromeDP and Rod provide the most comprehensive solution, they come with performance overhead. For API-driven content, direct API scraping might be more efficient.

The key to successful dynamic content scraping is implementing proper waiting strategies, error handling, and resource management. For complex scenarios like infinite scroll and single-page applications, explicit wait conditions and sensible timeouts are far more reliable than fixed sleeps.

Consider implementing proper timeout handling and error management strategies when building production-ready scrapers. For simpler dynamic content scenarios, consider using WebScraping.AI's API, which handles JavaScript execution and dynamic content loading automatically, allowing you to focus on data extraction rather than browser automation complexity.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
