What is the best way to iterate over selected elements with GoQuery?

GoQuery is a Go library for querying and manipulating HTML documents, with an API modeled on jQuery. To iterate over selected elements, use the Each or EachWithBreak methods of *goquery.Selection. Each calls a function for every element in the selection; EachWithBreak does the same but lets the callback stop the loop early.

Here's an example of how to use Each to iterate over selected elements:

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // Make a request to the website
    res, err := http.Get("http://example.com")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()

    // Check the response status code
    if res.StatusCode != http.StatusOK {
        // res.Status already includes the numeric code, e.g. "404 Not Found"
        log.Fatalf("Status code error: %s", res.Status)
    }

    // Parse the HTML document
    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        log.Fatal(err)
    }

    // Find and iterate over all elements matching a specific selector
    doc.Find("a").Each(func(index int, item *goquery.Selection) {
        href, exists := item.Attr("href")
        if exists {
            fmt.Printf("Link %d: %s\n", index, href)
        }
    })
}

In this example, Find("a") is used to select all anchor (<a>) elements in the document. The Each function takes a callback function that is executed for each element in the selection. The callback function receives the index of the current element in the iteration and the element itself as a *goquery.Selection object, which can be used to extract further information, like attributes or text content.

If you need to break out of the iteration early, you can use EachWithBreak instead. It works similarly to Each, but the callback function should return a boolean indicating whether to continue the iteration (true) or to break out of it (false).

Here's the same example as above, but using EachWithBreak to stop the iteration when a specific condition is met:

// ...

// Find and iterate over all elements matching a specific selector
doc.Find("a").EachWithBreak(func(index int, item *goquery.Selection) bool {
    href, exists := item.Attr("href")
    if exists {
        fmt.Printf("Link %d: %s\n", index, href)
    }
    // Continue while href is non-empty; stop at the first missing or empty href
    return href != ""
})

In this case, the iteration stops as soon as it encounters an anchor element whose href attribute is missing or empty, because Attr returns an empty string when the attribute does not exist.

When scraping, respect the target website's terms of service and robots.txt, and avoid overloading it with rapid or concurrent requests.
