GoQuery is a Go (Golang) library that brings jQuery-like syntax and features to Go. It is primarily used for web scraping, letting developers navigate and manipulate parsed HTML documents. GoQuery implements a subset of jQuery's selectors and functions, so it feels familiar to anyone who has used jQuery before.
Here are some of the basic selectors available in GoQuery:
- Element Selector: Selects all elements with the given tag name.
doc.Find("div")
- ID Selector: Selects a single element with the specified ID attribute.
doc.Find("#my-id")
- Class Selector: Selects all elements that have the specified class attribute.
doc.Find(".my-class")
- Attribute Selector: Selects elements that have a certain attribute, with optional value checks.
doc.Find("[href]")
doc.Find("[name='button']")
doc.Find("[data-role='page']")
doc.Find("[href^='https']") // Attribute starts with
doc.Find("[href$='.com']") // Attribute ends with
doc.Find("[href*='google']") // Attribute contains substring
- Descendant Selector: Selects elements that are descendants of a specified element.
doc.Find("div span")
- Child Selector: Selects all direct child elements specified by "parent > child".
doc.Find("ul > li")
- Sibling Selectors: Select elements that share a parent with a given element.
  - Adjacent sibling selector: selects the immediately following sibling.
doc.Find("h2 + p")
  - General sibling selector: selects all siblings following the element.
doc.Find("h2 ~ p")
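- First/Last/Even/Odd/Nth Child Selectors: Select elements that are the first, last, or at a specific position among their siblings. jQuery's :even and :odd shorthands are not standard CSS, and GoQuery returns an empty selection for selectors it cannot parse, so use :nth-child(even) and :nth-child(odd) instead. Note that :nth-child counts from 1.
doc.Find("li:first-child")
doc.Find("li:last-child")
doc.Find("tr:nth-child(even)") // 2nd, 4th, ... child
doc.Find("tr:nth-child(odd)") // 1st, 3rd, ... child
doc.Find("li:nth-child(2)")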
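- Input Selectors: Select form inputs by type or state. jQuery's :text and :checkbox shorthands are not standard CSS and match nothing in GoQuery, so use attribute selectors for the type. The standard :checked pseudo-class is recognized by recent versions.
doc.Find("input[type='text']")
doc.Find("input[type='checkbox']")
doc.Find(":checked")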
GoQuery also supports chaining: each method returns a new selection, so calls can be strung together to build more complex queries.
doc.Find("div").Find(".my-class")
When using GoQuery, you first need to load the HTML document, which can be done from a string, a file, or a response from an HTTP request. Here's an example of how to use GoQuery to select elements with a certain class from a string containing HTML:
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	html := `<html>
<head><title>GoQuery Rocks</title></head>
<body>
<div class="article">This is the first article</div>
<div class="article">This is the second article</div>
</body>
</html>`

	// Parse the HTML string into a queryable document.
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		log.Fatal(err)
	}

	// Print the text of every element with the class "article".
	doc.Find(".article").Each(func(i int, s *goquery.Selection) {
		fmt.Println(s.Text())
	})
}
The Each function iterates over the selected elements. It takes a function that is applied to each element in the selection, and inside that function you can manipulate or retrieve information from the element as needed.
GoQuery provides a powerful and expressive API for querying and manipulating an HTML document in Go. It's widely used for web scraping, testing, or any scenario where you need to extract information from HTML.