GoQuery is a Go library that brings a syntax and feature set similar to jQuery to the Go language. It's primarily used for parsing and traversing HTML documents, making it a popular choice for web scraping tasks in Go programs.
GoQuery resolves selectors with the cascadia CSS selector library, so they follow standard CSS selector syntax for matching elements in an HTML document. GoQuery does not support regular expressions (regex) inside its selectors, but you can still use regex in Go by filtering elements after you have selected them with a CSS selector.
Here's an example to illustrate how you can combine GoQuery with Go's regex capabilities to filter elements based on a pattern:
package main
import (
	"fmt"
	"log"
	"regexp"
	"strings"

	"github.com/PuerkitoBio/goquery"
)
func main() {
	// Example HTML document
	html := `
<!DOCTYPE html>
<html>
<head>
<title>Web Scraping with GoQuery</title>
</head>
<body>
<div id="content">
<p data-custom="123">First paragraph with data-custom attribute.</p>
<p data-custom="abc">Second paragraph with data-custom attribute.</p>
<p>Third paragraph without data-custom attribute.</p>
</div>
</body>
</html>
`
	// Create a new document from the HTML
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		log.Fatal(err)
	}
	// Define a regex pattern to match numeric values
	re := regexp.MustCompile(`^\d+$`)
	// Find all <p> elements and filter them with regex
	doc.Find("p").Each(func(i int, s *goquery.Selection) {
		// For each <p> element, get the value of the 'data-custom' attribute
		dataCustom, exists := s.Attr("data-custom")
		if exists && re.MatchString(dataCustom) {
			// If the attribute exists and matches the regex pattern, print it
			fmt.Printf("Found matching element: %s\n", s.Text())
		}
	})
}
In the example above, we first select all <p> elements using GoQuery's Find method. Then, we iterate over each element with the Each method. Inside the loop, we retrieve the value of the data-custom attribute and check whether it exists and matches the regular expression with re.MatchString(dataCustom). If it does, we print the text content of the matching element.
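The anchors in the pattern do real work: ^\d+$ accepts a value only when the whole string consists of digits, whereas an unanchored \d+ would also accept a value that merely contains a digit. The following is a minimal standard-library sketch of that difference. The helper name isNumeric and the sample values "12a" and "" are illustrative assumptions, not part of the example above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Anchored: the entire value must consist of digits.
var anchored = regexp.MustCompile(`^\d+$`)

// Unanchored: one digit anywhere in the value is enough to match.
var unanchored = regexp.MustCompile(`\d+`)

// isNumeric is a hypothetical helper mirroring the check in the example:
// it reports whether value is made up of digits only.
func isNumeric(value string) bool {
	return anchored.MatchString(value)
}

func main() {
	// "12a" and "" are made-up values added to show the edge cases.
	for _, v := range []string{"123", "abc", "12a", ""} {
		fmt.Printf("%q anchored=%t unanchored=%t\n",
			v, isNumeric(v), unanchored.MatchString(v))
	}
}
```

Only "12a" is treated differently by the two patterns, which is why the anchored form is the safer choice when an attribute is expected to be purely numeric.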
Please note that the example assumes that you have GoQuery installed (go get github.com/PuerkitoBio/goquery).
In summary, while GoQuery selectors themselves do not support regex patterns, you can easily use Go's built-in regexp package to apply regex to text or attribute contents of elements selected by GoQuery. This gives you the power to perform complex filtering based on patterns, even though the initial selection is done using CSS selectors.