GoQuery is a Go library that brings jQuery-like syntax to traversing and manipulating HTML documents. It is particularly useful for web scraping because it lets developers select elements with CSS selectors and then narrow those selections down.
Here are some of the methods available in GoQuery for filtering selections:
Find(selector string) *Selection
- This method finds elements that match the given selector among the descendants of the current selection. It is similar to jQuery's find.
doc.Find(".classname").Each(func(i int, s *goquery.Selection) {
    // Iterate over each element with the class "classname"
})
Filter(selector string) *Selection
- This method reduces the set of matched elements to those that match the given selector.
doc.Find("p").Filter(".intro").Each(func(i int, s *goquery.Selection) {
    // Iterate over each <p> element with the class "intro"
})
Not(selector string) *Selection
- This method removes elements from the Selection that match the given selector.
doc.Find("p").Not(".exclude").Each(func(i int, s *goquery.Selection) {
    // Iterate over each <p> element that does not have the class "exclude"
})
Has(selector string) *Selection
- This method reduces the set of matched elements to those that have a descendant that matches the selector.
doc.Find("div").Has("p").Each(func(i int, s *goquery.Selection) {
    // Iterate over each <div> that contains a <p> element
})
Eq(index int) *Selection
- This method reduces the set of matched elements to the one at the specified index.
paragraphs := doc.Find("p")
thirdParagraph := paragraphs.Eq(2) // Zero-based index, so this is the third <p>
First() *Selection and Last() *Selection
- These methods reduce the set of matched elements to the first or last one in the set, respectively.
firstDiv := doc.Find("div").First()
lastDiv := doc.Find("div").Last()
Slice(start, end int) *Selection
- This method reduces the set of matched elements to the subset in the index range from start (inclusive) to end (exclusive).
subset := doc.Find("p").Slice(1, 4) // Indices 1, 2, and 3: the second through fourth <p> elements
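The index range in Slice(1, 4) is easy to misread as covering four elements. The sketch below shows the half-open convention with an ordinary Go slice of strings standing in for five <p> elements. It does not use GoQuery at all. The halfOpen helper and the sample data are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// halfOpen (hypothetical helper) returns items[start:end]: the element at
// start is included, the element at end is not.
func halfOpen(items []string, start, end int) []string {
	return items[start:end]
}

func main() {
	// Plain strings standing in for five <p> elements (made-up data).
	paragraphs := []string{"p0", "p1", "p2", "p3", "p4"}
	fmt.Println(halfOpen(paragraphs, 1, 4)) // [p1 p2 p3]
}
```

The number of selected elements is end minus start, so a range of 1 to 4 yields three elements.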
FilterFunction(f func(int, *Selection) bool) *Selection
- This method reduces the set of matched elements to those that pass the function's test.
filtered := doc.Find("p").FilterFunction(func(i int, s *goquery.Selection) bool {
    // Keep only paragraphs whose text contains "specific" (needs the "strings" package imported)
    text := s.Text()
    return strings.Contains(text, "specific")
})
Parent() *Selection, Parents() *Selection, Children() *Selection, Siblings() *Selection
- These traversal methods allow you to navigate through the DOM tree relative to the current selection. They can also be combined with filter methods to refine the selection.
// Select all siblings of paragraphs that do not have the class "exclude"
siblings := doc.Find("p").Siblings().Not(".exclude")
GoQuery provides a powerful set of methods for filtering and navigating through an HTML document, making it an invaluable tool for web scraping tasks in Go. Remember that most of these methods return a new *Selection, allowing you to chain methods together in a fluent interface.