GoQuery is a library for Go (Golang) that provides a jQuery-like API for parsing and querying HTML documents, which makes it well suited to web scraping. If you are familiar with jQuery, you will find GoQuery intuitive and easy to use for tasks like scraping meta tags from HTML documents.
To scrape meta tags using GoQuery, you'll first need to install the library (if you haven't already) and then write a Go program that fetches the web page, parses the HTML, and queries the meta tags.
Here's how to install GoQuery:
go get github.com/PuerkitoBio/goquery
And here is a simple Go program to scrape meta tags from an HTML document:
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func ScrapeMetaTags(url string) {
	// Request the HTML page.
	res, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}

	// Load the HTML document
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Find and iterate over each meta tag
	doc.Find("meta").Each(func(i int, s *goquery.Selection) {
		// For each meta tag, get the name (or property) and content attributes
		if name, exists := s.Attr("name"); exists {
			fmt.Printf("Meta tag #%d: %s - %s\n", i, name, s.AttrOr("content", ""))
		} else if property, exists := s.Attr("property"); exists {
			fmt.Printf("Meta tag #%d: %s - %s\n", i, property, s.AttrOr("content", ""))
		}
	})
}

func main() {
	ScrapeMetaTags("http://example.com")
}
In the code above, we perform the following steps:
- We send a GET request to the specified URL using the http package.
- We check the response status and handle any errors.
- We parse the HTML document with goquery.NewDocumentFromReader.
- We use the Find method to select all meta tags and iterate over them using Each.
- For each meta tag, we print out the name or property attribute and its corresponding content attribute.
Remember to replace http://example.com with the URL from which you want to scrape the meta tags. Also, ensure that web scraping is permitted on the target website and that you comply with its robots.txt file and terms of service.
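Instead of editing the source each time the URL changes, one option is to read it from the command line. The sketch below is an illustration under stated assumptions: parseTargetURL is a hypothetical helper, the usage string is invented, and the program only prints the validated URL rather than scraping it.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// parseTargetURL (hypothetical helper) checks that raw is an absolute
// http or https URL with a host, and returns it unchanged if so.
func parseTargetURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", errors.New("URL must start with http:// or https://")
	}
	if u.Host == "" {
		return "", errors.New("URL must include a host")
	}
	return raw, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: scrape <url>")
		os.Exit(1)
	}
	target, err := parseTargetURL(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In the full program, this is where ScrapeMetaTags(target) would be called.
	fmt.Println("would scrape:", target)
}
```

Validating the scheme up front gives a clearer error for input such as example.com (no scheme) than letting the HTTP request fail later.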