How do I scrape meta tags using GoQuery?

GoQuery is a Go (Golang) library that offers a jQuery-like API for selecting and traversing elements in a parsed HTML document. If you are familiar with jQuery, you will find GoQuery intuitive for tasks like scraping meta tags from HTML documents.

To scrape meta tags using GoQuery, you'll first need to install the library (if you haven't already) and then write a Go program that fetches the web page, parses the HTML, and queries the meta tags.

Here's how to install GoQuery:

go get github.com/PuerkitoBio/goquery

And here is a simple Go program to scrape meta tags from an HTML document:

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func ScrapeMetaTags(url string) {
    // Request the HTML page.
    res, err := http.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        // res.Status already includes the numeric code, e.g. "404 Not Found".
        log.Fatalf("status code error: %s", res.Status)
    }

    // Load the HTML document
    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        log.Fatal(err)
    }

    // Find and iterate over each meta tag
    doc.Find("meta").Each(func(i int, s *goquery.Selection) {
        // For each meta tag, get the name (or property) and content attributes
        if name, exists := s.Attr("name"); exists {
            fmt.Printf("Meta tag #%d: %s - %s\n", i, name, s.AttrOr("content", ""))
        } else if property, exists := s.Attr("property"); exists {
            fmt.Printf("Meta tag #%d: %s - %s\n", i, property, s.AttrOr("content", ""))
        }
    })
}

func main() {
    ScrapeMetaTags("http://example.com")
}

In the code above, we perform the following steps:

  1. We send a GET request to the specified URL using the http package.
  2. We check the response status and handle any errors.
  3. We parse the HTML document with goquery.NewDocumentFromReader.
  4. We use the Find method to select all meta tags and iterate over them using Each.
  5. For each meta tag, we print out the name or property attribute and its corresponding content attribute.

Remember to replace http://example.com with the URL from which you want to scrape the meta tags. Also, ensure that web scraping is permitted on the target website and that you comply with its robots.txt file and terms of service.
