In Go, when you want to fetch HTML content from a web page to parse it with GoQuery, you first need to perform an HTTP request to get the HTML content as a string or a byte slice. After obtaining the HTML content, you can then use GoQuery to parse it and manipulate it as needed.
Below are the steps you need to follow to make HTTP requests and parse HTML content with GoQuery:
Step 1: Install GoQuery
If you haven't already, install GoQuery using go get:
go get github.com/PuerkitoBio/goquery
Step 2: Make an HTTP Request
You can use Go's standard net/http package to make an HTTP request:
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// URL of the page to scrape
	url := "http://example.com"

	// Perform an HTTP GET request to the URL
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println("Error fetching URL:", err)
		return
	}
	defer resp.Body.Close()

	// Make sure the server actually returned the page
	if resp.StatusCode != http.StatusOK {
		fmt.Println("Unexpected status:", resp.Status)
		return
	}

	// Read the response body (io.ReadAll replaces the deprecated ioutil.ReadAll)
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error reading response body:", err)
		return
	}

	// The variable `body` now contains the HTML content as a byte slice.
	// You can convert it to a string if needed:
	htmlContent := string(body)
	fmt.Println(htmlContent)

	// Now you can pass `htmlContent` or `body` to GoQuery to parse it
}
Step 3: Parse HTML with GoQuery
Once you have the HTML content, you can use GoQuery to parse it and perform various operations like selecting elements, extracting text, and more.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// URL of the page to scrape
	url := "http://example.com"

	// Perform an HTTP GET request to the URL
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal("Error fetching URL:", err)
	}
	defer resp.Body.Close()

	// Use GoQuery to parse the HTML
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal("Error loading HTTP response body:", err)
	}

	// Find and print all links
	doc.Find("a").Each(func(index int, item *goquery.Selection) {
		href, exists := item.Attr("href")
		if exists {
			fmt.Printf("Link #%d: %s\n", index, href)
		}
	})
}
In this example, we're fetching HTML from the specified URL and then using GoQuery to find and print all the links (<a> tags) on the page.
Remember, web scraping should be done responsibly and ethically. Always check the website's robots.txt file and terms of service to ensure you're allowed to scrape their content. It's also good practice not to overload their servers with too many requests in a short period of time.