GoQuery is a library for Go that allows you to scrape and manipulate HTML documents in a manner similar to jQuery. While GoQuery itself does not directly handle downloading of binary data such as images, it can be used to parse HTML and extract image URLs, which you can then download using Go's HTTP client.
Here's a step-by-step guide on how to scrape images and download them using GoQuery:
- Install GoQuery: Make sure you have Go installed on your machine. Then, install GoQuery by running:
```shell
go get github.com/PuerkitoBio/goquery
```
- Write a Go Program to Scrape Image URLs: Use GoQuery to load the webpage and extract the image URLs.
```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func extractImageUrls(url string) ([]string, error) {
	// Slice to hold the image URLs
	var imageUrls []string

	// Make HTTP GET request
	res, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("status code error: %d %s", res.StatusCode, res.Status)
	}

	// Load the HTML document
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		return nil, err
	}

	// Find and iterate through all image elements
	doc.Find("img").Each(func(i int, s *goquery.Selection) {
		// For each item, get the src attribute
		src, exists := s.Attr("src")
		if exists {
			imageUrls = append(imageUrls, src)
		}
	})

	return imageUrls, nil
}

func main() {
	// The URL of the page you want to scrape
	url := "http://example.com"

	// Extract all image URLs
	imageUrls, err := extractImageUrls(url)
	if err != nil {
		log.Fatal(err)
	}

	// Print out all image URLs
	for _, imgUrl := range imageUrls {
		fmt.Println(imgUrl)
	}
}
```
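Note that `src` attributes are often relative (for example `/images/logo.png` or `thumb.jpg`), so the extracted strings may not be directly downloadable. A minimal sketch of resolving them against the page URL with the standard `net/url` package (the `toAbsolute` helper name is my own, not part of GoQuery):

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// toAbsolute resolves a possibly-relative src against the page URL.
func toAbsolute(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	// ResolveReference handles absolute, root-relative, and relative srcs.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	abs, err := toAbsolute("http://example.com/gallery/page.html", "/images/logo.png")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(abs) // http://example.com/images/logo.png
}
```

You could apply this inside the `Each` callback, passing the page URL down into `extractImageUrls`.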
- Download the Images: After extracting the URLs, use `http.Get` to download each image and save it to a file. The `downloadImage` function below also needs `io` and `os` added to the import block.
```go
func downloadImage(url, filePath string) error {
	// Get the data
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Check server response
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("bad status: %s", resp.Status)
	}

	// Create the file
	out, err := os.Create(filePath)
	if err != nil {
		return err
	}
	defer out.Close()

	// Write the body to file
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	// ... previous code ...

	// Download each image
	for i, imgUrl := range imageUrls {
		// Determine the local file path (you may want to create a
		// dedicated folder and check for duplicates)
		filePath := fmt.Sprintf("image_%d.jpg", i)
		err := downloadImage(imgUrl, filePath)
		if err != nil {
			log.Printf("Failed to download %s: %v", imgUrl, err)
		} else {
			log.Printf("Downloaded %s to %s", imgUrl, filePath)
		}
	}
}
```
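The loop above hardcodes a `.jpg` extension, but real pages mix formats. One simple improvement is to keep the extension from the URL path when one is present, falling back to `.jpg` otherwise. A sketch using only the standard library (`fileNameFor` is my own helper name):

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// fileNameFor builds a local filename for the i-th image,
// keeping the extension from the URL path when one is present.
func fileNameFor(imgURL string, i int) string {
	ext := ".jpg" // fallback when no extension is recognizable
	if u, err := url.Parse(imgURL); err == nil {
		if e := path.Ext(u.Path); e != "" {
			ext = e
		}
	}
	return fmt.Sprintf("image_%d%s", i, ext)
}

func main() {
	fmt.Println(fileNameFor("http://example.com/img/logo.png", 0)) // image_0.png
	fmt.Println(fileNameFor("http://example.com/photo?id=7", 1))   // image_1.jpg
}
```

You would then replace the `fmt.Sprintf("image_%d.jpg", i)` line with `fileNameFor(imgUrl, i)`.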
- Run Your Go Program: Save the code to a `.go` file, for example `scrape_images.go`, and run it using:
go run scrape_images.go
Please note that when scraping websites, you should always check the site's robots.txt
file and Terms of Service to understand the scraping rules, and ensure that you are not violating any terms or causing excessive load on the website. Additionally, when saving files, ensure you have the right to download and use the images as per the website's copyright and licensing policies.
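To avoid causing that excessive load, it is common to pause between consecutive requests rather than downloading in a tight loop. A minimal sketch (the `downloadAll` helper and the delay value are my own, not from the code above; in practice you would pass `downloadImage` as the callback):

```go
package main

import (
	"fmt"
	"time"
)

// downloadAll calls dl for each URL, sleeping between requests
// so the target site is not hit in a tight loop.
func downloadAll(urls []string, delay time.Duration, dl func(string) error) {
	for i, u := range urls {
		if i > 0 {
			time.Sleep(delay)
		}
		if err := dl(u); err != nil {
			fmt.Printf("failed %s: %v\n", u, err)
		}
	}
}

func main() {
	urls := []string{"http://example.com/a.jpg", "http://example.com/b.jpg"}
	// The 500ms delay is arbitrary; tune it to what the site tolerates.
	downloadAll(urls, 500*time.Millisecond, func(u string) error {
		fmt.Println("would download", u)
		return nil
	})
}
```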