Yes, it is possible to scrape images or files with Colly, which is a popular Go library used for web scraping. Colly makes it easy to navigate and scrape parts of websites, including downloading files such as images.
Here's a basic example of how you can use Colly to scrape images from a website:
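package main

import (
	"log"
	"os"
	"path/filepath"
	"strings"

	"github.com/gocolly/colly"
)

func main() {
	// Create a new collector that only visits example.com.
	// Images hosted on another domain (e.g. a CDN) must be added here too.
	c := colly.NewCollector(
		colly.AllowedDomains("example.com"),
	)

	// Create the folder for the images once, before scraping starts
	if err := os.MkdirAll("images", os.ModePerm); err != nil {
		log.Fatal(err)
	}

	// On every img element that has a src attribute, call the callback
	c.OnHTML("img[src]", func(e *colly.HTMLElement) {
		// Resolve the image URL against the page URL
		imgSrc := e.Request.AbsoluteURL(e.Attr("src"))
		log.Printf("Image found: %s", imgSrc)
		// Request the image; the response is handled in OnResponse below
		if err := e.Request.Visit(imgSrc); err != nil {
			log.Printf("Failed to request image %s: %s", imgSrc, err)
		}
	})

	// Save every response whose Content-Type marks it as an image
	c.OnResponse(func(r *colly.Response) {
		if !strings.HasPrefix(r.Headers.Get("Content-Type"), "image/") {
			return
		}
		fileName := filepath.Join("images", r.FileName())
		if err := r.Save(fileName); err != nil {
			log.Printf("Failed to save image %s: %s", r.Request.URL, err)
		}
	})

	// Start scraping on the given URL
	if err := c.Visit("http://example.com"); err != nil {
		log.Fatal(err)
	}
}
This code will do the following:
- Create a new Colly collector, constrained to a specific domain (example.com in this case).
- Define a callback function that triggers on every <img> tag found on the scraped pages.
- Extract the src attribute from the <img> tag, resolve it to an absolute URL, and request it with e.Request.Visit().
- Use r.Save() in an OnResponse callback to write each image response to the images folder. (The collector has no DownloadFile method; saving goes through the response.)
Make sure to handle the image file names properly. In this example, r.FileName() derives a sanitized name from the response's Content-Disposition header or its URL, so two images with the same name will overwrite each other. You may need to adjust this depending on the website you're scraping; naming files after the alt attribute is also possible, but alt text is often empty or contains characters that are not valid in file names.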
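If you do want file names based on the alt text, the sketch below shows one way to build them safely. Note that safeFileName is a hypothetical helper written for this answer, not part of Colly, and the ".img" fallback extension and the "image" fallback name are assumptions chosen for illustration. It uses only the standard library: it prefers the alt text, falls back to the last path segment of the URL, and keeps the URL's own extension instead of assuming ".jpg".

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"regexp"
	"strings"
)

// unsafeChars matches runs of anything other than letters, digits,
// dot, dash and underscore.
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9._-]+`)

// safeFileName is a hypothetical helper (not a Colly function).
// It builds a file name from the alt text when there is one, otherwise
// from the last path segment of the image URL.
func safeFileName(imgURL, alt string) string {
	ext := ".img" // assumed placeholder when the URL has no extension
	base := ""
	if u, err := url.Parse(imgURL); err == nil {
		rawExt := path.Ext(u.Path)
		if e := unsafeChars.ReplaceAllString(rawExt, ""); len(e) > 1 {
			ext = e
		}
		base = strings.TrimSuffix(path.Base(u.Path), rawExt)
	}
	if name := strings.TrimSpace(alt); name != "" {
		base = name
	}
	// Replace unsafe runs with "_" and drop leading/trailing "_" and "."
	base = strings.Trim(unsafeChars.ReplaceAllString(base, "_"), "_.")
	if base == "" {
		base = "image" // assumed default name
	}
	return base + ext
}

func main() {
	fmt.Println(safeFileName("http://example.com/img/cat.png?w=200", "A cat / sleeping")) // A_cat_sleeping.png
	fmt.Println(safeFileName("http://example.com/img/logo.gif", ""))                      // logo.gif
	fmt.Println(safeFileName("http://example.com/", ""))                                  // image.img
}
```

This helper does not prevent collisions: two images with the same alt text still map to the same name, so you may want to append a counter or a hash of the URL if that matters for your site.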
Remember that scraping images and files should respect the terms of service and copyright laws of the website you're scraping. Always ensure that you have the legal right to download and use the content you're scraping.