Is it possible to scrape images or files with Colly?

Yes. Colly, a popular web scraping library for Go, can download images and other files as well as HTML pages. A file URL is requested like any other page, and the response body can then be written to disk with Response.Save.
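Colly passes every response to OnResponse callbacks, whether it is HTML or not. OnHTML callbacks only fire for HTML documents. A scraper that saves files therefore usually needs to tell files apart from pages. The sketch below does this with the Content-Type header. It assumes servers send accurate headers. `isDownloadableFile` is a hypothetical helper, not part of Colly, and the set of media types it accepts is only an example to adjust:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isDownloadableFile is a hypothetical helper (not a Colly API). It reports
// whether a Content-Type header describes a file worth saving, such as an
// image or PDF, rather than a page to parse.
func isDownloadableFile(contentType string) bool {
	// ParseMediaType strips parameters like "; charset=utf-8" and lowercases the type
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Missing or malformed header: do not treat the response as a file
		return false
	}
	// Example allow-list; extend it for the file types you need
	return strings.HasPrefix(mediaType, "image/") ||
		mediaType == "application/pdf" ||
		mediaType == "application/octet-stream"
}

func main() {
	for _, ct := range []string{"image/png", "text/html; charset=utf-8", "application/pdf", ""} {
		fmt.Printf("%q -> %v\n", ct, isDownloadableFile(ct))
	}
}
```

Inside a Colly OnResponse callback you would call it as `isDownloadableFile(r.Headers.Get("Content-Type"))` before saving the body.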

Here's a basic example of how you can use Colly to scrape images from a website:

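package main

import (
    "fmt"
    "log"
    "os"
    "path/filepath"
    "strings"

    "github.com/gocolly/colly"
)

func main() {
    // Create a folder to store images if it does not exist
    if err := os.MkdirAll("images", os.ModePerm); err != nil {
        log.Fatal(err)
    }

    // Create a new collector
    c := colly.NewCollector(
        // Visit only domains: example.com (add CDN hosts here if images are served elsewhere)
        colly.AllowedDomains("example.com"),
    )

    // On every img element that has a src attribute, request the image
    c.OnHTML("img[src]", func(e *colly.HTMLElement) {
        // Resolve the src attribute to an absolute URL
        imgSrc := e.Request.AbsoluteURL(e.Attr("src"))
        if imgSrc == "" {
            return
        }
        fmt.Printf("Image found: %s\n", imgSrc)

        // Send the alt text along with the request so the file can be named after it
        ctx := colly.NewContext()
        ctx.Put("alt", e.Attr("alt"))
        if err := c.Request("GET", imgSrc, nil, ctx, nil); err != nil {
            log.Printf("Failed to download image %s: %s", imgSrc, err)
        }
    })

    // Every response passes through here, so skip anything that is not an image
    c.OnResponse(func(r *colly.Response) {
        if !strings.HasPrefix(r.Headers.Get("Content-Type"), "image/") {
            return
        }
        // Default to Colly's sanitized file name derived from the URL
        fileName := r.FileName()
        if alt := r.Ctx.Get("alt"); alt != "" {
            // Use the alt text instead, with path separators replaced
            fileName = strings.NewReplacer("/", "_", `\`, "_").Replace(alt) + ".jpg"
        }
        // Save the image body to the images folder
        if err := r.Save(filepath.Join("images", fileName)); err != nil {
            log.Printf("Failed to save image %s: %s", fileName, err)
        }
    })

    // Start scraping on the given URL
    if err := c.Visit("http://example.com"); err != nil {
        log.Fatal(err)
    }
}

This code will do the following:

  1. Create a new Colly collector, constrained to a specific domain (example.com in this case). Images served from other hosts, such as a CDN, are rejected unless you add those hosts to AllowedDomains.
  2. Define a callback function that triggers on every <img> tag with a src attribute found on the scraped pages.
  3. Extract the src attribute from the <img> tag and resolve it to an absolute image URL with e.Request.AbsoluteURL().
  4. Request each image URL, passing the alt text along in a request context, and save image responses to disk with r.Save() in an OnResponse callback. Colly fetches files the same way it fetches pages.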
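The example relies on e.Request.AbsoluteURL because many `src` values are relative, such as `img/cat.jpg`, `../logo.png`, or protocol-relative `//cdn...` paths. To show what that resolution involves, here is a standard-library sketch that uses RFC 3986 reference resolution via `url.URL.ResolveReference`. `resolveImageURL` is a hypothetical helper for illustration only. It is not Colly's implementation, which may handle extra cases such as a `<base href>` tag.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveImageURL is a hypothetical helper: it resolves an <img src> value
// against the URL of the page it appeared on (RFC 3986 reference resolution).
func resolveImageURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	abs := base.ResolveReference(ref)
	// Fragments are never sent to the server, so drop them
	abs.Fragment = ""
	return abs.String(), nil
}

func main() {
	page := "http://example.com/gallery/index.html"
	for _, src := range []string{"cat.jpg", "../img/dog.png", "/logo.png", "//cdn.example.com/a.gif"} {
		abs, err := resolveImageURL(page, src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

Note that the protocol-relative case resolves to a different host (cdn.example.com). A collector restricted with AllowedDomains("example.com") would refuse to fetch it.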

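Make sure to handle image file names carefully. This example names each file after the image's alt attribute. Alt text is often empty, is sometimes repeated across several images, and may contain characters that are not valid in file names, so you may need a different naming scheme depending on the website you're scraping.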
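One way to make alt-based naming more robust is sketched below. It prefers the alt text and falls back to the last segment of the URL path when the alt text is empty. It replaces anything other than letters, digits, `-` and `_`, so a name cannot escape the target folder. It also keeps the URL's extension, defaulting to `.jpg`. `imageFileName` and `sanitize` are hypothetical helpers, not Colly APIs. The sketch does not de-duplicate names: images that share alt text would still overwrite each other unless you add a counter or hash.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
	"unicode"
)

// sanitize (hypothetical) keeps letters, digits, '-' and '_' and replaces every
// other rune (spaces, slashes, dots, ...) with '_'.
func sanitize(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_' {
			return r
		}
		return '_'
	}, s)
}

// imageFileName (hypothetical) names a downloaded image: alt text first, then the
// URL's base name, then "image"; the extension comes from the URL or defaults to ".jpg".
func imageFileName(alt, imgURL string) string {
	ext, base := ".jpg", ""
	if u, err := url.Parse(imgURL); err == nil {
		last := path.Base(u.Path) // "/" or "." when the path has no file segment
		if last != "/" && last != "." {
			e := path.Ext(last)
			if len(e) > 1 && len(e) <= 6 {
				ext = "." + strings.ToLower(sanitize(e[1:]))
			}
			base = strings.TrimSuffix(last, e)
		}
	}
	name := strings.TrimSpace(alt)
	if name == "" {
		name = base
	}
	if name == "" {
		name = "image"
	}
	return sanitize(name) + ext
}

func main() {
	fmt.Println(imageFileName("A red fox", "http://example.com/img/fox.PNG"))
	fmt.Println(imageFileName("", "http://example.com/photos/cat.jpeg?w=200"))
	fmt.Println(imageFileName("../../etc/passwd", "http://example.com/x"))
}
```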

When scraping images and files, respect the website's terms of service and applicable copyright law. Always make sure you have the legal right to download and use the content you're scraping.
