How do I set custom headers for a request in Colly?

Colly is a popular Go library for web scraping. It lets you make HTTP requests, navigate and parse HTML documents, and handle the scraped data.

To set custom headers for a request in Colly, register an OnRequest callback. Colly runs it before each request is sent, so you can modify the request there. The steps are:

  1. Import the necessary packages.
  2. Create a new Colly collector.
  3. Use the OnRequest callback to modify each request.
  4. Set custom headers using the Headers.Set method on the request object.
  5. Make a request to the target URL.

Here's a code example that demonstrates how to set custom headers:

package main

import (
    "fmt"
    "github.com/gocolly/colly/v2"
)

func main() {
    // Create a new collector
    c := colly.NewCollector()

    // Set custom headers using the OnRequest callback
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("Custom-Header", "HeaderValue")
        // You can add as many headers as you need
        r.Headers.Set("Another-Header", "AnotherValue")
    })

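    // Print the raw response body; httpbin echoes the received headers as JSON
    c.OnResponse(func(r *colly.Response) {
        fmt.Println("Response body:", string(r.Body))
    })

    // Print the page title; this callback fires only for HTML responses
    c.OnHTML("title", func(e *colly.HTMLElement) {
        fmt.Println("Title:", e.Text)
    })

    // Report failed requests with their URL, status code, and error
    c.OnError(func(r *colly.Response, err error) {
        fmt.Printf("Request to %s failed with status %d: %v\n", r.Request.URL, r.StatusCode, err)
    })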

    // Start scraping
    err := c.Visit("http://httpbin.org/headers")
    if err != nil {
        fmt.Println("Visit failed:", err)
    }
}

In this example, before each request is made, the OnRequest callback is triggered, allowing you to set or modify headers as needed. In this case, two custom headers are added: Custom-Header and Another-Header. The Visit function is then used to make a request to http://httpbin.org/headers, which is a simple HTTP Request & Response Service that returns the headers of the request it receives. Thus, you can verify that your custom headers are being sent correctly.
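The Headers field on a Colly request is a pointer to the standard library's http.Header, so Set follows that type's rules: it canonicalizes the header name and replaces any existing value, while Add appends another value. The sketch below shows this with net/http alone, so it runs without Colly. The buildHeaders helper and the header values are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildHeaders is a hypothetical helper that contrasts Set and Add
// on the same http.Header type that Colly exposes as r.Headers.
func buildHeaders() http.Header {
	h := http.Header{}

	// Set canonicalizes the key ("custom-header" becomes "Custom-Header")
	// and replaces any value already stored under it.
	h.Set("custom-header", "first")
	h.Set("Custom-Header", "second")

	// Add appends, so the header ends up with two values.
	h.Add("Accept-Language", "en-US")
	h.Add("Accept-Language", "de")

	return h
}

func main() {
	h := buildHeaders()
	fmt.Println(h.Get("Custom-Header"))  // prints "second"
	fmt.Println(h["Accept-Language"])    // prints "[en-US de]"
}
```

Inside an OnRequest callback the same calls apply, for example r.Headers.Add("Accept-Language", "de") when a header should carry more than one value.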

Remember to handle errors appropriately in your actual application. The OnError callback can help you debug issues by providing the failed request's URL and the associated error.

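When you run this code, Colly requests the specified URL with the custom headers you've set. Note that http://httpbin.org/headers responds with JSON rather than HTML, and Colly runs OnHTML callbacks only for HTML responses, so the title callback will not fire for this URL. To see the echoed headers, print r.Body from an OnResponse callback. In a real-world scenario, you'd point the collector at an HTML page and use Colly's selectors to extract the data you're interested in.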
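If you would rather not depend on an external service while testing, a throwaway local server can echo headers in a similar way. The sketch below uses only the standard library (net/http and httptest) in place of Colly, so it is an illustration of the verification idea and not Colly code. The echoHeaders handler and the fetchEchoedHeaders helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// echoHeaders writes the request headers it receives back as JSON,
// loosely mimicking what the httpbin /headers endpoint does.
func echoHeaders(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(r.Header)
}

// fetchEchoedHeaders (hypothetical helper) starts a temporary local server,
// sends one GET request carrying the given header, and returns the
// headers the server reported seeing.
func fetchEchoedHeaders(name, value string) (http.Header, error) {
	srv := httptest.NewServer(http.HandlerFunc(echoHeaders))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(name, value)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Decode the echoed JSON back into an http.Header for easy lookup.
	seen := http.Header{}
	if err := json.NewDecoder(resp.Body).Decode(&seen); err != nil {
		return nil, err
	}
	return seen, nil
}

func main() {
	seen, err := fetchEchoedHeaders("Custom-Header", "HeaderValue")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("Custom-Header seen by server:", seen.Get("Custom-Header"))
}
```

A Colly collector could presumably be pointed at the same server by passing srv.URL to Visit and reading the body in an OnResponse callback; that wiring is left out here to keep the sketch dependency-free.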
