Colly is a popular Golang library used for web scraping. It allows you to make HTTP requests, navigate and parse HTML documents, and handle scraped data.
To set custom headers for a request in Colly, you can use the OnRequest callback to modify the request before it is sent. Here's a step-by-step explanation with a code example:
- Import the necessary packages.
- Create a new Colly collector.
- Use the OnRequest callback to modify each request.
- Set custom headers using the Headers.Set method on the request object.
- Make a request to the target URL.
Here's a code example that demonstrates how to set custom headers:
```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Create a new collector
	c := colly.NewCollector()

	// Set custom headers using the OnRequest callback
	c.OnRequest(func(r *colly.Request) {
		r.Headers.Set("Custom-Header", "HeaderValue")
		// You can add as many headers as you need
		r.Headers.Set("Another-Header", "AnotherValue")
	})

	// httpbin.org/headers returns JSON, so print the raw body to see the echoed headers
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Response body:", string(r.Body))
	})

	// OnHTML only fires for HTML responses; on an HTML page this prints the <title>
	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println("Title:", e.Text)
	})

	// Define a callback for errors
	c.OnError(func(r *colly.Response, err error) {
		fmt.Println("Request URL:", r.Request.URL, "failed with status:", r.StatusCode, "\nError:", err)
	})

	// Start scraping
	if err := c.Visit("http://httpbin.org/headers"); err != nil {
		fmt.Println("Visit failed:", err)
	}
}
```
In this example, the OnRequest callback runs before each request is sent, letting you set or modify headers as needed. Here it adds two custom headers: Custom-Header and Another-Header. The Visit method then requests http://httpbin.org/headers, a simple HTTP request and response service that echoes back, as JSON, the headers of the request it receives. This lets you verify that your custom headers are being sent correctly.
Remember to handle errors appropriately in your actual application. The OnError callback can help you debug issues by reporting the failed request's URL and the associated error.
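When you run this code, Colly will make a request to the specified URL with the custom headers you've set. The OnHTML callback prints the page's title when the response is HTML; because http://httpbin.org/headers returns JSON rather than HTML, it will not fire for this particular URL, so point the collector at an HTML page to see it in action. In a real-world scenario, you'd register OnHTML callbacks with CSS selectors that target the data you're interested in.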