Colly is a popular scraping framework for Golang, designed to make the process of writing web scrapers easy and efficient. Error handling in Colly is an important aspect of building robust and reliable scrapers. Below are some ways to manage error handling in Colly.
Basic Error Handling
When you make a request using Colly, you can handle errors by checking the error value returned by the Collector.Visit method or by handling them in a callback registered with Collector.OnError.
Here's a simple example of checking the error returned by Visit:
package main

import (
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// Visit a page. By default, Visit returns a non-nil error for
	// HTTP error statuses such as this 404.
	err := c.Visit("http://httpbin.org/status/404")
	if err != nil {
		log.Println("Something went wrong:", err)
	}
}
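When a scraper visits several start pages, each call to Visit returns its own error, so it can help to record failures per URL instead of stopping at the first one. The sketch below shows one way to do that. `visitAll` is a hypothetical helper, not part of Colly, and it takes the visit function as a parameter so that `c.Visit` can be passed in directly. `fakeVisit` is a stand-in that only exists to make the example runnable without network access.

```go
package main

import (
	"errors"
	"fmt"
)

// visitAll calls visit for every URL and returns the failures keyed by URL.
// It is a hypothetical helper; pass c.Visit from a Colly collector as visit.
func visitAll(visit func(string) error, urls []string) map[string]error {
	failed := make(map[string]error)
	for _, u := range urls {
		if err := visit(u); err != nil {
			failed[u] = err
		}
	}
	return failed
}

// fakeVisit stands in for c.Visit so the example runs offline.
func fakeVisit(u string) error {
	if u == "" {
		return errors.New("missing URL")
	}
	return nil
}

func main() {
	failed := visitAll(fakeVisit, []string{"http://example.com/", ""})
	for u, err := range failed {
		fmt.Printf("visit %q failed: %v\n", u, err)
	}
}
```

With a real collector, `visitAll(c.Visit, urls)` should work the same way, since Visit takes a URL string and returns an error.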
And here's how you can use OnError to handle errors:
package main

import (
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// OnError callback: runs when a request fails or returns an error status
	c.OnError(func(r *colly.Response, err error) {
		log.Println("Request URL:", r.Request.URL, "failed with status:", r.StatusCode, "\nError:", err)
	})

	// Visit a page
	c.Visit("http://httpbin.org/status/500")
}
Retrying Failed Requests
If you want to retry a request that has failed, you can do so within the OnError callback by calling the Retry method.
Example of retrying a request:
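package main

import (
	"log"

	"github.com/gocolly/colly"
)

// Upper bound on retries per request
const maxRetries = 3

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// OnError callback
	c.OnError(func(r *colly.Response, err error) {
		log.Println("Error:", err)
		// A failed retry triggers OnError again, so cap the attempts.
		// The counter lives in the request context, which Retry reuses.
		attempts, _ := r.Request.Ctx.GetAny("attempts").(int)
		if attempts >= maxRetries {
			log.Println("Giving up on", r.Request.URL)
			return
		}
		r.Request.Ctx.Put("attempts", attempts+1)
		// Attempt to retry the request
		if err := r.Request.Retry(); err != nil {
			log.Println("Retry failed:", err)
		}
	})

	// Visit a page
	c.Visit("http://httpbin.org/status/500")
}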
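Retrying immediately can put extra load on a server that is already struggling, so a growing delay between attempts is a common refinement. The following is a minimal sketch of a capped exponential delay. `backoff` is a hypothetical helper, not part of Colly, and the base and maximum values are assumptions to tune for the target site.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before retry number attempt (0-based):
// base doubled once per attempt, capped at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Print the delay that would precede each of the first five retries.
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(attempt, backoff(attempt, 500*time.Millisecond, 4*time.Second))
	}
}
```

Inside OnError, calling `time.Sleep(backoff(n, base, max))` before `r.Request.Retry()` would space out the retries, where `n` is the number of attempts already made for that request (a count you track yourself). The sleep blocks the callback, which is usually acceptable for a synchronous collector.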
Handling Specific HTTP Status Codes
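Colly allows you to handle specific HTTP status codes by using the Collector.OnResponse method and checking the StatusCode of the response. By default, Colly treats error statuses as failures and passes them to OnError without calling OnResponse, so set ParseHTTPErrorResponse to true if you want OnResponse to see them.
Example of handling specific status codes:
package main

import (
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// Let OnResponse run for error statuses as well; without this,
	// a 404 goes to OnError and never reaches the callback below.
	c.ParseHTTPErrorResponse = true

	// OnResponse callback
	c.OnResponse(func(r *colly.Response) {
		if r.StatusCode >= 400 {
			log.Printf("Response code %d received for URL: %s", r.StatusCode, r.Request.URL)
		}
	})

	// Visit a page
	c.Visit("http://httpbin.org/status/404")
}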
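Not every error status calls for the same reaction: a 404 will usually not fix itself, while a 429 or a 503 often will. The sketch below separates status codes that are plausibly transient from those that are not. `isRetryable` is a hypothetical helper, and the choice of which codes count as transient is an assumption to adjust per site.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable reports whether a status code suggests a temporary problem
// that a later attempt might get past.
func isRetryable(status int) bool {
	switch status {
	case http.StatusRequestTimeout, http.StatusTooManyRequests:
		return true // 408 and 429: the client can usually try again later
	case http.StatusInternalServerError, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true // 500, 502, 503, 504: often temporary server-side trouble
	default:
		return false
	}
}

func main() {
	for _, status := range []int{404, 429, 503} {
		fmt.Println(status, isRetryable(status))
	}
}
```

A status check such as `isRetryable(r.StatusCode)` could then decide whether a failed request is worth retrying or should simply be logged and skipped.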
Custom Error Handling
You can also define custom error handling logic based on the type of error you encounter. For example, you might want to handle network errors differently from HTTP errors.
Example of custom error handling:
package main

import (
	"errors"
	"log"
	"net/http"
	"net/url"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// OnError callback. Colly has no dedicated error type for HTTP or
	// transport failures, so tell them apart by the response status:
	// a zero StatusCode means no response was received at all.
	c.OnError(func(r *colly.Response, err error) {
		var urlErr *url.Error
		switch {
		case r.StatusCode == http.StatusNotFound:
			// Handle a specific HTTP error status
			log.Println("Not Found Error:", r.Request.URL)
		case r.StatusCode != 0:
			// Handle any other HTTP error status
			log.Println("HTTP Error:", r.StatusCode, err)
		case errors.As(err, &urlErr):
			// Handle network errors, which net/http reports as *url.Error
			log.Println("Network Error:", urlErr)
		default:
			// Handle other types of errors
			log.Println("Other Error:", err)
		}
	})

	// Visit a page
	c.Visit("http://httpbin.org/status/404")
}
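Among network errors, timeouts are often worth separating out, since they tend to be transient while a DNS failure or a refused connection may not be. The sketch below uses only the standard library to detect them. `isTimeout` is a hypothetical helper, and `sampleTimeout` is a hand-built error that imitates what net/http returns when a request deadline passes; it is there only so the example runs offline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isTimeout reports whether err, or any error it wraps, is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// sampleTimeout imitates the error net/http returns when a deadline passes.
var sampleTimeout error = &url.Error{
	Op:  "Get",
	URL: "http://example.com/",
	Err: context.DeadlineExceeded,
}

func main() {
	fmt.Println(isTimeout(sampleTimeout))           // true
	fmt.Println(isTimeout(errors.New("Not Found"))) // false
}
```

In an OnError callback, `isTimeout(err)` could be checked before deciding whether the request should be retried.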
Remember to always check for errors and handle them appropriately to ensure that your scraper can deal with unexpected situations gracefully.