How do I install Colly in my Go project?

Colly is a powerful Go web scraping framework that makes data extraction simple and efficient. This guide covers the complete installation process and provides practical examples to get you started.

Prerequisites

Before installing Colly, ensure you have:

  • Go 1.16 or later installed on your system
  • Basic familiarity with Go modules
  • A properly configured $GOPATH (if using Go < 1.16)
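
You can confirm your installed Go version from a terminal:

go version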

Installation Steps

1. Initialize Your Go Project

First, create a new directory and initialize a Go module:

mkdir colly-scraper
cd colly-scraper
go mod init colly-scraper

For existing projects, navigate to your project directory:

cd path/to/your/existing/project

2. Install Colly

Install the latest version of Colly v2 using go get:

go get github.com/gocolly/colly/v2

This command will:

  • Download Colly and its dependencies
  • Update your go.mod file automatically
  • Create a go.sum file for dependency verification
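
If you need to pin a specific release instead of the latest, go get accepts a version suffix (v2.1.0 is shown here only as an example tag):

go get github.com/gocolly/colly/v2@v2.1.0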

3. Verify Installation

Check your go.mod file to confirm Colly was added:

module colly-scraper

go 1.21

require github.com/gocolly/colly/v2 v2.1.0

require (
    github.com/PuerkitoBio/goquery v1.8.1 // indirect
    github.com/andybalholm/cascadia v1.3.1 // indirect
    github.com/antchfx/htmlquery v1.3.0 // indirect
    github.com/antchfx/xmlquery v1.3.17 // indirect
    github.com/antchfx/xpath v1.2.4 // indirect
    github.com/gobwas/glob v0.2.3 // indirect
    github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
    github.com/golang/protobuf v1.5.3 // indirect
    github.com/kennygrant/sanitize v1.2.4 // indirect
    github.com/saintfish/chardet v0.0.0-20230101081208-5e3ef4b5456d // indirect
    github.com/temoto/robotstxt v1.1.2 // indirect
    golang.org/x/net v0.12.0 // indirect
    golang.org/x/text v0.11.0 // indirect
    google.golang.org/appengine v1.6.7 // indirect
    google.golang.org/protobuf v1.31.0 // indirect
)
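
You can also ask the Go toolchain which version it resolved; go list prints the module path and pinned version:

go list -m github.com/gocolly/colly/v2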

Basic Usage Examples

Simple Web Scraper

Create a main.go file with this basic example:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    // Create a new collector
    c := colly.NewCollector(
        colly.AllowedDomains("example.com"), // Restrict requests to this domain
    )

    // Set up callbacks
    c.OnHTML("h1", func(e *colly.HTMLElement) {
        fmt.Printf("Title: %s\n", e.Text)
    })

    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        fmt.Printf("Link: %s -> %s\n", e.Text, link)
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Printf("Visiting: %s\n", r.URL.String())
    })

    c.OnError(func(r *colly.Response, err error) {
        log.Printf("Request to %s failed with status %d: %v",
            r.Request.URL, r.StatusCode, err)
    })

    // Start scraping
    err := c.Visit("https://example.com")
    if err != nil {
        log.Fatal(err)
    }
}
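
Run against https://example.com, the output should look roughly like this (the exact links depend on the live page):

Visiting: https://example.com
Title: Example Domain
Link: More information... -> https://www.iana.org/domains/example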

Advanced Example with Rate Limiting

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/debug"
)

func main() {
    c := colly.NewCollector(
        colly.Async(true),                    // Run requests concurrently; Wait() below depends on this
        colly.Debugger(&debug.LogDebugger{}), // Enable debug logging
    )

    // Limit the request rate: at most 2 parallel requests per matching domain,
    // with a 1-second delay between them
    if err := c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 2,
        Delay:       1 * time.Second,
    }); err != nil {
        log.Fatal(err)
    }

    // Set custom headers
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "MyBot 1.0")
    })

    c.OnHTML("title", func(e *colly.HTMLElement) {
        fmt.Printf("Page title: %s\n", e.Text)
    })

    if err := c.Visit("https://httpbin.org/"); err != nil {
        log.Fatal(err)
    }
    c.Wait() // Wait for all in-flight requests to complete (async mode)
}
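
Colly can also follow the links it discovers. Below is a minimal sketch, assuming you want a bounded crawl: colly.MaxDepth caps how deep the crawler goes, while e.Request.Visit queues discovered URLs and reuses the collector's rate limits and duplicate-URL filtering.

package main

import (
    "fmt"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        colly.Async(true),
        colly.MaxDepth(2), // crawl the start page plus pages it links to, no deeper
    )

    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        // Request.Visit resolves relative URLs and skips already-visited pages
        e.Request.Visit(e.Attr("href"))
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting:", r.URL)
    })

    c.Visit("https://example.com")
    c.Wait() // Wait for all queued requests to finish
}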

Running Your Scraper

Execute your scraper with:

go run main.go

For production builds:

go build -o scraper main.go
./scraper
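
If the scraper will run on a different platform (a Linux server, for example), Go can cross-compile by setting GOOS and GOARCH:

GOOS=linux GOARCH=amd64 go build -o scraper main.go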

Installation Troubleshooting

Common Issues

Module not found error:

go mod tidy
go get github.com/gocolly/colly/v2@latest

Permission denied (proxy environments):

go env -w GOPROXY=direct
go env -w GOSUMDB=off

SSL certificate errors (common behind corporate proxies that re-sign TLS traffic):

go env -w GOSUMDB=off
go env -w GOINSECURE=github.com/*

Note that GOINSECURE skips certificate validation for the matching module paths, so use it only as a last resort.
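
Corrupted module cache (repeated checksum or extraction failures): clearing the local cache forces Go to re-download every dependency.

go clean -modcache
go get github.com/gocolly/colly/v2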

Next Steps

You now have Colly successfully installed and ready for web scraping in your Go project!

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data (the -g flag stops curl from glob-interpreting the square brackets):

curl -g "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page%20title&fields[price]=Product%20price&api_key=YOUR_API_KEY"
