
Can Colly be used to submit forms on a website?

Yes, Colly can be used to submit forms on websites. Although Colly is primarily a web scraping framework, its collector can send form data over HTTP POST and GET through methods such as Post, PostMultipart, and Request, which makes it suitable for automating simple web interactions in Go applications.

How Colly Handles Form Submissions

Colly submits forms by:

1. Parsing form elements from HTML pages
2. Extracting form attributes (action URL, method, field names)
3. Sending HTTP requests with form data
4. Managing cookies and sessions automatically
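Step 2 depends on resolving the form's target correctly. The sketch below uses only the standard library and a hypothetical helper, resolveFormTarget, which is not part of Colly. It assumes the usual HTML rules: an empty action submits to the page's own URL, and a missing or unrecognized method falls back to GET.

```go
package main

import (
    "fmt"
    "net/url"
    "strings"
)

// resolveFormTarget is a hypothetical helper (not part of Colly). It turns a
// form's raw action and method attributes into an absolute URL and an
// upper-case HTTP method.
func resolveFormTarget(pageURL, action, method string) (string, string, error) {
    base, err := url.Parse(pageURL)
    if err != nil {
        return "", "", err
    }
    // An empty action parses to an empty reference, which resolves to the
    // page URL itself.
    ref, err := url.Parse(strings.TrimSpace(action))
    if err != nil {
        return "", "", err
    }
    target := base.ResolveReference(ref)

    // Anything other than POST is treated as GET in this sketch.
    m := strings.ToUpper(strings.TrimSpace(method))
    if m != "POST" {
        m = "GET"
    }
    return target.String(), m, nil
}

func main() {
    target, method, err := resolveFormTarget(
        "https://example.com/account/login", "/auth/login", "post")
    if err != nil {
        panic(err)
    }
    fmt.Println(method, target) // prints: POST https://example.com/auth/login
}
```

Inside a Colly callback, e.Request.AbsoluteURL performs the URL resolution for you; the helper is mainly useful for seeing what that step involves and for choosing between a POST and a GET submission.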

Basic Form Submission Example

Here's a complete example demonstrating form submission with Colly:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()

    // Handle form submission
    c.OnHTML("form[action='/login']", func(e *colly.HTMLElement) {
        // Get form action URL
        actionURL := e.Attr("action")

        // Prepare form data; Collector.Post expects a map[string]string
        formData := map[string]string{
            "username": "your_username",
            "password": "your_password",
        }

        // Submit form via POST request
        err := c.Post(e.Request.AbsoluteURL(actionURL), formData)
        if err != nil {
            log.Printf("Form submission failed: %v", err)
        }
    })

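    // Handle responses; only a POST response reflects the form submission.
    // A 200 status means the server accepted the request, not necessarily
    // that the login succeeded.
    c.OnResponse(func(r *colly.Response) {
        fmt.Printf("Status: %d\n", r.StatusCode)
        if r.Request.Method == "POST" && r.StatusCode == 200 {
            fmt.Println("Form submitted successfully!")
        }
    })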

    // Handle errors
    c.OnError(func(r *colly.Response, err error) {
        log.Printf("Error: %s\n", err.Error())
    })

    // Visit the login page
    err := c.Visit("https://example.com/login")
    if err != nil {
        log.Fatal(err)
    }
}
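The example above covers a POST form. For a GET form, such as a search box, the fields travel in the query string instead of the request body. The sketch below shows one way to build that URL with the standard library; buildGetURL is a hypothetical helper, not a Colly function, and the search URL and field names are placeholders.

```go
package main

import (
    "fmt"
    "net/url"
)

// buildGetURL is a hypothetical helper (not part of Colly). It encodes the
// form fields and sets them as the query string of the action URL, replacing
// any query string the action already carried.
func buildGetURL(action string, fields map[string]string) (string, error) {
    u, err := url.Parse(action)
    if err != nil {
        return "", err
    }
    q := url.Values{}
    for name, value := range fields {
        q.Set(name, value)
    }
    // Encode sorts by key, so the result is deterministic.
    u.RawQuery = q.Encode()
    return u.String(), nil
}

func main() {
    target, err := buildGetURL("https://example.com/search", map[string]string{
        "q":    "colly forms",
        "page": "1",
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(target) // prints: https://example.com/search?page=1&q=colly+forms
}
```

The resulting URL can be passed to c.Visit, so the response flows through the same OnHTML and OnResponse callbacks as any other page.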

Advanced Form Handling with CSRF Tokens

Many modern websites use CSRF tokens for security. Here's how to handle them:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()

    var csrfToken string

    // Extract CSRF token first; OnHTML callbacks run in registration order,
    // so this handler must be registered before the form handler
    c.OnHTML("input[name='_token']", func(e *colly.HTMLElement) {
        csrfToken = e.Attr("value")
        fmt.Printf("CSRF Token found: %s\n", csrfToken)
    })

    // Submit form with CSRF token
    c.OnHTML("form#contact-form", func(e *colly.HTMLElement) {
        actionURL := e.Attr("action")

        // Collector.Post expects a map[string]string
        formData := map[string]string{
            "_token":  csrfToken,
            "name":    "John Doe",
            "email":   "john@example.com",
            "message": "Hello from Colly!",
        }

        err := c.Post(e.Request.AbsoluteURL(actionURL), formData)
        if err != nil {
            log.Printf("Form submission failed: %v", err)
        }
    })

    c.OnResponse(func(r *colly.Response) {
        fmt.Printf("Response status: %d\n", r.StatusCode)
    })

    err := c.Visit("https://example.com/contact")
    if err != nil {
        log.Fatal(err)
    }
}

Multi-Step Form Submission

For complex workflows involving multiple forms or steps:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()

    // Step 1: Login form
    c.OnHTML("form[action='/auth/login']", func(e *colly.HTMLElement) {
        actionURL := e.Attr("action")

        formData := map[string]string{
            "username": "user123",
            "password": "password123",
        }

        if err := c.Post(e.Request.AbsoluteURL(actionURL), formData); err != nil {
            log.Printf("Login submission failed: %v", err)
        }
    })

    // Step 2: Profile update form (after successful login)
    c.OnHTML("form[action='/profile/update']", func(e *colly.HTMLElement) {
        actionURL := e.Attr("action")

        formData := map[string]string{
            "first_name": "John",
            "last_name":  "Doe",
            "bio":        "Software Developer",
        }

        if err := c.Post(e.Request.AbsoluteURL(actionURL), formData); err != nil {
            log.Printf("Profile update failed: %v", err)
        }
    })

    // Handle redirects and responses
    c.OnResponse(func(r *colly.Response) {
        fmt.Printf("Visited: %s (Status: %d)\n", r.Request.URL, r.StatusCode)

        // Check if we need to visit the profile page after login
        if r.Request.URL.Path == "/dashboard" {
            if err := c.Visit(r.Request.AbsoluteURL("/profile")); err != nil {
                log.Printf("Could not open profile page: %v", err)
            }
        }
    })

    // Start the process
    if err := c.Visit("https://example.com/login"); err != nil {
        log.Fatal(err)
    }
}

File Upload Forms

Colly can also handle file uploads by sending a multipart/form-data request body:

package main

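import (
    "bytes"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
    "path/filepath"

    "github.com/gocolly/colly/v2"
)

func submitFileUpload(c *colly.Collector, actionURL, filePath string) error {
    // Open the file
    file, err := os.Open(filePath)
    if err != nil {
        return err
    }
    defer file.Close()

    // Create multipart form
    var body bytes.Buffer
    writer := multipart.NewWriter(&body)

    // Add file field, using the file's own name rather than a fixed one
    part, err := writer.CreateFormFile("upload", filepath.Base(filePath))
    if err != nil {
        return err
    }

    if _, err = io.Copy(part, file); err != nil {
        return err
    }

    // Add other form fields
    if err = writer.WriteField("description", "Important document"); err != nil {
        return err
    }

    // Close writes the final boundary; the body is incomplete without it
    if err = writer.Close(); err != nil {
        return err
    }

    // PostRaw takes no headers argument, so use Request to set Content-Type
    hdr := http.Header{}
    hdr.Set("Content-Type", writer.FormDataContentType())
    return c.Request("POST", actionURL, &body, nil, hdr)
}

func main() {
    c := colly.NewCollector()

    // Placeholder upload URL and file path
    if err := submitFileUpload(c, "https://example.com/upload", "document.pdf"); err != nil {
        log.Fatal(err)
    }
}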

Best Practices and Considerations

1. Session Management

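// Colly stores cookies in an in-memory jar by default, so requests made
// through the same collector share one session
c := colly.NewCollector()

// Identify the client on every request
c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("User-Agent", "Mozilla/5.0 (compatible; Colly)")
})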

2. Rate Limiting

// Add delays to avoid overwhelming servers (requires the "time" import)
err := c.Limit(&colly.LimitRule{
    DomainGlob:  "*",
    Parallelism: 1,
    Delay:       2 * time.Second,
})
if err != nil {
    log.Fatal(err)
}

3. Error Handling

c.OnError(func(r *colly.Response, err error) {
    if r != nil {
        log.Printf("Failed to submit form to %s: Status %d, Error: %v", 
            r.Request.URL, r.StatusCode, err)
    } else {
        log.Printf("Request failed: %v", err)
    }
})

Limitations and Alternatives

Colly Limitations:

  • No JavaScript execution (SPA forms may not work)
  • Cannot handle complex client-side validations
  • Limited support for dynamic form fields

When to Use Alternatives:

  • Chromedp or Rod: For JavaScript-heavy forms
  • Playwright-go: For complex browser automation
  • HTTP clients: For simple API-based form submissions

Security and Ethics

  • Check robots.txt and terms of service
  • Respect rate limits and server resources
  • Handle CSRF tokens properly for security
  • Use appropriate user agents to identify your bot
  • Implement proper error handling and retries

Colly handles form submission well for server-rendered pages, including logins, CSRF-protected forms, multi-step flows, and file uploads. For forms that depend on JavaScript, a browser automation tool is the better fit.
