Can Colly be used for competitive intelligence and price monitoring?

Yes, Colly is an excellent choice for competitive intelligence and price monitoring applications. As a fast, elegant web scraping framework for Go, Colly provides the robust features needed to build reliable monitoring systems that can track competitor prices, product availability, market trends, and business intelligence data at scale.

Why Colly Excels for Competitive Intelligence

Colly offers several advantages that make it particularly well-suited for competitive intelligence and price monitoring:

  • High Performance: Built in Go, Colly can handle thousands of concurrent requests efficiently
  • Built-in Rate Limiting: Essential for respectful scraping that won't overwhelm target servers
  • Robust Error Handling: Critical for maintaining reliable monitoring systems
  • Cookie and Session Management: Necessary for scraping protected or personalized content
  • Extensible Architecture: Easy to customize for specific business requirements

Basic Price Monitoring Setup

Here's a fundamental example of using Colly to monitor product prices:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "strconv"
    "strings"
    "time"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/debug"
)

type Product struct {
    Name        string    `json:"name"`
    Price       float64   `json:"price"`
    URL         string    `json:"url"`
    Timestamp   time.Time `json:"timestamp"`
    Available   bool      `json:"available"`
    Competitor  string    `json:"competitor"`
}

func main() {
    // Create a new collector with debugging
    c := colly.NewCollector(
        colly.Debugger(&debug.LogDebugger{}),
    )

    // Set rate limiting to be respectful
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 2,
        Delay:       1 * time.Second,
    })

    var products []Product

    // Define the scraping logic
    c.OnHTML(".product-item", func(e *colly.HTMLElement) {
        product := Product{
            Name:       e.ChildText(".product-name"),
            URL:        e.Request.URL.String(),
            Timestamp:  time.Now(),
            Available:  !strings.Contains(e.ChildText(".availability"), "Out of Stock"),
            Competitor: extractCompetitorName(e.Request.URL.Host),
        }

        // Extract and parse price
        priceText := e.ChildText(".price")
        price := extractPrice(priceText)
        product.Price = price

        products = append(products, product)

        fmt.Printf("Found product: %s - $%.2f\n", product.Name, product.Price)
    })

    // Handle errors gracefully
    c.OnError(func(r *colly.Response, err error) {
        log.Printf("Error scraping %s: %v", r.Request.URL, err)
    })

    // Monitor multiple competitor URLs
    urls := []string{
        "https://competitor1.com/products",
        "https://competitor2.com/products",
        "https://competitor3.com/products",
    }

    for _, url := range urls {
        c.Visit(url)
    }

    // Save results
    saveResults(products)
}

func extractPrice(priceText string) float64 {
    // Remove currency symbols and parse price
    cleaned := strings.ReplaceAll(priceText, "$", "")
    cleaned = strings.ReplaceAll(cleaned, ",", "")
    cleaned = strings.TrimSpace(cleaned)

    price, err := strconv.ParseFloat(cleaned, 64)
    if err != nil {
        return 0.0
    }
    return price
}

func extractCompetitorName(host string) string {
    parts := strings.Split(host, ".")
    if len(parts) >= 2 {
        return parts[len(parts)-2]
    }
    return host
}

func saveResults(products []Product) {
    data, err := json.MarshalIndent(products, "", "  ")
    if err != nil {
        log.Fatal(err)
    }

    filename := fmt.Sprintf("price_data_%s.json", time.Now().Format("2006-01-02"))
    // Write `data` to a file (e.g. os.WriteFile) or a database here
    fmt.Printf("Marshaled %d products (%d bytes) for %s\n", len(products), len(data), filename)
}
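The `extractPrice` helper above assumes US-style strings like `$1,299.99`. A slightly more defensive variant (a sketch, not a full currency parser — it still won't handle European `1.299,99` formats) pulls the first numeric token out of arbitrary price text with a regexp:

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
    "strings"
)

// priceRe matches the first decimal number in a string, e.g. "1299.99".
var priceRe = regexp.MustCompile(`[0-9]+(?:\.[0-9]+)?`)

// extractPriceLoose strips thousands separators, then parses the first
// numeric token found. Returns 0 when no number is present.
func extractPriceLoose(priceText string) float64 {
    cleaned := strings.ReplaceAll(priceText, ",", "")
    match := priceRe.FindString(cleaned)
    if match == "" {
        return 0
    }
    price, _ := strconv.ParseFloat(match, 64)
    return price
}

func main() {
    fmt.Println(extractPriceLoose("USD 1,299.99")) // prints 1299.99
    fmt.Println(extractPriceLoose("call for price")) // prints 0
}
```

This tolerates currency words, symbols, and stray whitespace without a per-site cleaning rule.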

Advanced Competitive Intelligence Features

Multi-Site Price Comparison

For comprehensive competitive intelligence, you'll want to monitor multiple competitors simultaneously:

type CompetitorMonitor struct {
    collector   *colly.Collector
    competitors map[string]CompetitorConfig
    results     chan Product
}

type CompetitorConfig struct {
    Name            string
    BaseURL         string
    ProductSelector string
    PriceSelector   string
    NameSelector    string
    RateLimit       time.Duration
}

func NewCompetitorMonitor() *CompetitorMonitor {
    c := colly.NewCollector()

    // Configure for different sites
    c.UserAgent = "Mozilla/5.0 (compatible; PriceBot/1.0)"

    return &CompetitorMonitor{
        collector:   c,
        competitors: make(map[string]CompetitorConfig),
        results:     make(chan Product, 1000),
    }
}

func (cm *CompetitorMonitor) AddCompetitor(config CompetitorConfig) {
    cm.competitors[config.Name] = config

    // Site-specific rate limiting. DomainGlob matches the request's host,
    // so derive a glob from the base URL rather than passing it verbatim
    domain := strings.TrimPrefix(strings.TrimPrefix(config.BaseURL, "https://"), "http://")
    cm.collector.Limit(&colly.LimitRule{
        DomainGlob:  "*" + domain + "*",
        Parallelism: 1,
        Delay:       config.RateLimit,
    })
}

func (cm *CompetitorMonitor) MonitorPrices(productKeywords []string) {
    for name, config := range cm.competitors {
        go cm.scrapeCompetitor(name, config, productKeywords)
    }
}

Real-time Price Alerts

Implement automated alerting when price changes are detected:

type PriceAlert struct {
    ProductName    string
    Competitor     string
    OldPrice       float64
    NewPrice       float64
    ChangePercent  float64
    Timestamp      time.Time
}

func (cm *CompetitorMonitor) checkPriceChanges(newProduct Product, historical []Product) *PriceAlert {
    for _, oldProduct := range historical {
        if oldProduct.Name == newProduct.Name && oldProduct.Competitor == newProduct.Competitor {
            if oldProduct.Price != newProduct.Price {
                changePercent := ((newProduct.Price - oldProduct.Price) / oldProduct.Price) * 100

                return &PriceAlert{
                    ProductName:   newProduct.Name,
                    Competitor:    newProduct.Competitor,
                    OldPrice:      oldProduct.Price,
                    NewPrice:      newProduct.Price,
                    ChangePercent: changePercent,
                    Timestamp:     time.Now(),
                }
            }
        }
    }
    return nil
}

func sendAlert(alert *PriceAlert) {
    // Send email, Slack notification, or webhook
    fmt.Printf("PRICE ALERT: %s at %s changed from $%.2f to $%.2f (%.1f%%)\n",
        alert.ProductName, alert.Competitor, alert.OldPrice, alert.NewPrice, alert.ChangePercent)
}
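To tie these pieces together, a consumer can drain the `results` channel and compare each product against the last price seen for the same product/competitor pair. A condensed, self-contained sketch (the types are repeated in abbreviated form so it compiles standalone):

```go
package main

import "fmt"

// Condensed versions of the article's types.
type Product struct {
    Name, Competitor string
    Price            float64
}

type PriceAlert struct {
    ProductName, Competitor           string
    OldPrice, NewPrice, ChangePercent float64
}

// watchResults emits an alert whenever a product's price differs from the
// last observation for the same product/competitor pair.
func watchResults(results <-chan Product, alerts chan<- PriceAlert) {
    last := make(map[string]float64) // key: "name|competitor"
    for p := range results {
        key := p.Name + "|" + p.Competitor
        if old, seen := last[key]; seen && old != p.Price {
            alerts <- PriceAlert{
                ProductName:   p.Name,
                Competitor:    p.Competitor,
                OldPrice:      old,
                NewPrice:      p.Price,
                ChangePercent: (p.Price - old) / old * 100,
            }
        }
        last[key] = p.Price
    }
    close(alerts)
}

func main() {
    results := make(chan Product, 2)
    alerts := make(chan PriceAlert, 2)
    results <- Product{Name: "Laptop X", Competitor: "competitor1", Price: 999}
    results <- Product{Name: "Laptop X", Competitor: "competitor1", Price: 899}
    close(results)
    watchResults(results, alerts)
    for a := range alerts {
        fmt.Printf("%s: $%.2f -> $%.2f (%.1f%%)\n",
            a.ProductName, a.OldPrice, a.NewPrice, a.ChangePercent)
    }
}
```

The map-based comparison avoids scanning the full history on every update, which matters once you're tracking thousands of SKUs.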

Handling Complex E-commerce Sites

Many competitive intelligence scenarios involve scraping sophisticated e-commerce platforms that require advanced techniques:

Session Management and Authentication

func setupAuthenticatedScraping() *colly.Collector {
    c := colly.NewCollector()

    // Handle login forms
    c.OnHTML("form[action*='login']", func(e *colly.HTMLElement) {
        // Extract CSRF tokens and form fields
        token := e.ChildAttr("input[name='_token']", "value")

        // Submit login form
        e.Request.Post(e.Request.AbsoluteURL(e.Attr("action")), map[string]string{
            "username": "your_username",
            "password": "your_password",
            "_token":   token,
        })
    })

    return c
}

JavaScript-Rendered Content

For sites that heavily rely on JavaScript, you might need to integrate Colly with a headless browser. While Colly itself doesn't execute JavaScript, you can combine it with tools like chromedp:

import (
    "context"
    "github.com/chromedp/chromedp"
)

func scrapeJavaScriptSite(url string) (string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    var htmlContent string
    err := chromedp.Run(ctx,
        chromedp.Navigate(url),
        chromedp.WaitVisible(".product-list"),
        chromedp.OuterHTML("html", &htmlContent),
    )

    return htmlContent, err
}

Best Practices for Competitive Intelligence

1. Respectful Scraping

Always implement proper rate limiting and respect robots.txt:

// Respect robots.txt (Colly ignores it by default)
c.IgnoreRobotsTxt = false

// Add a random delay between requests (requires "math/rand")
c.OnRequest(func(r *colly.Request) {
    time.Sleep(time.Duration(rand.Intn(3)) * time.Second)
})

2. Data Persistence and Analysis

Store your competitive intelligence data for trend analysis:

import (
    "database/sql"
    _ "github.com/lib/pq"
)

type PriceDatabase struct {
    db *sql.DB
}

func (pd *PriceDatabase) SaveProduct(product Product) error {
    query := `
        INSERT INTO price_history (name, price, competitor, url, timestamp, available)
        VALUES ($1, $2, $3, $4, $5, $6)
    `
    _, err := pd.db.Exec(query, product.Name, product.Price, product.Competitor, 
                        product.URL, product.Timestamp, product.Available)
    return err
}

func (pd *PriceDatabase) GetPriceTrends(productName string, days int) ([]Product, error) {
    // INTERVAL doesn't accept a bind parameter, so the day count is
    // formatted in; it's an int, so this is safe from SQL injection
    query := fmt.Sprintf(`
        SELECT name, price, competitor, timestamp
        FROM price_history
        WHERE name = $1 AND timestamp > NOW() - INTERVAL '%d days'
        ORDER BY timestamp DESC`, days)

    rows, err := pd.db.Query(query, productName)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var products []Product
    for rows.Next() {
        var p Product
        if err := rows.Scan(&p.Name, &p.Price, &p.Competitor, &p.Timestamp); err != nil {
            return nil, err
        }
        products = append(products, p)
    }
    return products, rows.Err()
}

3. Error Handling and Monitoring

Implement comprehensive error handling for production systems:

func (cm *CompetitorMonitor) setupErrorHandling() {
    cm.collector.OnError(func(r *colly.Response, err error) {
        log.Printf("Error scraping %s: %v", r.Request.URL, err)

        // Back off and retry when rate limited (HTTP 429). Note that
        // sleeping here blocks the collector's handler
        if r.StatusCode == 429 {
            time.Sleep(5 * time.Minute)
            r.Request.Retry()
        }
    })

    cm.collector.OnResponse(func(r *colly.Response) {
        if r.StatusCode != 200 {
            log.Printf("Non-200 status code %d for %s", r.StatusCode, r.Request.URL)
        }
    })
}

Scheduling and Automation

For continuous monitoring, implement scheduled scraping:

import "github.com/robfig/cron/v3"

func setupScheduledMonitoring() {
    c := cron.New()

    // Run every hour
    c.AddFunc("0 * * * *", func() {
        monitor := NewCompetitorMonitor()
        monitor.MonitorPrices([]string{"laptop", "smartphone", "tablet"})
    })

    // Daily comprehensive scan
    c.AddFunc("0 2 * * *", func() {
        runFullCompetitiveAnalysis()
    })

    c.Start()
}

Monitoring Performance and Metrics

Track the performance of your monitoring system. If you expose the collected data through a web dashboard, a bit of client-side JavaScript can record request metrics and summarize competitor data:

// Monitor scraping performance metrics
const ScrapingMetrics = {
    async trackRequest(url, startTime) {
        const duration = Date.now() - startTime;
        await fetch('/api/metrics', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                url: url,
                duration: duration,
                timestamp: new Date().toISOString()
            })
        });
    },

    async analyzeCompetitorData(data) {
        const analysis = {
            totalProducts: data.length,
            averagePrice: data.reduce((sum, p) => sum + p.price, 0) / data.length,
            priceRange: {
                min: Math.min(...data.map(p => p.price)),
                max: Math.max(...data.map(p => p.price))
            },
            competitors: [...new Set(data.map(p => p.competitor))]
        };

        console.log('Competitive Analysis:', analysis);
        return analysis;
    }
};

Database Schema for Price History

Set up a proper database schema to store your competitive intelligence data:

-- Create tables for price monitoring
CREATE TABLE competitors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    base_url VARCHAR(500) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    sku VARCHAR(100),
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE price_history (
    id SERIAL PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    competitor_id INTEGER REFERENCES competitors(id),
    price DECIMAL(10,2) NOT NULL,
    currency VARCHAR(3) DEFAULT 'USD',
    available BOOLEAN DEFAULT true,
    url VARCHAR(1000),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- PostgreSQL doesn't support inline INDEX clauses in CREATE TABLE
CREATE INDEX idx_product_competitor ON price_history (product_id, competitor_id);
CREATE INDEX idx_scraped_at ON price_history (scraped_at);

-- Create views for easy analysis
CREATE VIEW latest_prices AS
SELECT 
    p.name as product_name,
    c.name as competitor_name,
    ph.price,
    ph.available,
    ph.scraped_at
FROM price_history ph
JOIN products p ON ph.product_id = p.id
JOIN competitors c ON ph.competitor_id = c.id
WHERE ph.scraped_at = (
    SELECT MAX(scraped_at) 
    FROM price_history ph2 
    WHERE ph2.product_id = ph.product_id 
    AND ph2.competitor_id = ph.competitor_id
);

Legal and Ethical Considerations

When building competitive intelligence systems:

  1. Respect robots.txt and terms of service
  2. Implement reasonable rate limiting to avoid overloading servers
  3. Consider the legal implications in your jurisdiction
  4. Focus on publicly available information only
  5. Ensure data accuracy before making business decisions

Integration with Business Intelligence Tools

Export your data to popular BI platforms:

import (
    "encoding/csv"
    "os"
)

func exportToCSV(products []Product, filename string) error {
    file, err := os.Create(filename)
    if err != nil {
        return err
    }
    defer file.Close()

    writer := csv.NewWriter(file)
    defer writer.Flush()

    // Write header
    writer.Write([]string{"Name", "Price", "Competitor", "Timestamp", "Available"})

    // Write data
    for _, product := range products {
        record := []string{
            product.Name,
            fmt.Sprintf("%.2f", product.Price),
            product.Competitor,
            product.Timestamp.Format("2006-01-02 15:04:05"),
            fmt.Sprintf("%t", product.Available),
        }
        writer.Write(record)
    }

    return nil
}

// JSON export for API integration
func exportToJSON(products []Product, filename string) error {
    data, err := json.MarshalIndent(products, "", "  ")
    if err != nil {
        return err
    }

    return os.WriteFile(filename, data, 0644)
}

Conclusion

Colly is exceptionally well-suited for competitive intelligence and price monitoring applications. Its performance, built-in features for respectful scraping, and Go's concurrency model make it an excellent choice for building robust monitoring systems. Whether you're tracking competitor prices, monitoring product availability, or gathering market intelligence, Colly provides the foundation for reliable, scalable solutions.

For more complex scenarios involving JavaScript-heavy sites, consider integrating Colly with a headless browser, or explore alternatives like Puppeteer for handling dynamic content; Puppeteer's network monitoring can also capture the API calls behind a page for additional competitive insights.

Remember to always implement proper rate limiting, respect website terms of service, and consider the legal implications of your scraping activities in your jurisdiction.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
