Can Colly be used for competitive intelligence and price monitoring?
Yes, Colly is an excellent choice for competitive intelligence and price monitoring applications. As a fast, elegant web scraping framework for Go, Colly provides the robust features needed to build reliable monitoring systems that can track competitor prices, product availability, market trends, and business intelligence data at scale.
Why Colly Excels for Competitive Intelligence
Colly offers several advantages that make it particularly well-suited for competitive intelligence and price monitoring:
- High Performance: Built in Go, Colly can handle thousands of concurrent requests efficiently
- Built-in Rate Limiting: Essential for respectful scraping that won't overwhelm target servers
- Robust Error Handling: Critical for maintaining reliable monitoring systems
- Cookie and Session Management: Necessary for scraping protected or personalized content
- Extensible Architecture: Easy to customize for specific business requirements
Basic Price Monitoring Setup
Here's a fundamental example of using Colly to monitor product prices:
package main
import (
"encoding/json"
"fmt"
"log"
"strconv"
"strings"
"time"
"github.com/gocolly/colly/v2"
"github.com/gocolly/colly/v2/debug"
)
type Product struct {
Name string `json:"name"`
Price float64 `json:"price"`
URL string `json:"url"`
Timestamp time.Time `json:"timestamp"`
Available bool `json:"available"`
Competitor string `json:"competitor"`
}
func main() {
// Create a new collector with debugging
c := colly.NewCollector(
colly.Debugger(&debug.LogDebugger{}),
)
// Set rate limiting to be respectful
c.Limit(&colly.LimitRule{
DomainGlob: "*",
Parallelism: 2,
Delay: 1 * time.Second,
})
var products []Product
// Define the scraping logic
c.OnHTML(".product-item", func(e *colly.HTMLElement) {
product := Product{
Name: e.ChildText(".product-name"),
URL: e.Request.URL.String(),
Timestamp: time.Now(),
Available: !strings.Contains(e.ChildText(".availability"), "Out of Stock"),
Competitor: extractCompetitorName(e.Request.URL.Host),
}
// Extract and parse price
priceText := e.ChildText(".price")
price := extractPrice(priceText)
product.Price = price
products = append(products, product)
fmt.Printf("Found product: %s - $%.2f\n", product.Name, product.Price)
})
// Handle errors gracefully
c.OnError(func(r *colly.Response, err error) {
log.Printf("Error scraping %s: %v", r.Request.URL, err)
})
// Monitor multiple competitor URLs
urls := []string{
"https://competitor1.com/products",
"https://competitor2.com/products",
"https://competitor3.com/products",
}
for _, url := range urls {
c.Visit(url)
}
// Save results
saveResults(products)
}
func extractPrice(priceText string) float64 {
// Remove currency symbols and parse price
cleaned := strings.ReplaceAll(priceText, "$", "")
cleaned = strings.ReplaceAll(cleaned, ",", "")
cleaned = strings.TrimSpace(cleaned)
price, err := strconv.ParseFloat(cleaned, 64)
if err != nil {
return 0.0
}
return price
}
func extractCompetitorName(host string) string {
parts := strings.Split(host, ".")
if len(parts) >= 2 {
return parts[len(parts)-2]
}
return host
}
func saveResults(products []Product) {
data, err := json.MarshalIndent(products, "", " ")
if err != nil {
log.Fatal(err)
}
filename := fmt.Sprintf("price_data_%s.json", time.Now().Format("2006-01-02"))
// Write to a file (e.g., os.WriteFile) or a database in a real system
fmt.Printf("Serialized %d products (%d bytes) for %s\n", len(products), len(data), filename)
}
Advanced Competitive Intelligence Features
Multi-Site Price Comparison
For comprehensive competitive intelligence, you'll want to monitor multiple competitors simultaneously:
type CompetitorMonitor struct {
collector *colly.Collector
competitors map[string]CompetitorConfig
results chan Product
}
type CompetitorConfig struct {
Name string
BaseURL string
ProductSelector string
PriceSelector string
NameSelector string
RateLimit time.Duration
}
func NewCompetitorMonitor() *CompetitorMonitor {
c := colly.NewCollector()
// Configure for different sites
c.UserAgent = "Mozilla/5.0 (compatible; PriceBot/1.0)"
return &CompetitorMonitor{
collector: c,
competitors: make(map[string]CompetitorConfig),
results: make(chan Product, 1000),
}
}
func (cm *CompetitorMonitor) AddCompetitor(config CompetitorConfig) {
cm.competitors[config.Name] = config
// Set site-specific rate limiting; LimitRule matches on domain, not full URL
cm.collector.Limit(&colly.LimitRule{
DomainGlob: strings.TrimPrefix(strings.TrimPrefix(config.BaseURL, "https://"), "http://"),
Parallelism: 1,
Delay: config.RateLimit,
})
}
func (cm *CompetitorMonitor) MonitorPrices(productKeywords []string) {
for name, config := range cm.competitors {
// scrapeCompetitor (not shown) applies the config's selectors for one site
go cm.scrapeCompetitor(name, config, productKeywords)
}
}
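Whatever the site-specific scrapeCompetitor does, it needs to decide which scraped product names match the requested keywords. A hypothetical case-insensitive matcher:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesKeywords reports whether a product name contains any of the
// given keywords, case-insensitively. It filters scraped products
// down to the ones the monitor was asked to track.
func matchesKeywords(name string, keywords []string) bool {
	lower := strings.ToLower(name)
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

func main() {
	keywords := []string{"laptop", "tablet"}
	fmt.Println(matchesKeywords("ThinkPad X1 Carbon Laptop", keywords))
	fmt.Println(matchesKeywords("USB-C Cable", keywords))
}
```

Substring matching is deliberately loose; for tighter control you could match on SKUs or normalized model numbers instead.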
Real-time Price Alerts
Implement automated alerting when price changes are detected:
type PriceAlert struct {
ProductName string
Competitor string
OldPrice float64
NewPrice float64
ChangePercent float64
Timestamp time.Time
}
func (cm *CompetitorMonitor) checkPriceChanges(newProduct Product, historical []Product) *PriceAlert {
for _, oldProduct := range historical {
if oldProduct.Name == newProduct.Name && oldProduct.Competitor == newProduct.Competitor {
if oldProduct.Price != newProduct.Price {
changePercent := ((newProduct.Price - oldProduct.Price) / oldProduct.Price) * 100
return &PriceAlert{
ProductName: newProduct.Name,
Competitor: newProduct.Competitor,
OldPrice: oldProduct.Price,
NewPrice: newProduct.Price,
ChangePercent: changePercent,
Timestamp: time.Now(),
}
}
}
}
return nil
}
func sendAlert(alert *PriceAlert) {
// Send email, Slack notification, or webhook
fmt.Printf("PRICE ALERT: %s at %s changed from $%.2f to $%.2f (%.1f%%)\n",
alert.ProductName, alert.Competitor, alert.OldPrice, alert.NewPrice, alert.ChangePercent)
}
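Not every fluctuation deserves a notification. A simple threshold filter (hypothetical, with the cutoff as a parameter) keeps the alert channel quiet:

```go
package main

import (
	"fmt"
	"math"
)

// PriceAlert is trimmed to the fields the filter needs; the article's
// full struct carries more context.
type PriceAlert struct {
	OldPrice      float64
	NewPrice      float64
	ChangePercent float64
}

// isSignificant reports whether the price moved by at least
// thresholdPercent in either direction.
func isSignificant(alert *PriceAlert, thresholdPercent float64) bool {
	return math.Abs(alert.ChangePercent) >= thresholdPercent
}

func main() {
	a := &PriceAlert{OldPrice: 100, NewPrice: 97, ChangePercent: -3}
	fmt.Println(isSignificant(a, 5)) // a 3% dip is below a 5% cutoff
	fmt.Println(isSignificant(a, 2)) // but above a 2% cutoff
}
```

Gating sendAlert behind a check like this turns constant minor repricing noise into a handful of actionable notifications per day.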
Handling Complex E-commerce Sites
Many competitive intelligence scenarios involve scraping sophisticated e-commerce platforms that require advanced techniques:
Session Management and Authentication
func setupAuthenticatedScraping() *colly.Collector {
c := colly.NewCollector()
// Handle login forms
c.OnHTML("form[action*='login']", func(e *colly.HTMLElement) {
// Extract CSRF tokens and form fields
token := e.ChildAttr("input[name='_token']", "value")
// Submit login form
e.Request.Post(e.Request.AbsoluteURL(e.Attr("action")), map[string]string{
"username": "your_username",
"password": "your_password",
"_token": token,
})
})
return c
}
JavaScript-Rendered Content
For sites that heavily rely on JavaScript, you might need to integrate Colly with a headless browser. While Colly itself doesn't execute JavaScript, you can combine it with tools like chromedp:
import (
"context"
"github.com/chromedp/chromedp"
)
func scrapeJavaScriptSite(url string) (string, error) {
ctx, cancel := chromedp.NewContext(context.Background())
defer cancel()
var htmlContent string
err := chromedp.Run(ctx,
chromedp.Navigate(url),
chromedp.WaitVisible(".product-list"),
chromedp.OuterHTML("html", &htmlContent),
)
return htmlContent, err
}
Best Practices for Competitive Intelligence
1. Respectful Scraping
Always implement proper rate limiting and respect robots.txt:
// Respect robots.txt (Colly ignores it by default)
c.IgnoreRobotsTxt = false
// Probe each URL with a HEAD request before the GET
c.CheckHead = true
c.OnRequest(func(r *colly.Request) {
// Add a random delay between requests (needs "math/rand")
time.Sleep(time.Duration(rand.Intn(3)) * time.Second)
})
2. Data Persistence and Analysis
Store your competitive intelligence data for trend analysis:
import (
"database/sql"
_ "github.com/lib/pq"
)
type PriceDatabase struct {
db *sql.DB
}
func (pd *PriceDatabase) SaveProduct(product Product) error {
query := `
INSERT INTO price_history (name, price, competitor, url, timestamp, available)
VALUES ($1, $2, $3, $4, $5, $6)
`
_, err := pd.db.Exec(query, product.Name, product.Price, product.Competitor,
product.URL, product.Timestamp, product.Available)
return err
}
func (pd *PriceDatabase) GetPriceTrends(productName string, days int) ([]Product, error) {
query := `
SELECT name, price, competitor, timestamp
FROM price_history
WHERE name = $1 AND timestamp > NOW() - ($2 * INTERVAL '1 day')
ORDER BY timestamp DESC
`
rows, err := pd.db.Query(query, productName, days)
if err != nil {
return nil, err
}
defer rows.Close()
var trends []Product
for rows.Next() {
var p Product
if err := rows.Scan(&p.Name, &p.Price, &p.Competitor, &p.Timestamp); err != nil {
return nil, err
}
trends = append(trends, p)
}
return trends, rows.Err()
}
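GetPriceTrends returns rows newest-first, and a small pure-Go helper can turn that history into the net change over the window. This sketch uses a trimmed Product struct with just the field it needs:

```go
package main

import "fmt"

// Product carries just the field windowChange needs; the article's
// fuller struct works the same way.
type Product struct {
	Price float64
}

// windowChange returns the absolute and percent change between the
// oldest and newest observations. It assumes history is ordered
// newest-first, as GetPriceTrends returns it, and is non-empty.
func windowChange(history []Product) (delta, percent float64) {
	newest := history[0].Price
	oldest := history[len(history)-1].Price
	delta = newest - oldest
	if oldest != 0 {
		percent = delta / oldest * 100
	}
	return delta, percent
}

func main() {
	// Newest-first: the price fell from 100 to 90 over the window
	d, p := windowChange([]Product{{Price: 90}, {Price: 95}, {Price: 100}})
	fmt.Printf("delta=%.2f percent=%.2f%%\n", d, p)
}
```

This is the number most pricing teams actually want first: "how much did competitor X move this week?"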
3. Error Handling and Monitoring
Implement comprehensive error handling for production systems:
func (cm *CompetitorMonitor) setupErrorHandling() {
cm.collector.OnError(func(r *colly.Response, err error) {
log.Printf("Error scraping %s: %v", r.Request.URL, err)
// Implement retry logic
if r.StatusCode == 429 { // Rate limited
time.Sleep(5 * time.Minute)
r.Request.Retry()
}
})
cm.collector.OnResponse(func(r *colly.Response) {
if r.StatusCode != 200 {
log.Printf("Non-200 status code %d for %s", r.StatusCode, r.Request.URL)
}
})
}
Scheduling and Automation
For continuous monitoring, implement scheduled scraping:
import "github.com/robfig/cron/v3"
func setupScheduledMonitoring() {
c := cron.New()
// Run every hour
c.AddFunc("0 * * * *", func() {
monitor := NewCompetitorMonitor()
monitor.MonitorPrices([]string{"laptop", "smartphone", "tablet"})
})
// Daily comprehensive scan
c.AddFunc("0 2 * * *", func() {
runFullCompetitiveAnalysis()
})
c.Start()
// Keep the process alive (e.g., select {}) so the cron goroutine can run
}
Monitoring Performance and Metrics
If you report scraper metrics to a web dashboard, a small JavaScript client can record request timings and run quick analyses on the collected data:
// Monitor scraping performance metrics
const ScrapingMetrics = {
async trackRequest(url, startTime) {
const duration = Date.now() - startTime;
await fetch('/api/metrics', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
url: url,
duration: duration,
timestamp: new Date().toISOString()
})
});
},
async analyzeCompetitorData(data) {
const analysis = {
totalProducts: data.length,
averagePrice: data.reduce((sum, p) => sum + p.price, 0) / data.length,
priceRange: {
min: Math.min(...data.map(p => p.price)),
max: Math.max(...data.map(p => p.price))
},
competitors: [...new Set(data.map(p => p.competitor))]
};
console.log('Competitive Analysis:', analysis);
return analysis;
}
};
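If you prefer to keep the analysis server-side, the same aggregation is straightforward in Go. This sketch mirrors the fields the JavaScript snippet computes, using a trimmed Product struct:

```go
package main

import "fmt"

// Product is trimmed to the fields the analysis needs.
type Product struct {
	Price      float64
	Competitor string
}

// Analysis mirrors the fields computed by the JavaScript snippet:
// count, average, price range, and the distinct competitors seen.
type Analysis struct {
	TotalProducts int
	AveragePrice  float64
	MinPrice      float64
	MaxPrice      float64
	Competitors   []string
}

func analyze(data []Product) Analysis {
	a := Analysis{TotalProducts: len(data)}
	if len(data) == 0 {
		return a
	}
	a.MinPrice, a.MaxPrice = data[0].Price, data[0].Price
	seen := map[string]bool{}
	total := 0.0
	for _, p := range data {
		total += p.Price
		if p.Price < a.MinPrice {
			a.MinPrice = p.Price
		}
		if p.Price > a.MaxPrice {
			a.MaxPrice = p.Price
		}
		if !seen[p.Competitor] {
			seen[p.Competitor] = true
			a.Competitors = append(a.Competitors, p.Competitor)
		}
	}
	a.AveragePrice = total / float64(len(data))
	return a
}

func main() {
	data := []Product{
		{Price: 100, Competitor: "competitor1"},
		{Price: 140, Competitor: "competitor2"},
		{Price: 120, Competitor: "competitor1"},
	}
	fmt.Printf("%+v\n", analyze(data))
}
```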
Database Schema for Price History
Set up a proper database schema to store your competitive intelligence data:
-- Create tables for price monitoring
CREATE TABLE competitors (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL UNIQUE,
base_url VARCHAR(500) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(500) NOT NULL,
sku VARCHAR(100),
category VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE price_history (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id),
competitor_id INTEGER REFERENCES competitors(id),
price DECIMAL(10,2) NOT NULL,
currency VARCHAR(3) DEFAULT 'USD',
available BOOLEAN DEFAULT true,
url VARCHAR(1000),
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Inline INDEX clauses are MySQL syntax; PostgreSQL uses separate statements
CREATE INDEX idx_product_competitor ON price_history (product_id, competitor_id);
CREATE INDEX idx_scraped_at ON price_history (scraped_at);
-- Create views for easy analysis
CREATE VIEW latest_prices AS
SELECT
p.name as product_name,
c.name as competitor_name,
ph.price,
ph.available,
ph.scraped_at
FROM price_history ph
JOIN products p ON ph.product_id = p.id
JOIN competitors c ON ph.competitor_id = c.id
WHERE ph.scraped_at = (
SELECT MAX(scraped_at)
FROM price_history ph2
WHERE ph2.product_id = ph.product_id
AND ph2.competitor_id = ph.competitor_id
);
Legal and Ethical Considerations
When building competitive intelligence systems:
- Respect robots.txt and terms of service
- Implement reasonable rate limiting to avoid overloading servers
- Consider the legal implications in your jurisdiction
- Focus on publicly available information only
- Ensure data accuracy before making business decisions
Integration with Business Intelligence Tools
Export your data to popular BI platforms:
import (
"encoding/csv"
"encoding/json"
"fmt"
"os"
)
func exportToCSV(products []Product, filename string) error {
file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()
writer := csv.NewWriter(file)
defer writer.Flush()
// Write header
writer.Write([]string{"Name", "Price", "Competitor", "Timestamp", "Available"})
// Write data
for _, product := range products {
record := []string{
product.Name,
fmt.Sprintf("%.2f", product.Price),
product.Competitor,
product.Timestamp.Format("2006-01-02 15:04:05"),
fmt.Sprintf("%t", product.Available),
}
writer.Write(record)
}
return nil
}
// JSON export for API integration
func exportToJSON(products []Product, filename string) error {
data, err := json.MarshalIndent(products, "", " ")
if err != nil {
return err
}
return os.WriteFile(filename, data, 0644)
}
Conclusion
Colly is exceptionally well-suited for competitive intelligence and price monitoring applications. Its performance, built-in features for respectful scraping, and Go's concurrency model make it an excellent choice for building robust monitoring systems. Whether you're tracking competitor prices, monitoring product availability, or gathering market intelligence, Colly provides the foundation for reliable, scalable solutions.
For more complex scenarios involving JavaScript-heavy sites, integrate Colly with a headless browser such as chromedp, or consider alternatives like Puppeteer for handling dynamic content. Puppeteer's network interception can also reveal the JSON APIs a site calls behind the scenes, which are often easier to monitor than rendered HTML.
Remember to always implement proper rate limiting, respect website terms of service, and consider the legal implications of your scraping activities in your jurisdiction.