What are the rate limits for Claude API when scraping?

When building web scraping applications with Claude API, understanding rate limits is crucial for designing scalable and reliable systems. Anthropic implements several types of rate limits to ensure fair usage and system stability across all API users.

Understanding Claude API Rate Limits

Claude API enforces rate limits across multiple dimensions:

1. Requests Per Minute (RPM)

The requests per minute limit controls how many API calls you can make within a 60-second window. These limits vary by tier and model:

Free Tier:
  • Claude 3.5 Sonnet: 50 RPM
  • Claude 3 Opus: 5 RPM
  • Claude 3 Haiku: 50 RPM

Build Tier (Pay-as-you-go):
  • Claude 3.5 Sonnet: 50 RPM (can be increased)
  • Claude 3 Opus: 50 RPM
  • Claude 3 Haiku: 50 RPM

Scale Tier:
  • Claude 3.5 Sonnet: 1,000+ RPM
  • Claude 3 Opus: 2,000+ RPM
  • Claude 3 Haiku: 4,000+ RPM

2. Tokens Per Minute (TPM)

Token limits restrict the total number of input and output tokens processed per minute:

Build Tier:
  • Claude 3.5 Sonnet: 40,000 TPM
  • Claude 3 Opus: 20,000 TPM
  • Claude 3 Haiku: 50,000 TPM

Scale Tier:
  • Claude 3.5 Sonnet: 400,000+ TPM
  • Claude 3 Opus: 400,000+ TPM
  • Claude 3 Haiku: 400,000+ TPM

3. Tokens Per Day (TPD)

Daily token limits provide an additional cap on total usage within a 24-hour period. These limits are tier-specific and can be increased upon request for production applications.
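As a minimal sketch (the 1,000,000-token budget below is a hypothetical placeholder, not an Anthropic figure; check your tier's actual TPD in the console), a simple daily counter can stop a scraping job before the cap is reached:

import time

class DailyTokenBudget:
    """Tracks tokens used in the current UTC day against an assumed TPD cap."""

    def __init__(self, max_tpd=1_000_000):  # hypothetical cap - use your tier's real TPD
        self.max_tpd = max_tpd
        self.day = time.gmtime().tm_yday
        self.used = 0

    def record(self, tokens):
        today = time.gmtime().tm_yday
        if today != self.day:  # a new day started, reset the counter
            self.day, self.used = today, 0
        self.used += tokens

    def remaining(self):
        return max(self.max_tpd - self.used, 0)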

Implementing Rate Limit Handling in Python

Here's a robust Python implementation for handling Claude API rate limits during web scraping:

import anthropic
import time
from collections import deque

class ClaudeRateLimiter:
    def __init__(self, max_rpm=50, max_tpm=40000):
        self.max_rpm = max_rpm
        self.max_tpm = max_tpm
        self.request_times = deque()
        self.token_usage = deque()

    def wait_if_needed(self, estimated_tokens=1000):
        """Wait if rate limits would be exceeded"""
        current_time = time.time()
        one_minute_ago = current_time - 60

        # Remove requests older than 1 minute
        while self.request_times and self.request_times[0] < one_minute_ago:
            self.request_times.popleft()

        # Remove token usage older than 1 minute
        while self.token_usage and self.token_usage[0][0] < one_minute_ago:
            self.token_usage.popleft()

        # Calculate current usage
        current_rpm = len(self.request_times)
        current_tpm = sum(tokens for _, tokens in self.token_usage)

        # Wait if needed
        if current_rpm >= self.max_rpm:
            sleep_time = 60 - (current_time - self.request_times[0])
            print(f"Rate limit reached. Sleeping for {sleep_time:.2f}s")
            time.sleep(sleep_time + 0.1)

        if self.token_usage and current_tpm + estimated_tokens > self.max_tpm:
            sleep_time = 60 - (current_time - self.token_usage[0][0])
            print(f"Token limit reached. Sleeping for {sleep_time:.2f}s")
            time.sleep(sleep_time + 0.1)

    def record_request(self, tokens_used):
        """Record a completed request"""
        current_time = time.time()
        self.request_times.append(current_time)
        self.token_usage.append((current_time, tokens_used))

class ClaudeWebScraper:
    def __init__(self, api_key, max_rpm=50, max_tpm=40000):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.rate_limiter = ClaudeRateLimiter(max_rpm, max_tpm)

    def extract_data(self, html_content, extraction_prompt):
        """Extract data from HTML with rate limiting"""
        # Estimate tokens (rough approximation: 1 token ≈ 4 characters)
        estimated_tokens = (len(html_content) + len(extraction_prompt)) // 4

        # Wait if rate limits would be exceeded
        self.rate_limiter.wait_if_needed(estimated_tokens)

        try:
            response = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=4096,
                messages=[{
                    "role": "user",
                    "content": f"{extraction_prompt}\n\nHTML:\n{html_content}"
                }]
            )

            # Record actual token usage
            total_tokens = response.usage.input_tokens + response.usage.output_tokens
            self.rate_limiter.record_request(total_tokens)

            return response.content[0].text

        except anthropic.RateLimitError as e:
            print(f"Rate limit error: {e}")
            time.sleep(60)
            return self.extract_data(html_content, extraction_prompt)

# Example usage
scraper = ClaudeWebScraper(api_key="your-api-key", max_rpm=50, max_tpm=40000)

# Scrape multiple pages
html_pages = [...]  # Your HTML content
for html in html_pages:
    data = scraper.extract_data(
        html,
        "Extract product name, price, and description as JSON"
    )
    print(data)

JavaScript/Node.js Implementation

For JavaScript-based web scraping with Claude API:

const Anthropic = require('@anthropic-ai/sdk');

class ClaudeRateLimiter {
    constructor(maxRPM = 50, maxTPM = 40000) {
        this.maxRPM = maxRPM;
        this.maxTPM = maxTPM;
        this.requestTimes = [];
        this.tokenUsage = [];
    }

    async waitIfNeeded(estimatedTokens = 1000) {
        const currentTime = Date.now();
        const oneMinuteAgo = currentTime - 60000;

        // Clean old entries
        this.requestTimes = this.requestTimes.filter(t => t > oneMinuteAgo);
        this.tokenUsage = this.tokenUsage.filter(([t]) => t > oneMinuteAgo);

        // Calculate current usage
        const currentRPM = this.requestTimes.length;
        const currentTPM = this.tokenUsage.reduce((sum, [, tokens]) => sum + tokens, 0);

        // Wait if limits exceeded
        if (currentRPM >= this.maxRPM) {
            const sleepTime = 60000 - (currentTime - this.requestTimes[0]) + 100;
            console.log(`Rate limit reached. Sleeping for ${sleepTime}ms`);
            await new Promise(resolve => setTimeout(resolve, sleepTime));
        }

        if (this.tokenUsage.length > 0 && currentTPM + estimatedTokens > this.maxTPM) {
            const sleepTime = 60000 - (currentTime - this.tokenUsage[0][0]) + 100;
            console.log(`Token limit reached. Sleeping for ${sleepTime}ms`);
            await new Promise(resolve => setTimeout(resolve, sleepTime));
        }
    }

    recordRequest(tokensUsed) {
        const currentTime = Date.now();
        this.requestTimes.push(currentTime);
        this.tokenUsage.push([currentTime, tokensUsed]);
    }
}

class ClaudeWebScraper {
    constructor(apiKey, maxRPM = 50, maxTPM = 40000) {
        this.client = new Anthropic({ apiKey });
        this.rateLimiter = new ClaudeRateLimiter(maxRPM, maxTPM);
    }

    async extractData(htmlContent, extractionPrompt) {
        // Estimate tokens
        const estimatedTokens = (htmlContent.length + extractionPrompt.length) / 4;

        await this.rateLimiter.waitIfNeeded(estimatedTokens);

        try {
            const response = await this.client.messages.create({
                model: 'claude-3-5-sonnet-20241022',
                max_tokens: 4096,
                messages: [{
                    role: 'user',
                    content: `${extractionPrompt}\n\nHTML:\n${htmlContent}`
                }]
            });

            const totalTokens = response.usage.input_tokens + response.usage.output_tokens;
            this.rateLimiter.recordRequest(totalTokens);

            return response.content[0].text;
        } catch (error) {
            if (error.status === 429) {
                console.log('Rate limit error, waiting 60s...');
                await new Promise(resolve => setTimeout(resolve, 60000));
                return this.extractData(htmlContent, extractionPrompt);
            }
            throw error;
        }
    }
}

// Example usage with async iteration
async function scrapeWebsites() {
    const scraper = new ClaudeWebScraper(process.env.ANTHROPIC_API_KEY, 50, 40000);

    const htmlPages = [...];  // Your HTML content
    for (const html of htmlPages) {
        const data = await scraper.extractData(
            html,
            'Extract product name, price, and description as JSON'
        );
        console.log(data);
    }
}

Best Practices for Rate Limit Management

1. Implement Exponential Backoff

When you receive a 429 rate limit error, implement exponential backoff:

import random
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def exponential_backoff(attempt, base_delay=1, max_delay=60):
    """Exponentially growing delay, capped at max_delay, with +/-10% jitter."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = delay * 0.1 * (2 * random.random() - 1)
    return delay + jitter

for attempt in range(5):
    try:
        response = client.messages.create(...)  # your usual request parameters
        break
    except anthropic.RateLimitError:
        if attempt == 4:
            raise
        time.sleep(exponential_backoff(attempt))

2. Batch Processing

Group multiple pages and process them in controlled batches to maximize throughput while respecting limits. When handling AJAX requests using Puppeteer or scraping dynamic content, collect all HTML first before sending to Claude API.
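A minimal sketch of that approach (the batch size is arbitrary, and `scraper` is an instance of the ClaudeWebScraper class defined earlier) collects the HTML first and then processes it in fixed-size batches:

def process_in_batches(scraper, html_pages, prompt, batch_size=10):
    """Send pre-collected HTML to Claude in small, controlled batches."""
    results = []
    for start in range(0, len(html_pages), batch_size):
        batch = html_pages[start:start + batch_size]
        for html in batch:
            # The scraper's internal rate limiter paces requests within the batch.
            results.append(scraper.extract_data(html, prompt))
        print(f"Processed {start + len(batch)} of {len(html_pages)} pages")
    return results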

3. Use Haiku for Simple Extractions

Claude 3 Haiku has higher rate limits and lower costs. For simple data extraction where advanced reasoning isn't needed, Haiku can process more pages per minute.
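As a sketch (the 20,000-character threshold and the helper names are assumptions, not Anthropic guidance), you can route short pages to Haiku and reserve Sonnet for pages that need more reasoning:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def pick_model(html_content):
    # Assumed heuristic: short pages rarely need deep reasoning, so route them to Haiku.
    if len(html_content) < 20_000:
        return "claude-3-haiku-20240307"
    return "claude-3-5-sonnet-20241022"

def extract(html_content, prompt):
    response = client.messages.create(
        model=pick_model(html_content),
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{prompt}\n\nHTML:\n{html_content}"}],
    )
    return response.content[0].text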

4. Monitor Response Headers

Claude API returns rate limit information in response headers:

# The Message object returned by messages.create() does not expose headers;
# use the SDK's raw-response wrapper to read them.
raw = client.messages.with_raw_response.create(...)
response = raw.parse()  # the usual Message object
print(f"Requests remaining: {raw.headers.get('anthropic-ratelimit-requests-remaining')}")
print(f"Tokens remaining: {raw.headers.get('anthropic-ratelimit-tokens-remaining')}")
print(f"Reset time: {raw.headers.get('anthropic-ratelimit-requests-reset')}")

5. Implement Request Queuing

For large-scale scraping operations, implement a queue system:

import queue
import threading

class ScraperQueue:
    def __init__(self, scraper, num_workers=3):
        self.scraper = scraper
        self.queue = queue.Queue()
        self.results = []
        self.workers = []

        for _ in range(num_workers):
            worker = threading.Thread(target=self._worker)
            worker.start()
            self.workers.append(worker)

    def _worker(self):
        while True:
            item = self.queue.get()
            if item is None:
                break

            html, prompt = item
            try:
                result = self.scraper.extract_data(html, prompt)
                self.results.append(result)
            finally:
                # Always mark the task done so wait_completion() cannot hang
                # if extract_data raises.
                self.queue.task_done()

    def add_task(self, html, prompt):
        self.queue.put((html, prompt))

    def wait_completion(self):
        self.queue.join()
        for _ in self.workers:
            self.queue.put(None)
        for worker in self.workers:
            worker.join()

Increasing Rate Limits

For production web scraping applications requiring higher throughput:

  1. Upgrade to Scale Tier: Contact Anthropic sales to upgrade your account, which provides significantly higher limits
  2. Request Limit Increases: Submit a request through the Anthropic console with details about your use case
  3. Demonstrate Usage Patterns: Show consistent, legitimate usage to qualify for higher limits

When building scrapers that need to run multiple pages in parallel with Puppeteer, ensure your Claude API tier supports the concurrent request volume.
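A minimal sketch of bounding that concurrency (the cap of 5 in-flight requests is an assumption; tune it to your tier) uses the SDK's async client with an asyncio semaphore:

import asyncio

import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
semaphore = asyncio.Semaphore(5)     # assumed cap on concurrent Claude requests

async def extract(html, prompt):
    async with semaphore:  # at most 5 requests in flight at once
        response = await client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": f"{prompt}\n\nHTML:\n{html}"}],
        )
        return response.content[0].text

async def main(html_pages, prompt):
    return await asyncio.gather(*(extract(h, prompt) for h in html_pages))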

Monitoring and Alerting

Implement monitoring to track rate limit usage:

import logging

class RateLimitMonitor:
    def __init__(self):
        self.logger = logging.getLogger('rate_limit_monitor')
        self.total_requests = 0
        self.total_tokens = 0
        self.rate_limit_hits = 0

    def log_request(self, tokens_used, rate_limited=False):
        self.total_requests += 1
        self.total_tokens += tokens_used

        if rate_limited:
            self.rate_limit_hits += 1
            self.logger.warning(f"Rate limit hit #{self.rate_limit_hits}")

        if self.total_requests % 100 == 0:
            self.logger.info(
                f"Stats: {self.total_requests} requests, "
                f"{self.total_tokens} tokens, "
                f"{self.rate_limit_hits} rate limit hits"
            )
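A minimal wiring example (the logging setup and the character-based token estimate are placeholders; use actual response.usage totals if your scraper exposes them):

import logging

logging.basicConfig(level=logging.INFO)  # so the monitor's INFO stats are visible

monitor = RateLimitMonitor()
scraper = ClaudeWebScraper(api_key="your-api-key")

html_pages = [...]  # Your HTML content
for html in html_pages:
    data = scraper.extract_data(html, "Extract product name and price as JSON")
    # Rough estimate (1 token ~ 4 characters); replace with real usage figures if available.
    monitor.log_request(tokens_used=(len(html) + len(data)) // 4)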

Conclusion

Understanding and properly handling Claude API rate limits is essential for building reliable web scraping systems. By implementing rate limiters, using exponential backoff, and monitoring your usage, you can maximize throughput while staying within API constraints. For production systems requiring higher limits, consider upgrading to the Scale tier or requesting custom limits based on your specific use case.

Remember that efficient token usage and choosing the right model for each task can help you stay within rate limits while processing more pages. Always implement proper error handling and retry logic to gracefully handle temporary rate limit errors during scraping operations.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
