What are the rate limits for the Claude API when scraping?
When building web scraping applications with Claude API, understanding rate limits is crucial for designing scalable and reliable systems. Anthropic implements several types of rate limits to ensure fair usage and system stability across all API users.
Understanding Claude API Rate Limits
The Claude API enforces rate limits across multiple dimensions. The figures below reflect published tier limits at the time of writing; Anthropic adjusts them periodically, so confirm current values in the official documentation:
1. Requests Per Minute (RPM)
The requests per minute limit controls how many API calls you can make within a 60-second window. These limits vary by tier and model:
Free Tier:
- Claude 3.5 Sonnet: 50 RPM
- Claude 3 Opus: 5 RPM
- Claude 3 Haiku: 50 RPM

Build Tier (Pay-as-you-go):
- Claude 3.5 Sonnet: 50 RPM (can be increased)
- Claude 3 Opus: 50 RPM
- Claude 3 Haiku: 50 RPM

Scale Tier:
- Claude 3.5 Sonnet: 1,000+ RPM
- Claude 3 Opus: 2,000+ RPM
- Claude 3 Haiku: 4,000+ RPM
2. Tokens Per Minute (TPM)
Token limits restrict the total number of input and output tokens processed per minute:
Build Tier:
- Claude 3.5 Sonnet: 40,000 TPM
- Claude 3 Opus: 20,000 TPM
- Claude 3 Haiku: 50,000 TPM

Scale Tier:
- Claude 3.5 Sonnet: 400,000+ TPM
- Claude 3 Opus: 400,000+ TPM
- Claude 3 Haiku: 400,000+ TPM
3. Tokens Per Day (TPD)
Daily token limits provide an additional cap on total usage within a 24-hour period. These limits are tier-specific and can be increased upon request for production applications.
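In practice, RPM and TPM interact: whichever limit your workload hits first determines real throughput. The sketch below is a back-of-the-envelope estimate, not an official calculation; the tier figures and the ~3,000 tokens-per-page average are illustrative assumptions you should replace with your own numbers.

```python
# Rough throughput estimate: which limit (RPM or TPM) binds first?
# All values here are illustrative assumptions, not published limits.
max_rpm = 50              # requests per minute for your tier
max_tpm = 40_000          # tokens per minute for your tier
tokens_per_page = 3_000   # assumed average input + output tokens per page

pages_by_rpm = max_rpm                      # ceiling set by request count
pages_by_tpm = max_tpm // tokens_per_page   # ceiling set by token budget
pages_per_minute = min(pages_by_rpm, pages_by_tpm)

print(f"RPM allows {pages_by_rpm} pages/min, TPM allows {pages_by_tpm} pages/min")
print(f"Effective throughput: ~{pages_per_minute} pages/min")
# With these numbers TPM binds first: ~13 pages/min, not 50
```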
Implementing Rate Limit Handling in Python
Here's a robust Python implementation for handling Claude API rate limits during web scraping:
```python
import anthropic
import time
from collections import deque


class ClaudeRateLimiter:
    def __init__(self, max_rpm=50, max_tpm=40000):
        self.max_rpm = max_rpm
        self.max_tpm = max_tpm
        self.request_times = deque()
        self.token_usage = deque()

    def wait_if_needed(self, estimated_tokens=1000):
        """Wait if rate limits would be exceeded."""
        current_time = time.time()
        one_minute_ago = current_time - 60

        # Drop requests older than 1 minute
        while self.request_times and self.request_times[0] < one_minute_ago:
            self.request_times.popleft()

        # Drop token usage older than 1 minute
        while self.token_usage and self.token_usage[0][0] < one_minute_ago:
            self.token_usage.popleft()

        # Calculate usage within the current 60-second window
        current_rpm = len(self.request_times)
        current_tpm = sum(tokens for _, tokens in self.token_usage)

        # Sleep until the oldest entry ages out of the window
        if current_rpm >= self.max_rpm:
            sleep_time = max(0, 60 - (current_time - self.request_times[0]))
            print(f"Rate limit reached. Sleeping for {sleep_time:.2f}s")
            time.sleep(sleep_time + 0.1)

        if current_tpm + estimated_tokens > self.max_tpm and self.token_usage:
            sleep_time = max(0, 60 - (current_time - self.token_usage[0][0]))
            print(f"Token limit reached. Sleeping for {sleep_time:.2f}s")
            time.sleep(sleep_time + 0.1)

    def record_request(self, tokens_used):
        """Record a completed request."""
        current_time = time.time()
        self.request_times.append(current_time)
        self.token_usage.append((current_time, tokens_used))


class ClaudeWebScraper:
    def __init__(self, api_key, max_rpm=50, max_tpm=40000):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.rate_limiter = ClaudeRateLimiter(max_rpm, max_tpm)

    def extract_data(self, html_content, extraction_prompt):
        """Extract data from HTML with rate limiting."""
        # Estimate tokens (rough approximation: 1 token ≈ 4 characters)
        estimated_tokens = (len(html_content) + len(extraction_prompt)) // 4

        # Wait if rate limits would be exceeded
        self.rate_limiter.wait_if_needed(estimated_tokens)

        try:
            response = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=4096,
                messages=[{
                    "role": "user",
                    "content": f"{extraction_prompt}\n\nHTML:\n{html_content}"
                }]
            )

            # Record actual token usage reported by the API
            total_tokens = response.usage.input_tokens + response.usage.output_tokens
            self.rate_limiter.record_request(total_tokens)

            return response.content[0].text

        except anthropic.RateLimitError as e:
            print(f"Rate limit error: {e}")
            time.sleep(60)
            return self.extract_data(html_content, extraction_prompt)


# Example usage
scraper = ClaudeWebScraper(api_key="your-api-key", max_rpm=50, max_tpm=40000)

# Scrape multiple pages
html_pages = [...]  # Your HTML content
for html in html_pages:
    data = scraper.extract_data(
        html,
        "Extract product name, price, and description as JSON"
    )
    print(data)
```
JavaScript/Node.js Implementation
For JavaScript-based web scraping with Claude API:
```javascript
const Anthropic = require('@anthropic-ai/sdk');

class ClaudeRateLimiter {
  constructor(maxRPM = 50, maxTPM = 40000) {
    this.maxRPM = maxRPM;
    this.maxTPM = maxTPM;
    this.requestTimes = [];
    this.tokenUsage = [];
  }

  async waitIfNeeded(estimatedTokens = 1000) {
    const currentTime = Date.now();
    const oneMinuteAgo = currentTime - 60000;

    // Drop entries older than 1 minute
    this.requestTimes = this.requestTimes.filter(t => t > oneMinuteAgo);
    this.tokenUsage = this.tokenUsage.filter(([t]) => t > oneMinuteAgo);

    // Calculate usage within the current 60-second window
    const currentRPM = this.requestTimes.length;
    const currentTPM = this.tokenUsage.reduce((sum, [, tokens]) => sum + tokens, 0);

    // Sleep until the oldest entry ages out of the window
    if (currentRPM >= this.maxRPM) {
      const sleepTime = Math.max(0, 60000 - (currentTime - this.requestTimes[0])) + 100;
      console.log(`Rate limit reached. Sleeping for ${sleepTime}ms`);
      await new Promise(resolve => setTimeout(resolve, sleepTime));
    }

    if (currentTPM + estimatedTokens > this.maxTPM && this.tokenUsage.length > 0) {
      const sleepTime = Math.max(0, 60000 - (currentTime - this.tokenUsage[0][0])) + 100;
      console.log(`Token limit reached. Sleeping for ${sleepTime}ms`);
      await new Promise(resolve => setTimeout(resolve, sleepTime));
    }
  }

  recordRequest(tokensUsed) {
    const currentTime = Date.now();
    this.requestTimes.push(currentTime);
    this.tokenUsage.push([currentTime, tokensUsed]);
  }
}

class ClaudeWebScraper {
  constructor(apiKey, maxRPM = 50, maxTPM = 40000) {
    this.client = new Anthropic({ apiKey });
    this.rateLimiter = new ClaudeRateLimiter(maxRPM, maxTPM);
  }

  async extractData(htmlContent, extractionPrompt) {
    // Estimate tokens (rough approximation: 1 token ≈ 4 characters)
    const estimatedTokens = (htmlContent.length + extractionPrompt.length) / 4;
    await this.rateLimiter.waitIfNeeded(estimatedTokens);

    try {
      const response = await this.client.messages.create({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 4096,
        messages: [{
          role: 'user',
          content: `${extractionPrompt}\n\nHTML:\n${htmlContent}`
        }]
      });

      // Record actual token usage reported by the API
      const totalTokens = response.usage.input_tokens + response.usage.output_tokens;
      this.rateLimiter.recordRequest(totalTokens);

      return response.content[0].text;
    } catch (error) {
      if (error.status === 429) { // 429 = rate limited
        console.log('Rate limit error, waiting 60s...');
        await new Promise(resolve => setTimeout(resolve, 60000));
        return this.extractData(htmlContent, extractionPrompt);
      }
      throw error;
    }
  }
}

// Example usage with async iteration
async function scrapeWebsites() {
  const scraper = new ClaudeWebScraper(process.env.ANTHROPIC_API_KEY, 50, 40000);
  const htmlPages = []; // Your HTML content
  for (const html of htmlPages) {
    const data = await scraper.extractData(
      html,
      'Extract product name, price, and description as JSON'
    );
    console.log(data);
  }
}
```
Best Practices for Rate Limit Management
1. Implement Exponential Backoff
When you receive a 429 rate limit error, implement exponential backoff:
```python
import random
import time

import anthropic

# Assumes an anthropic.Anthropic() client configured earlier

def exponential_backoff(attempt, base_delay=1, max_delay=60):
    """Delay doubles each attempt (1s, 2s, 4s, ...), capped, with ±10% jitter."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = delay * 0.1 * (2 * random.random() - 1)  # spread retries apart
    return delay + jitter

for attempt in range(5):
    try:
        response = client.messages.create(...)
        break
    except anthropic.RateLimitError:
        if attempt == 4:  # out of retries
            raise
        time.sleep(exponential_backoff(attempt))
```
2. Batch Processing
Group multiple pages and process them in controlled batches to maximize throughput while respecting limits. When handling AJAX requests using Puppeteer or scraping dynamic content, collect all HTML first before sending to Claude API.
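One simple pattern, sketched below under the assumption that you reuse the ClaudeWebScraper class from earlier, is to collect all HTML first and then hand it to Claude in fixed-size batches; batch_size is an illustrative knob, not an API parameter.

```python
# Fetch-then-extract batching sketch; scraper is the ClaudeWebScraper above
def process_in_batches(scraper, html_pages, prompt, batch_size=10):
    results = []
    for i in range(0, len(html_pages), batch_size):
        batch = html_pages[i:i + batch_size]
        for html in batch:
            # The scraper's rate limiter paces individual requests
            results.append(scraper.extract_data(html, prompt))
        print(f"Finished batch {i // batch_size + 1}")
    return results
```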
3. Use Haiku for Simple Extractions
Claude 3 Haiku has higher rate limits and lower costs. For simple data extraction where advanced reasoning isn't needed, Haiku can process more pages per minute.
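A minimal routing sketch along these lines, where the page-complexity heuristic and the choice of models for your data are assumptions to validate against your own pages:

```python
# Hypothetical model router: route simple pages to Haiku, complex ones to Sonnet
def pick_model(html_content):
    is_simple_page = len(html_content) < 20_000  # stand-in heuristic
    if is_simple_page:
        return "claude-3-haiku-20240307"  # higher RPM/TPM, lower cost
    return "claude-3-5-sonnet-20241022"  # stronger reasoning for messy pages
```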
4. Monitor Response Headers
Claude API returns rate limit information in response headers. In the Python SDK, the parsed Message object does not expose headers, so use the `with_raw_response` wrapper:

```python
# The raw-response wrapper exposes headers alongside the parsed body
raw = client.messages.with_raw_response.create(...)
print(f"Requests remaining: {raw.headers.get('anthropic-ratelimit-requests-remaining')}")
print(f"Tokens remaining: {raw.headers.get('anthropic-ratelimit-tokens-remaining')}")
print(f"Reset time: {raw.headers.get('anthropic-ratelimit-requests-reset')}")
response = raw.parse()  # the usual Message object
```
5. Implement Request Queuing
For large-scale scraping operations, implement a queue system:
```python
import queue
import threading


class ScraperQueue:
    def __init__(self, scraper, num_workers=3):
        self.scraper = scraper
        self.queue = queue.Queue()
        self.results = []
        self.workers = []
        for _ in range(num_workers):
            worker = threading.Thread(target=self._worker)
            worker.start()
            self.workers.append(worker)

    def _worker(self):
        while True:
            item = self.queue.get()
            if item is None:  # sentinel: shut this worker down
                self.queue.task_done()
                break
            html, prompt = item
            try:
                result = self.scraper.extract_data(html, prompt)
                self.results.append(result)  # list.append is atomic in CPython
            finally:
                # Always mark the task done so wait_completion() cannot hang
                self.queue.task_done()

    def add_task(self, html, prompt):
        self.queue.put((html, prompt))

    def wait_completion(self):
        self.queue.join()
        for _ in self.workers:
            self.queue.put(None)  # one sentinel per worker
        for worker in self.workers:
            worker.join()
```

Note that ClaudeRateLimiter as written is not thread-safe; with several workers, guard wait_if_needed and record_request with a threading.Lock or keep num_workers small.
Increasing Rate Limits
For production web scraping applications requiring higher throughput:
- Upgrade to Scale Tier: Contact Anthropic sales to upgrade your account, which provides significantly higher limits
- Request Limit Increases: Submit a request through the Anthropic console with details about your use case
- Demonstrate Usage Patterns: Show consistent, legitimate usage to qualify for higher limits
When building scrapers that need to run multiple pages in parallel with Puppeteer, ensure your Claude API tier supports the concurrent request volume.
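One way to enforce that cap is a semaphore around the API call. The sketch below uses asyncio with the SDK's AsyncAnthropic client; MAX_CONCURRENT is an assumption to tune against your tier, not a documented value.

```python
import asyncio

import anthropic

MAX_CONCURRENT = 5  # assumption: tune to what your tier actually allows
semaphore = asyncio.Semaphore(MAX_CONCURRENT)
client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def extract(html, prompt):
    async with semaphore:  # at most MAX_CONCURRENT requests in flight
        response = await client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"{prompt}\n\nHTML:\n{html}"}],
        )
        return response.content[0].text

async def extract_all(pages, prompt):
    # Pages collected in parallel (e.g. by Puppeteer) are processed
    # concurrently, but never more than MAX_CONCURRENT at once
    return await asyncio.gather(*(extract(html, prompt) for html in pages))

# results = asyncio.run(extract_all(html_pages, "Extract product data as JSON"))
```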
Monitoring and Alerting
Implement monitoring to track rate limit usage:
```python
import logging


class RateLimitMonitor:
    def __init__(self):
        self.logger = logging.getLogger('rate_limit_monitor')
        self.total_requests = 0
        self.total_tokens = 0
        self.rate_limit_hits = 0

    def log_request(self, tokens_used, rate_limited=False):
        self.total_requests += 1
        self.total_tokens += tokens_used
        if rate_limited:
            self.rate_limit_hits += 1
            self.logger.warning(f"Rate limit hit #{self.rate_limit_hits}")
        # Emit a summary every 100 requests
        if self.total_requests % 100 == 0:
            self.logger.info(
                f"Stats: {self.total_requests} requests, "
                f"{self.total_tokens} tokens, "
                f"{self.rate_limit_hits} rate limit hits"
            )
```
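Wiring the monitor into a scraping loop is then a single call per request. A minimal usage sketch, assuming the ClaudeWebScraper from earlier and an illustrative token count:

```python
logging.basicConfig(level=logging.INFO)
monitor = RateLimitMonitor()

data = scraper.extract_data(html, "Extract product name and price as JSON")
# In practice, pass the real total from response.usage; 3200 is illustrative
monitor.log_request(tokens_used=3200)
```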
Conclusion
Understanding and properly handling Claude API rate limits is essential for building reliable web scraping systems. By implementing rate limiters, using exponential backoff, and monitoring your usage, you can maximize throughput while staying within API constraints. For production systems requiring higher limits, consider upgrading to the Scale tier or requesting custom limits based on your specific use case.
Remember that efficient token usage and choosing the right model for each task can help you stay within rate limits while processing more pages. Always implement proper error handling and retry logic to gracefully handle temporary rate limit errors during scraping operations.