What is the Token Limit for Deepseek API Requests?
When using Deepseek AI models for web scraping and data extraction tasks, understanding token limits is crucial for building efficient and cost-effective applications. The token limits vary depending on which Deepseek model you're using, and managing these limits properly can significantly impact your scraping workflow's performance and reliability.
Deepseek Model Token Limits
Deepseek offers several models with different token limits and capabilities:
Deepseek V3
Deepseek V3 is the flagship model with the following specifications:
- Context Window: 64,000 tokens (64K)
- Maximum Output Tokens: 8,192 tokens
- Total Token Limit: Input + Output cannot exceed 64K tokens
This model is ideal for processing large HTML documents and extracting complex structured data from web pages.
Deepseek Chat
Deepseek Chat provides:
- Context Window: 32,000 tokens (32K)
- Maximum Output Tokens: 4,096 tokens
- Total Token Limit: Input + Output cannot exceed 32K tokens
This model works well for standard web scraping tasks where you're extracting data from moderately-sized pages.
Deepseek Coder
Deepseek Coder is optimized for code-related tasks:
- Context Window: 16,000 tokens (16K)
- Maximum Output Tokens: 4,096 tokens
- Total Token Limit: Input + Output cannot exceed 16K tokens
While primarily designed for coding tasks, it can be useful for scraping technical documentation and code repositories.
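If you work with more than one model, it can help to keep these limits in one place and validate inputs before calling the API. The snippet below simply mirrors the figures above in a lookup table; the figures and the "deepseek-v3" key are illustrative, so verify the exact model identifiers and current limits against Deepseek's documentation:

```python
# Context window and max output per model, mirroring the specs above.
# Figures and the "deepseek-v3" key are illustrative; verify against current docs.
MODEL_LIMITS = {
    "deepseek-chat":  {"context": 32_000, "max_output": 4_096},
    "deepseek-coder": {"context": 16_000, "max_output": 4_096},
    "deepseek-v3":    {"context": 64_000, "max_output": 8_192},
}

def max_input_tokens(model: str) -> int:
    """Tokens available for the prompt once the output budget is reserved."""
    limits = MODEL_LIMITS[model]
    return limits["context"] - limits["max_output"]

print(max_input_tokens("deepseek-chat"))  # 27904
```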
Understanding Tokens in Web Scraping Context
In the context of web scraping, tokens represent pieces of text that the AI model processes. Here's how different content translates to tokens:
- 1 token ≈ 4 characters of English text
- 1 token ≈ 0.75 words on average
- HTML markup typically requires more tokens due to tag structure
- JSON output requires tokens for formatting and structure
Example Token Calculation
```python
# A typical product page HTML snippet
html_content = """
<div class="product">
  <h1>Wireless Headphones</h1>
  <p class="price">$99.99</p>
  <p class="description">High-quality wireless headphones with noise cancellation</p>
</div>
"""

# Approximate token count: ~50-60 tokens
# Extracted JSON output might use: ~20-30 tokens
```
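Because exact counts depend on Deepseek's tokenizer, a character-based estimate is usually enough for budgeting. Here is a minimal sketch using the ~4 characters-per-token heuristic from above; `estimate_tokens` and `fits_in_context` are illustrative helpers, not part of any Deepseek SDK:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate based on the ~4 chars/token heuristic for English."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str, context_window: int = 64_000, reserved_output: int = 8_192) -> bool:
    """Check whether the input likely fits, leaving room for the model's output."""
    return estimate_tokens(text) + reserved_output <= context_window

print(estimate_tokens(html_content))  # ~45 with this heuristic; actual count varies by tokenizer
```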
Managing Token Limits in Web Scraping
1. Pre-processing HTML Content
Before sending HTML to the Deepseek API, clean and reduce the content to stay within token limits:
```python
import requests
from bs4 import BeautifulSoup
import openai

def clean_html_for_llm(html_content, max_chars=100000):
    """
    Clean HTML and extract only relevant content
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove unnecessary elements
    for element in soup(['script', 'style', 'nav', 'footer', 'header', 'iframe']):
        element.decompose()

    # Get text content or simplified HTML
    cleaned_content = str(soup)

    # Truncate if necessary (rough estimate: 4 chars per token)
    if len(cleaned_content) > max_chars:
        cleaned_content = cleaned_content[:max_chars]

    return cleaned_content

def scrape_with_deepseek(url, extraction_prompt):
    """
    Scrape a URL using Deepseek API with token management
    """
    # Fetch the page
    response = requests.get(url, timeout=30)
    html_content = response.text

    # Clean HTML to reduce tokens (~240K chars ≈ 60K tokens, sized for the 64K context window)
    cleaned_html = clean_html_for_llm(html_content, max_chars=240000)

    # Configure Deepseek API client (OpenAI-compatible endpoint)
    client = openai.OpenAI(
        api_key="your-deepseek-api-key",
        base_url="https://api.deepseek.com/v1"
    )

    # Make API request with token limits
    completion = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {
                "role": "system",
                "content": "You are a web scraping assistant. Extract structured data from HTML."
            },
            {
                "role": "user",
                "content": f"{extraction_prompt}\n\nHTML Content:\n{cleaned_html}"
            }
        ],
        max_tokens=4096,  # Maximum output tokens
        temperature=0.1
    )

    return completion.choices[0].message.content

# Example usage
url = "https://example.com/products/wireless-headphones"
prompt = "Extract product name, price, description, and features as JSON"
result = scrape_with_deepseek(url, prompt)
print(result)
```
2. JavaScript Implementation with Token Management
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

async function cleanHtmlForLLM(htmlContent, maxChars = 100000) {
  const $ = cheerio.load(htmlContent);

  // Remove unnecessary elements
  $('script, style, nav, footer, header, iframe').remove();

  // Get cleaned HTML
  let cleaned = $.html();

  // Truncate if necessary
  if (cleaned.length > maxChars) {
    cleaned = cleaned.substring(0, maxChars);
  }

  return cleaned;
}

async function scrapeWithDeepseek(url, extractionPrompt) {
  try {
    // Fetch the page
    const response = await axios.get(url);
    const htmlContent = response.data;

    // Clean HTML to manage token limits (240K chars ≈ 60K tokens)
    const cleanedHtml = await cleanHtmlForLLM(htmlContent, 240000);

    // Call Deepseek API
    const apiResponse = await axios.post(
      'https://api.deepseek.com/v1/chat/completions',
      {
        model: 'deepseek-chat',
        messages: [
          {
            role: 'system',
            content: 'You are a web scraping assistant. Extract structured data from HTML.'
          },
          {
            role: 'user',
            content: `${extractionPrompt}\n\nHTML Content:\n${cleanedHtml}`
          }
        ],
        max_tokens: 4096,
        temperature: 0.1
      },
      {
        headers: {
          'Authorization': `Bearer ${process.env.DEEPSEEK_API_KEY}`,
          'Content-Type': 'application/json'
        }
      }
    );

    return apiResponse.data.choices[0].message.content;
  } catch (error) {
    if (error.response?.status === 400 &&
        error.response?.data?.error?.message?.includes('token')) {
      console.error('Token limit exceeded. Consider reducing input size.');
    }
    throw error;
  }
}

// Example usage
(async () => {
  const url = 'https://example.com/products/wireless-headphones';
  const prompt = 'Extract product name, price, description, and features as JSON';
  const result = await scrapeWithDeepseek(url, prompt);
  console.log(result);
})();
```
3. Chunking Large Documents
For very large web pages that exceed token limits even after cleaning, implement chunking:
```python
def chunk_html_content(html_content, chunk_size=200000):
    """
    Split HTML content into manageable chunks
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find main content sections
    sections = soup.find_all(['article', 'section', 'div'],
                             class_=lambda x: x and 'content' in x.lower())

    chunks = []
    current_chunk = ""

    for section in sections:
        section_html = str(section)
        if len(current_chunk) + len(section_html) > chunk_size:
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = section_html
        else:
            current_chunk += section_html

    if current_chunk:
        chunks.append(current_chunk)

    return chunks

def scrape_large_page(url, extraction_prompt):
    """
    Scrape a large page by processing it in chunks
    """
    response = requests.get(url, timeout=30)
    chunks = chunk_html_content(response.text)

    results = []
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")
        # scrape_with_deepseek_chunk sends pre-fetched HTML directly; see the sketch below
        result = scrape_with_deepseek_chunk(chunk, extraction_prompt)
        results.append(result)

    return results
```
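The `scrape_with_deepseek_chunk` helper used above is not an SDK function; it follows the same call pattern as `scrape_with_deepseek`, just taking pre-fetched HTML instead of a URL. A minimal sketch:

```python
def scrape_with_deepseek_chunk(chunk_html, extraction_prompt):
    """Send one pre-fetched HTML chunk to the Deepseek API and return the raw reply."""
    client = openai.OpenAI(
        api_key="your-deepseek-api-key",
        base_url="https://api.deepseek.com/v1"
    )
    completion = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a web scraping assistant. Extract structured data from HTML."},
            {"role": "user", "content": f"{extraction_prompt}\n\nHTML Content:\n{chunk_html}"}
        ],
        max_tokens=4096,
        temperature=0.1
    )
    return completion.choices[0].message.content
```

The per-chunk results can then be merged or deduplicated downstream, since a single record may span two chunks.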
Token Limit Error Handling
Proper error handling is essential when working with token limits:
```python
import json

def scrape_with_retry(url, prompt, max_retries=3):
    """
    Scrape with automatic retry and content reduction on token errors
    """
    max_chars = 240000  # Starting limit for Deepseek V3

    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            cleaned_html = clean_html_for_llm(response.text, max_chars=max_chars)

            client = openai.OpenAI(
                api_key="your-deepseek-api-key",
                base_url="https://api.deepseek.com/v1"
            )

            completion = client.chat.completions.create(
                model="deepseek-chat",
                messages=[
                    {"role": "system", "content": "Extract data from HTML as JSON."},
                    {"role": "user", "content": f"{prompt}\n\n{cleaned_html}"}
                ],
                max_tokens=4096
            )

            return json.loads(completion.choices[0].message.content)

        except Exception as e:
            error_message = str(e)
            if 'token' in error_message.lower() or 'length' in error_message.lower():
                # Reduce content size by 30% and retry
                max_chars = int(max_chars * 0.7)
                print(f"Token limit exceeded. Retrying with reduced content: {max_chars} chars")
                continue
            else:
                raise

    raise Exception(f"Failed after {max_retries} attempts")
```
Best Practices for Token Management
1. Monitor Token Usage
Track your token consumption to optimize costs and performance:
```python
def track_token_usage(completion_response):
    """
    Extract and log token usage from API response
    """
    usage = completion_response.usage
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens: {usage.total_tokens}")

    # Calculate approximate cost (check current Deepseek pricing)
    cost_per_1k_input = 0.00014   # Example pricing
    cost_per_1k_output = 0.00028

    total_cost = (
        (usage.prompt_tokens / 1000 * cost_per_1k_input) +
        (usage.completion_tokens / 1000 * cost_per_1k_output)
    )
    print(f"Estimated cost: ${total_cost:.6f}")

    return usage
```
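To use it, pass in the full completion object returned by `client.chat.completions.create(...)`, assuming the client configured earlier in this guide:

```python
completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Reply with the word: ok"}],
    max_tokens=8
)
usage = track_token_usage(completion)  # prints prompt/completion/total tokens and estimated cost
```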
2. Use Selective HTML Extraction
Instead of sending entire HTML documents, extract only the relevant sections. When working with dynamic, JavaScript-heavy websites, you may need to handle AJAX requests with tools like Puppeteer before processing the content.
```python
def extract_main_content(html_content, selectors):
    """
    Extract only specific sections from HTML
    """
    soup = BeautifulSoup(html_content, 'html.parser')
    main_content = ""

    for selector in selectors:
        elements = soup.select(selector)
        for element in elements:
            main_content += str(element)

    return main_content

# Example: Extract only product information
selectors = [
    '.product-info',
    '.product-description',
    '.product-reviews',
    '.pricing-section'
]

url = "https://example.com/products/wireless-headphones"
html_response = requests.get(url, timeout=30).text
relevant_html = extract_main_content(html_response, selectors)
```
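Selective extraction pairs naturally with the `estimate_tokens` helper sketched earlier: keep appending sections until an estimated token budget is spent. The function below is an illustrative sketch (the 56K default leaves ~8K of a 64K window for output), not a library API:

```python
def build_budgeted_input(html_content, selectors, token_budget=56_000):
    """Accumulate selected sections until the estimated token budget is reached."""
    soup = BeautifulSoup(html_content, 'html.parser')
    parts, used = [], 0
    for selector in selectors:
        for element in soup.select(selector):
            section = str(element)
            cost = estimate_tokens(section)  # ~4 chars/token heuristic from earlier
            if used + cost > token_budget:
                return "".join(parts)  # budget spent; stop adding sections
            parts.append(section)
            used += cost
    return "".join(parts)
```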
3. Optimize Output Token Usage
Request concise outputs to maximize the content you can process:
```python
# Efficient prompt engineering
prompt = """
Extract the following as compact JSON (no extra whitespace):
- product_name
- price (number only)
- rating (number only)
- in_stock (boolean)
Return ONLY the JSON object, no explanations.
"""
```
Comparing Deepseek Token Limits with Other LLMs
| Model | Context Window | Max Output | Best For Web Scraping |
|-------|----------------|------------|------------------------|
| Deepseek V3 | 64K tokens | 8K tokens | Large e-commerce pages, documentation |
| Deepseek Chat | 32K tokens | 4K tokens | Standard product pages, articles |
| GPT-4 Turbo | 128K tokens | 4K tokens | Very large documents |
| Claude 3.5 Sonnet | 200K tokens | 8K tokens | Massive documents, entire websites |
Conclusion
Understanding and managing token limits is essential for successful web scraping with the Deepseek API. By implementing proper HTML cleaning, chunking strategies, and error handling, you can efficiently extract data from web pages while staying within token limits and optimizing costs.
Key takeaways:
- Deepseek V3 supports up to a 64K-token context window
- Pre-process HTML to remove unnecessary content before API calls
- Implement chunking for pages that exceed token limits
- Monitor token usage to optimize costs and performance
- Use selective extraction to focus on relevant content only
For web scraping projects that require processing JavaScript-rendered content before sending it to LLMs, consider using browser automation tools to handle browser sessions and render dynamic content first.
By following these best practices, you can build robust and efficient web scraping applications powered by Deepseek AI models while staying within token limits and managing costs effectively.