What is the Token Limit for Deepseek API Requests?

When using Deepseek AI models for web scraping and data extraction tasks, understanding token limits is crucial for building efficient and cost-effective applications. The token limits vary depending on which Deepseek model you're using, and managing these limits properly can significantly impact your scraping workflow's performance and reliability.

Deepseek Model Token Limits

Deepseek offers several models with different token limits and capabilities:

Deepseek V3

Deepseek V3 is the flagship model with the following specifications:

  • Context Window: 64,000 tokens (64K)
  • Maximum Output Tokens: 8,192 tokens
  • Total Token Limit: Input + Output cannot exceed 64K tokens

This model is ideal for processing large HTML documents and extracting complex structured data from web pages.

Deepseek Chat

Deepseek Chat provides:

  • Context Window: 32,000 tokens (32K)
  • Maximum Output Tokens: 4,096 tokens
  • Total Token Limit: Input + Output cannot exceed 32K tokens

This model works well for standard web scraping tasks where you're extracting data from moderately sized pages.

Deepseek Coder

Deepseek Coder is optimized for code-related tasks:

  • Context Window: 16,000 tokens (16K)
  • Maximum Output Tokens: 4,096 tokens
  • Total Token Limit: Input + Output cannot exceed 16K tokens

While primarily designed for coding tasks, it can be useful for scraping technical documentation and code repositories.
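
These limits translate directly into an input budget: tokens reserved for output come out of the same context window. A minimal sketch of that arithmetic in Python, using the ~4 characters-per-token heuristic covered in the next section (the dictionary keys are illustrative labels, not necessarily the exact API model names):

# Approximate input budgets: context window minus reserved output tokens
MODEL_LIMITS = {
    "deepseek-v3": {"context": 64_000, "max_output": 8_192},
    "deepseek-chat": {"context": 32_000, "max_output": 4_096},
    "deepseek-coder": {"context": 16_000, "max_output": 4_096},
}

def input_budget_chars(model, chars_per_token=4):
    """Approximate characters of input that fit alongside a full-length output."""
    limits = MODEL_LIMITS[model]
    return (limits["context"] - limits["max_output"]) * chars_per_token

print(input_budget_chars("deepseek-v3"))  # (64000 - 8192) * 4 ≈ 223,000 characters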

Understanding Tokens in Web Scraping Context

In the context of web scraping, tokens represent pieces of text that the AI model processes. Here's how different content translates to tokens:

  • 1 token ≈ 4 characters of English text
  • 1 token ≈ 0.75 words on average
  • HTML markup typically requires more tokens due to tag structure
  • JSON output requires tokens for formatting and structure

Example Token Calculation

# A typical product page HTML snippet
html_content = """
<div class="product">
    <h1>Wireless Headphones</h1>
    <p class="price">$99.99</p>
    <p class="description">High-quality wireless headphones with noise cancellation</p>
</div>
"""

# Approximate token count: ~50-60 tokens
# Extracted JSON output might use: ~20-30 tokens
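
To estimate counts programmatically before sending a request, the same heuristic can be wrapped in a helper. This is a rough sketch; Deepseek's own tokenizer may count differently, so leave a safety margin:

def estimate_tokens(text):
    """Rough token estimate using the ~4 characters/token heuristic."""
    return len(text) // 4

print(estimate_tokens(html_content))  # roughly matches the ~50-60 estimate above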

Managing Token Limits in Web Scraping

1. Pre-processing HTML Content

Before sending HTML to the Deepseek API, clean and reduce the content to stay within token limits:

import requests
from bs4 import BeautifulSoup
import openai

def clean_html_for_llm(html_content, max_chars=100000):
    """
    Clean HTML and extract only relevant content
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove unnecessary elements
    for element in soup(['script', 'style', 'nav', 'footer', 'header', 'iframe']):
        element.decompose()

    # Get text content or simplified HTML
    cleaned_content = str(soup)

    # Truncate if necessary (rough estimate: 4 chars per token)
    if len(cleaned_content) > max_chars:
        cleaned_content = cleaned_content[:max_chars]

    return cleaned_content

def scrape_with_deepseek(url, extraction_prompt):
    """
    Scrape a URL using Deepseek API with token management
    """
    # Fetch the page
    response = requests.get(url, timeout=30)
    html_content = response.text

    # Clean HTML to reduce tokens (~240K characters ≈ 60K tokens for the 64K context)
    cleaned_html = clean_html_for_llm(html_content, max_chars=240000)

    # Configure Deepseek API client
    client = openai.OpenAI(
        api_key="your-deepseek-api-key",
        base_url="https://api.deepseek.com/v1"
    )

    # Make API request with token limits
    completion = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {
                "role": "system",
                "content": "You are a web scraping assistant. Extract structured data from HTML."
            },
            {
                "role": "user",
                "content": f"{extraction_prompt}\n\nHTML Content:\n{cleaned_html}"
            }
        ],
        max_tokens=4096,  # Maximum output tokens
        temperature=0.1
    )

    return completion.choices[0].message.content

# Example usage
url = "https://example.com/products/wireless-headphones"
prompt = "Extract product name, price, description, and features as JSON"

result = scrape_with_deepseek(url, prompt)
print(result)

2. JavaScript Implementation with Token Management

const axios = require('axios');
const cheerio = require('cheerio');

async function cleanHtmlForLLM(htmlContent, maxChars = 100000) {
    const $ = cheerio.load(htmlContent);

    // Remove unnecessary elements
    $('script, style, nav, footer, header, iframe').remove();

    // Get cleaned HTML
    let cleaned = $.html();

    // Truncate if necessary
    if (cleaned.length > maxChars) {
        cleaned = cleaned.substring(0, maxChars);
    }

    return cleaned;
}

async function scrapeWithDeepseek(url, extractionPrompt) {
    try {
        // Fetch the page
        const response = await axios.get(url);
        const htmlContent = response.data;

        // Clean HTML to manage token limits (240K chars ≈ 60K tokens)
        const cleanedHtml = await cleanHtmlForLLM(htmlContent, 240000);

        // Call Deepseek API
        const apiResponse = await axios.post(
            'https://api.deepseek.com/v1/chat/completions',
            {
                model: 'deepseek-chat',
                messages: [
                    {
                        role: 'system',
                        content: 'You are a web scraping assistant. Extract structured data from HTML.'
                    },
                    {
                        role: 'user',
                        content: `${extractionPrompt}\n\nHTML Content:\n${cleanedHtml}`
                    }
                ],
                max_tokens: 4096,
                temperature: 0.1
            },
            {
                headers: {
                    'Authorization': `Bearer ${process.env.DEEPSEEK_API_KEY}`,
                    'Content-Type': 'application/json'
                }
            }
        );

        return apiResponse.data.choices[0].message.content;

    } catch (error) {
        if (error.response?.status === 400 &&
            error.response?.data?.error?.message?.includes('token')) {
            console.error('Token limit exceeded. Consider reducing input size.');
        }
        throw error;
    }
}

// Example usage
(async () => {
    const url = 'https://example.com/products/wireless-headphones';
    const prompt = 'Extract product name, price, description, and features as JSON';

    const result = await scrapeWithDeepseek(url, prompt);
    console.log(result);
})();

3. Chunking Large Documents

For very large web pages that exceed token limits even after cleaning, implement chunking:

def chunk_html_content(html_content, chunk_size=200000):
    """
    Split HTML content into manageable chunks
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find main content sections
    sections = soup.find_all(['article', 'section', 'div'], class_=lambda x: x and 'content' in x.lower())

    chunks = []
    current_chunk = ""

    for section in sections:
        section_html = str(section)

        if len(current_chunk) + len(section_html) > chunk_size:
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = section_html
        else:
            current_chunk += section_html

    if current_chunk:
        chunks.append(current_chunk)

    return chunks

def scrape_with_deepseek_chunk(html_chunk, extraction_prompt):
    """
    Send a single pre-chunked piece of HTML to the Deepseek API
    (the same request as scrape_with_deepseek, but on HTML we already have)
    """
    client = openai.OpenAI(
        api_key="your-deepseek-api-key",
        base_url="https://api.deepseek.com/v1"
    )

    completion = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a web scraping assistant. Extract structured data from HTML."},
            {"role": "user", "content": f"{extraction_prompt}\n\nHTML Content:\n{html_chunk}"}
        ],
        max_tokens=4096,
        temperature=0.1
    )

    return completion.choices[0].message.content

def scrape_large_page(url, extraction_prompt):
    """
    Scrape a large page by processing it in chunks
    """
    response = requests.get(url, timeout=30)
    chunks = chunk_html_content(response.text)

    results = []
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")
        result = scrape_with_deepseek_chunk(chunk, extraction_prompt)
        results.append(result)

    return results
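
The per-chunk results then need to be combined, and how you merge them depends on the extraction task. Here is a minimal sketch assuming each chunk returns a JSON object whose list-valued fields should be concatenated:

import json

def merge_chunk_results(results):
    """Merge per-chunk JSON results, concatenating list-valued fields."""
    merged = {}
    for raw in results:
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # skip chunks the model didn't return as valid JSON
        for key, value in data.items():
            if isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            else:
                merged.setdefault(key, value)
    return merged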

Token Limit Error Handling

Proper error handling is essential when working with token limits:

import json

def scrape_with_retry(url, prompt, max_retries=3):
    """
    Scrape with automatic retry and content reduction on token errors
    """
    max_chars = 240000  # Starting limit for Deepseek V3

    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            cleaned_html = clean_html_for_llm(response.text, max_chars=max_chars)

            client = openai.OpenAI(
                api_key="your-deepseek-api-key",
                base_url="https://api.deepseek.com/v1"
            )

            completion = client.chat.completions.create(
                model="deepseek-chat",
                messages=[
                    {"role": "system", "content": "Extract data from HTML as JSON."},
                    {"role": "user", "content": f"{prompt}\n\n{cleaned_html}"}
                ],
                max_tokens=4096
            )

            return json.loads(completion.choices[0].message.content)

        except Exception as e:
            error_message = str(e)

            if 'token' in error_message.lower() or 'length' in error_message.lower():
                # Reduce content size by 30% and retry
                max_chars = int(max_chars * 0.7)
                print(f"Token limit exceeded. Retrying with reduced content: {max_chars} chars")
                continue
            else:
                raise e

    raise Exception(f"Failed after {max_retries} attempts")
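
A quick usage example for the retry wrapper (the URL and prompt are placeholders):

# Example usage
data = scrape_with_retry(
    "https://example.com/products/wireless-headphones",
    "Extract product name, price, and rating as JSON"
)
print(data)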

Best Practices for Token Management

1. Monitor Token Usage

Track your token consumption to optimize costs and performance:

def track_token_usage(completion_response):
    """
    Extract and log token usage from API response
    """
    usage = completion_response.usage

    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens: {usage.total_tokens}")

    # Calculate approximate cost (check current Deepseek pricing)
    cost_per_1k_input = 0.00014  # Example pricing
    cost_per_1k_output = 0.00028

    total_cost = (
        (usage.prompt_tokens / 1000 * cost_per_1k_input) +
        (usage.completion_tokens / 1000 * cost_per_1k_output)
    )

    print(f"Estimated cost: ${total_cost:.6f}")

    return usage
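
For example, pass the completion object returned by any of the earlier calls straight into the tracker (reusing the client setup from the previous examples):

# Example usage with the client configured earlier
completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Extract the title as JSON: <h1>Sale</h1>"}],
    max_tokens=100
)
track_token_usage(completion)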

2. Use Selective HTML Extraction

Instead of sending entire HTML documents, extract only the relevant sections. For dynamic, JavaScript-heavy websites, you may need a tool like Puppeteer to handle AJAX requests and render the content before processing it.

def extract_main_content(html_content, selectors):
    """
    Extract only specific sections from HTML
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    main_content = ""
    for selector in selectors:
        elements = soup.select(selector)
        for element in elements:
            main_content += str(element)

    return main_content

# Example: Extract only product information
selectors = [
    '.product-info',
    '.product-description',
    '.product-reviews',
    '.pricing-section'
]

html_response = requests.get(url).text
relevant_html = extract_main_content(html_response, selectors)

3. Optimize Output Token Usage

Request concise outputs to maximize the content you can process:

# Efficient prompt engineering
prompt = """
Extract the following as compact JSON (no extra whitespace):
- product_name
- price (number only)
- rating (number only)
- in_stock (boolean)

Return ONLY the JSON object, no explanations.
"""

Comparing Deepseek Token Limits with Other LLMs

| Model | Context Window | Max Output | Best For Web Scraping |
|-------|----------------|------------|-----------------------|
| Deepseek V3 | 64K tokens | 8K tokens | Large e-commerce pages, documentation |
| Deepseek Chat | 32K tokens | 4K tokens | Standard product pages, articles |
| GPT-4 Turbo | 128K tokens | 4K tokens | Very large documents |
| Claude 3.5 Sonnet | 200K tokens | 8K tokens | Massive documents, entire websites |

Conclusion

Understanding and managing token limits is essential for successful web scraping with the Deepseek API. By implementing proper HTML cleaning, chunking strategies, and error handling, you can efficiently extract data from web pages while staying within token limits and optimizing costs.

Key takeaways:

  • Deepseek V3 supports a context window of up to 64K tokens
  • Pre-process HTML to remove unnecessary content before API calls
  • Implement chunking for pages that exceed token limits
  • Monitor token usage to optimize costs and performance
  • Use selective extraction to focus on relevant content only

For web scraping projects that require processing JavaScript-rendered content before sending it to LLMs, consider using browser automation tools to handle browser sessions and render dynamic content first.
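
Playwright is one option for this (Puppeteer and Selenium work similarly); below is a minimal sketch that renders the page in headless Chromium and then reuses clean_html_for_llm from earlier:

from playwright.sync_api import sync_playwright

def fetch_rendered_html(url):
    """Render a JavaScript-heavy page in headless Chromium and return the HTML."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html

rendered = fetch_rendered_html("https://example.com/spa-product-page")
cleaned = clean_html_for_llm(rendered, max_chars=240000)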

By following these best practices, you can build robust and efficient web scraping applications powered by Deepseek AI models while staying within token limits and managing costs effectively.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

