How do I specify a charset when making a request with Requests?

When working with the Python requests library, charset handling occurs at two levels: declaring which charsets you are willing to receive (via request headers) and setting the charset of the data you send (in the request body). Both are typically handled automatically, but you can override them when needed.

Understanding Charset in HTTP Requests

The charset determines how text is encoded in HTTP requests and responses. By default, requests automatically detects and handles charset based on server responses, but manual control is sometimes necessary for:

  • Servers that don't specify charset correctly
  • Sending data in specific encodings
  • Working with international content
  • Handling legacy systems with non-UTF-8 encodings
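
A quick illustration of why the charset matters: the same bytes decoded with the wrong codec produce mojibake.

raw = 'Café'.encode('utf-8')      # b'Caf\xc3\xa9'
print(raw.decode('utf-8'))        # Café (correct)
print(raw.decode('iso-8859-1'))   # CafÃ© (mojibake)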

1. Requesting Specific Charset (GET Requests)

Use the Accept-Charset header to tell the server which charsets your client can handle (note that many modern servers ignore this header and simply send UTF-8):

import requests

# Request UTF-8 encoding from server
url = 'https://example.com/'
headers = {
    'Accept-Charset': 'utf-8'
}

response = requests.get(url, headers=headers)
print(f"Response encoding: {response.encoding}")
print(response.text)

You can also specify multiple acceptable charsets, using quality values (q) to rank your preferences:

import requests

headers = {
    'Accept-Charset': 'utf-8, iso-8859-1;q=0.8, *;q=0.1'
}

response = requests.get('https://example.com/', headers=headers)
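
If you send many requests, you can set the header once on a Session so it applies to every request made through it; a minimal sketch:

import requests

session = requests.Session()
session.headers.update({'Accept-Charset': 'utf-8'})  # sent with every request

response = session.get('https://example.com/')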

2. Setting Charset for POST Data

Form Data (application/x-www-form-urlencoded)

When sending form data, specify the charset in the Content-Type header:

import requests

url = 'https://httpbin.org/post'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'
}

# Using string data: percent-encode it first so non-ASCII characters
# and spaces form a valid urlencoded body
from urllib.parse import urlencode

data = urlencode({'name': 'José', 'city': 'São Paulo'})  # name=Jos%C3%A9&city=S%C3%A3o+Paulo
response = requests.post(url, headers=headers, data=data)

# Or using dictionary (requests handles encoding automatically)
form_data = {'name': 'José', 'city': 'São Paulo'}
response = requests.post(url, data=form_data)  # charset handled automatically
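
For legacy endpoints that expect a non-UTF-8 form body, you can percent-encode the values in the target charset yourself. A hedged sketch, assuming a hypothetical endpoint that expects ISO-8859-1:

import requests
from urllib.parse import urlencode

url = 'https://legacy.example.com/submit'  # hypothetical endpoint
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=iso-8859-1'
}

# urlencode percent-encodes each value using the target charset,
# producing an ASCII-safe body
body = urlencode({'name': 'José', 'city': 'São Paulo'}, encoding='iso-8859-1')
response = requests.post(url, headers=headers, data=body)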

JSON Data

For JSON payloads, specify UTF-8 charset explicitly:

import requests
import json

url = 'https://httpbin.org/post'
headers = {
    'Content-Type': 'application/json; charset=utf-8'
}

data = {
    'name': 'José María',
    'description': 'Специальные символы',
    'emoji': '🌟'
}

# Method 1: Manual encoding
json_data = json.dumps(data, ensure_ascii=False).encode('utf-8')
response = requests.post(url, headers=headers, data=json_data)

# Method 2: Let requests handle it (recommended)
response = requests.post(url, json=data)  # Automatically sets charset=utf-8
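
To see exactly what requests will send, you can inspect a prepared request before it goes out; a small sketch using the same url and data as above:

prepared = requests.Request('POST', url, json=data).prepare()
print(prepared.headers['Content-Type'])  # application/json
print(prepared.body[:40])                # JSON body as bytes (non-ASCII escaped by default)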

Plain Text Data

For plain text content:

import requests

url = 'https://httpbin.org/post'
headers = {
    'Content-Type': 'text/plain; charset=utf-8'
}

text_data = "Hello, 世界! Привет мир!"
response = requests.post(url, headers=headers, data=text_data.encode('utf-8'))
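
The same pattern works for legacy encodings; a hedged sketch for a hypothetical endpoint that expects Shift_JIS text:

headers = {'Content-Type': 'text/plain; charset=shift_jis'}
text = 'こんにちは世界'
response = requests.post(url, headers=headers, data=text.encode('shift_jis'))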

3. Handling Response Charset

Automatic Detection

requests reads the charset from the Content-Type response header; for text/* responses that don't declare one, it falls back to ISO-8859-1 (the old HTTP/1.1 default):

import requests

response = requests.get('https://example.com/')
print(f"Detected encoding: {response.encoding}")
print(f"Apparent encoding: {response.apparent_encoding}")  # chardet-based detection

Manual Override

Override the detected encoding when servers provide incorrect charset information:

import requests

response = requests.get('https://example.com/')

# Check what was detected
print(f"Original encoding: {response.encoding}")

# Override if needed
response.encoding = 'utf-8'
content = response.text

# Or use apparent encoding (usually more accurate)
response.encoding = response.apparent_encoding
content = response.text

Binary Content

For binary data or when you want full control:

import requests

response = requests.get('https://example.com/')

# Get raw bytes
raw_bytes = response.content

# Decode manually
try:
    text = raw_bytes.decode('utf-8')
except UnicodeDecodeError:
    # Fallback to apparent encoding
    text = raw_bytes.decode(response.apparent_encoding)
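
As a last resort, decoding with errors='replace' never raises; undecodable bytes become the U+FFFD replacement character instead of crashing your pipeline:

text = raw_bytes.decode('utf-8', errors='replace')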

4. Common Use Cases

Scraping Non-English Websites

import requests

# For Chinese websites
response = requests.get('https://example.cn/')
if response.encoding in ['ISO-8859-1', 'ascii']:
    # Server didn't specify encoding properly
    response.encoding = response.apparent_encoding

chinese_content = response.text

Sending International Form Data

import requests

url = 'https://example.com/submit'
form_data = {
    'name': 'François',
    'city': 'Москва',
    'comment': '这是中文评论'
}

# requests automatically handles Unicode in form data
response = requests.post(url, data=form_data)
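
If you are curious what goes over the wire, the prepared body shows requests percent-encoding the values as UTF-8; a small sketch:

prepared = requests.Request('POST', url, data=form_data).prepare()
print(prepared.body)
# name=Fran%C3%A7ois&city=%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0&comment=...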

Working with CSV Data

import requests
import csv
from io import StringIO

response = requests.get('https://example.com/data.csv')
response.encoding = 'utf-8'  # Ensure proper encoding

# Parse CSV with correct encoding
csv_data = csv.reader(StringIO(response.text))
for row in csv_data:
    print(row)
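
One common wrinkle: CSV files exported from Excel often begin with a UTF-8 byte order mark (BOM). Decoding with 'utf-8-sig' strips it so the first column header parses cleanly; a small sketch:

text = response.content.decode('utf-8-sig')  # strips a leading BOM if present
csv_data = csv.reader(StringIO(text))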

5. Best Practices

  1. Let requests handle it: Use the json= parameter for JSON data instead of encoding manually
  2. Check apparent_encoding: Use response.apparent_encoding for better charset detection
  3. Handle errors: Wrap decoding operations in try/except blocks
  4. Test with international content: Verify your code works with non-ASCII characters
  5. Use UTF-8 by default: UTF-8 is the most widely supported encoding

Putting these practices together in a small helper:

import requests

def safe_request(url, **kwargs):
    """Make a request with proper charset handling"""
    response = requests.get(url, **kwargs)

    # Use apparent encoding if detection seems wrong
    if response.encoding in ['ISO-8859-1', 'ascii'] and response.apparent_encoding:
        response.encoding = response.apparent_encoding

    return response

# Usage
response = safe_request('https://international-site.com/')
print(response.text)

By understanding these charset handling techniques, you can ensure your web scraping and API integration code works correctly with international content and various server configurations.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
