How do you handle API responses with different content types?
When working with APIs, you'll encounter various response content types beyond the standard JSON format. Modern APIs can return XML, HTML, plain text, binary data, and even mixed content types. Understanding how to properly handle these different formats is crucial for building robust web scraping and API integration applications.
Understanding Content Types
The HTTP Content-Type header indicates the media type of the response body. Common content types include:
application/json - JSON data
application/xml or text/xml - XML documents
text/html - HTML content
text/plain - Plain text
application/octet-stream - Binary data
image/jpeg, image/png - Image files
application/pdf - PDF documents
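The header value often carries parameters after the media type, such as charset or boundary. The following is a minimal sketch of separating the two. The helper name split_content_type is hypothetical (it is not part of requests or any other library), and the sketch deliberately ignores edge cases such as semicolons inside quoted parameter values.

```python
def split_content_type(header_value):
    """Split a Content-Type header into (media_type, params).

    Hypothetical helper for illustration only; it does not handle
    semicolons inside quoted parameter values.
    """
    media_type, _, rest = header_value.partition(';')
    params = {}
    for part in rest.split(';'):
        key, sep, value = part.partition('=')
        if sep:  # skip empty or malformed fragments
            params[key.strip().lower()] = value.strip().strip('"')
    # Media types are case-insensitive, so normalize to lowercase
    return media_type.strip().lower(), params


media_type, params = split_content_type('Application/JSON; charset="UTF-8"')
print(media_type, params)
```

Comparing the normalized media type, rather than the raw header string, avoids surprises from casing and trailing parameters.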
Detecting Content Types
Before processing a response, you should check its content type to determine the appropriate parsing strategy.
Python Example
import requests
import json
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

def handle_api_response(url):
    response = requests.get(url)
    # Header values may vary in case, so normalize before matching
    content_type = response.headers.get('content-type', '').lower()
    if 'application/json' in content_type:
        return handle_json_response(response)
    elif 'application/xml' in content_type or 'text/xml' in content_type:
        return handle_xml_response(response)
    elif 'text/html' in content_type:
        return handle_html_response(response)
    elif 'text/plain' in content_type:
        return handle_text_response(response)
    elif 'application/octet-stream' in content_type:
        return handle_binary_response(response)
    else:
        return handle_unknown_response(response)

def handle_json_response(response):
    try:
        return response.json()
    except ValueError as e:
        # response.json() raises a ValueError subclass on malformed JSON
        print(f"Failed to parse JSON: {e}")
        return None

def handle_xml_response(response):
    try:
        # Parse from bytes so the XML declaration's encoding is respected
        root = ET.fromstring(response.content)
        return root
    except ET.ParseError as e:
        print(f"Failed to parse XML: {e}")
        return None

def handle_html_response(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    return soup

def handle_text_response(response):
    return response.text

def handle_binary_response(response):
    return response.content

def handle_unknown_response(response):
    print(f"Unknown content type: {response.headers.get('content-type')}")
    return response.content
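Substring checks such as 'application/json' in content_type miss media types that use a structured-syntax suffix, for example application/problem+json or application/atom+xml. One possible way to cover them is sketched below. The function name classify_media_type and the five category labels are assumptions made for this illustration, not an established API.

```python
def classify_media_type(content_type):
    """Map a Content-Type header value to a coarse parsing family.

    Hypothetical helper; the category names are illustrative.
    """
    # Drop parameters such as charset and normalize case
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type == 'application/json' or media_type.endswith('+json'):
        return 'json'
    if media_type in ('application/xml', 'text/xml') or media_type.endswith('+xml'):
        return 'xml'
    if media_type == 'text/html':
        return 'html'
    if media_type.startswith('text/'):
        return 'text'
    # Everything else (images, PDFs, octet-stream, missing header) is treated as bytes
    return 'binary'


for value in ('application/problem+json; charset=utf-8', 'application/atom+xml', 'image/png'):
    print(value, '->', classify_media_type(value))
```

The returned label could then select one of the handler functions, keeping the matching rules in a single place.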
JavaScript Example
async function handleApiResponse(url) {
  try {
    const response = await fetch(url);
    const contentType = response.headers.get('content-type')?.toLowerCase() || '';
    if (contentType.includes('application/json')) {
      return await handleJsonResponse(response);
    } else if (contentType.includes('application/xml') || contentType.includes('text/xml')) {
      return await handleXmlResponse(response);
    } else if (contentType.includes('text/html')) {
      return await handleHtmlResponse(response);
    } else if (contentType.includes('text/plain')) {
      return await handleTextResponse(response);
    } else if (contentType.includes('application/octet-stream')) {
      return await handleBinaryResponse(response);
    } else {
      return await handleUnknownResponse(response);
    }
  } catch (error) {
    console.error('Request failed:', error);
    return null;
  }
}

async function handleJsonResponse(response) {
  try {
    return await response.json();
  } catch (error) {
    console.error('Failed to parse JSON:', error);
    return null;
  }
}

async function handleXmlResponse(response) {
  try {
    const text = await response.text();
    // DOMParser is a browser API; it reports XML errors through a
    // <parsererror> element instead of throwing
    const doc = new DOMParser().parseFromString(text, 'text/xml');
    if (doc.querySelector('parsererror')) {
      console.error('Failed to parse XML');
      return null;
    }
    return doc;
  } catch (error) {
    console.error('Failed to read XML response:', error);
    return null;
  }
}

async function handleHtmlResponse(response) {
  try {
    const text = await response.text();
    const parser = new DOMParser();
    return parser.parseFromString(text, 'text/html');
  } catch (error) {
    console.error('Failed to parse HTML:', error);
    return null;
  }
}

async function handleTextResponse(response) {
  return await response.text();
}

async function handleBinaryResponse(response) {
  return await response.arrayBuffer();
}

async function handleUnknownResponse(response) {
  console.warn('Unknown content type:', response.headers.get('content-type'));
  return await response.blob();
}
Handling Specific Content Types
JSON Responses
JSON is the most common API response format. Always include error handling for malformed JSON:
# Python
import json
import re

def safe_json_parse(response):
    try:
        return response.json()
    except ValueError:
        # Fallback: unwrap a JSONP payload such as callback({...}); or ({...})
        match = re.match(r'^[\w.$]*\s*\((.*)\)\s*;?$', response.text.strip(), re.DOTALL)
        if match is None:
            raise  # Not JSONP either, so surface the original error
        return json.loads(match.group(1))
// JavaScript
async function safeJsonParse(response) {
  // Read the body once as text: a fetch body cannot be read a second
  // time after response.json() has consumed it
  const text = (await response.text()).trim();
  try {
    return JSON.parse(text);
  } catch (error) {
    // Fallback: unwrap a JSONP-style payload wrapped in parentheses
    if (text.startsWith('(') && text.endsWith(')')) {
      return JSON.parse(text.slice(1, -1));
    }
    throw error;
  }
}
XML Responses
XML responses need a dedicated parser. The standard library's xml.etree.ElementTree covers simple documents, while lxml adds full XPath support:
# Python with xml.etree.ElementTree
import xml.etree.ElementTree as ET

def parse_xml_response(response):
    try:
        root = ET.fromstring(response.content)
        # Extract data from XML: map each direct child tag to its text
        data = {}
        for child in root:
            data[child.tag] = child.text
        return data
    except ET.ParseError as e:
        print(f"XML parsing error: {e}")
        return None

# Python with lxml for more advanced parsing
from lxml import etree

def parse_xml_with_lxml(response):
    try:
        root = etree.fromstring(response.content)
        # Use XPath to extract specific elements
        titles = root.xpath('//title/text()')
        return {'titles': titles}
    except etree.XMLSyntaxError as e:
        print(f"XML syntax error: {e}")
        return None
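Many XML APIs declare a default namespace, in which case lookups by bare tag name return nothing. The sketch below shows one way to query a namespaced document with the standard library. The sample Atom feed and the function name extract_atom_titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map passed to findall(); the prefix name is arbitrary
ATOM_NS = {'atom': 'http://www.w3.org/2005/Atom'}

def extract_atom_titles(xml_bytes):
    """Return the title text of every <entry> in an Atom feed."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.findall('atom:entry/atom:title', ATOM_NS)]


# Illustrative sample document, not taken from a real API
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

print(extract_atom_titles(sample))
```

Without the namespace map, root.findall('entry/title') would return an empty list for this document, because every element belongs to the Atom namespace.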
HTML Responses
When APIs return HTML content, you'll need to parse and extract relevant data. This is particularly useful when handling AJAX requests using Puppeteer or processing web pages:
# Python with BeautifulSoup
from bs4 import BeautifulSoup

def parse_html_response(response):
    soup = BeautifulSoup(response.content, 'html.parser')
    title_tag = soup.find('title')
    # Extract specific elements
    data = {
        'title': title_tag.text if title_tag else None,
        'links': [a['href'] for a in soup.find_all('a', href=True)],
        'images': [img['src'] for img in soup.find_all('img', src=True)]
    }
    return data
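The href and src values extracted from HTML are often relative to the page they came from. A short sketch of resolving them with the standard library follows. The function name absolutize_links and the example URLs are assumptions for illustration.

```python
from urllib.parse import urljoin

def absolutize_links(base_url, hrefs):
    """Resolve each href against the URL the page was fetched from."""
    return [urljoin(base_url, href) for href in hrefs]


# Illustrative values; base_url would normally be response.url
links = absolutize_links(
    'https://example.com/docs/page.html',
    ['/about', 'guide.html', '../img/logo.png', 'https://other.example/x'],
)
for link in links:
    print(link)
```

Links that are already absolute pass through unchanged, so the helper can be applied to every extracted value.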
Binary Data Handling
For file downloads, images, or other binary content:
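# Python
import requests

def download_binary_file(url, filename):
    # stream=True avoids loading the whole file into memory
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        if not response.headers.get('content-type', '').lower().startswith('image/'):
            return False
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return True

# Check file size before downloading
def safe_binary_download(url, max_size_mb=10):
    # requests.head() does not follow redirects unless asked to
    response = requests.head(url, allow_redirects=True, timeout=10)
    # Content-Length is optional; a missing header counts as 0 and passes the check
    content_length = int(response.headers.get('content-length', 0))
    if content_length > max_size_mb * 1024 * 1024:
        raise ValueError(f"File too large: {content_length} bytes")
    return requests.get(url, timeout=10)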
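The Content-Type header on a binary response can be missing or wrong, so it may be worth checking the first bytes of the body against well-known file signatures. The sketch below covers only a handful of formats. The SIGNATURES table and the function name sniff_binary_type are illustrative assumptions, not a complete detector.

```python
# Leading byte signatures ("magic numbers") for a few common formats
SIGNATURES = [
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'GIF87a', 'image/gif'),
    (b'GIF89a', 'image/gif'),
    (b'%PDF-', 'application/pdf'),
]

def sniff_binary_type(data):
    """Return a media type guessed from the leading bytes, or None."""
    for signature, media_type in SIGNATURES:
        if data.startswith(signature):
            return media_type
    return None


print(sniff_binary_type(b'%PDF-1.7\n...'))
print(sniff_binary_type(b'\x89PNG\r\n\x1a\n' + b'\x00' * 8))
```

Only the first few bytes are needed, so the check can run on the first chunk of a streamed download before the rest is written to disk.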
Advanced Content Type Handling
Content Negotiation
Some APIs support content negotiation, allowing you to request specific formats:
import requests

url = 'https://api.example.com/data'  # placeholder endpoint

# Request JSON explicitly
headers = {'Accept': 'application/json'}
response = requests.get(url, headers=headers)
# Request XML
headers = {'Accept': 'application/xml'}
response = requests.get(url, headers=headers)
# Request multiple formats with preference (q defaults to 1.0 when omitted)
headers = {'Accept': 'application/json, application/xml;q=0.8, text/html;q=0.6'}
response = requests.get(url, headers=headers)
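When the list of preferred formats is built at runtime, the Accept value can be generated rather than hand-written. Here is a minimal sketch, assuming a hypothetical helper named build_accept_header that takes a mapping of media type to q-value.

```python
def build_accept_header(preferences):
    """Build an Accept header value from {media_type: q} pairs (q between 0 and 1)."""
    parts = []
    # Highest preference first; the order is cosmetic, the q-values carry the meaning
    for media_type, q in sorted(preferences.items(), key=lambda item: -item[1]):
        if q >= 1:
            parts.append(media_type)  # q=1 is the default and can be omitted
        else:
            parts.append(f'{media_type};q={q:g}')
    return ', '.join(parts)


accept = build_accept_header({
    'application/json': 1.0,
    'application/xml': 0.8,
    'text/html': 0.6,
})
print(accept)
```

A server is free to ignore the Accept header, so the Content-Type of the response still needs to be checked before parsing.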
Handling Charset Encoding
Content type headers often include charset information:
import chardet

def get_text_with_encoding(response):
    content_type = response.headers.get('content-type', '').lower()
    # Check if charset is specified
    if 'charset=' in content_type:
        # The value may be quoted, e.g. charset="utf-8"
        charset = content_type.split('charset=')[1].split(';')[0].strip().strip('"\'')
        try:
            return response.content.decode(charset, errors='replace')
        except LookupError:
            pass  # Unknown charset label: fall through to auto-detection
    # Auto-detect encoding
    detected = chardet.detect(response.content)
    encoding = detected['encoding'] or 'utf-8'
    return response.content.decode(encoding, errors='replace')
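Before falling back to statistical detection, it can help to look for a byte order mark (BOM) at the start of the body, which identifies the encoding unambiguously when present. The following standard-library sketch assumes two hypothetical helpers, detect_bom_encoding and decode_body.

```python
import codecs

# Longer BOMs first: the UTF-32 LE mark begins with the UTF-16 LE mark
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def detect_bom_encoding(body):
    """Return a codec name if the body starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if body.startswith(bom):
            return encoding
    return None

def decode_body(body, default='utf-8'):
    """Decode bytes using the BOM if present, otherwise the default codec."""
    encoding = detect_bom_encoding(body) or default
    # The utf-8-sig, utf-16 and utf-32 codecs consume the BOM while decoding
    return body.decode(encoding, errors='replace')


print(decode_body(codecs.BOM_UTF16_LE + 'hi'.encode('utf-16-le')))
```

If no BOM is found, the charset parameter or a detector such as chardet remains the fallback.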
Error Handling Best Practices
Implement comprehensive error handling for different content types:
class ContentTypeHandler:
    def __init__(self):
        # Map each supported media type to its parsing method
        self.handlers = {
            'application/json': self._handle_json,
            'application/xml': self._handle_xml,
            'text/xml': self._handle_xml,
            'text/html': self._handle_html,
            'text/plain': self._handle_text,
        }

    def process_response(self, response):
        # Drop parameters such as charset, then normalize whitespace and case
        content_type = response.headers.get('content-type', '').split(';')[0].strip().lower()
        handler = self.handlers.get(content_type, self._handle_default)
        try:
            return handler(response)
        except Exception as e:
            # Report the failure as data instead of raising
            return {
                'error': f'Failed to process {content_type}: {str(e)}',
                'content_type': content_type,
                'status_code': response.status_code
            }

    def _handle_json(self, response):
        return response.json()

    def _handle_xml(self, response):
        import xml.etree.ElementTree as ET
        return ET.fromstring(response.content)

    def _handle_html(self, response):
        from bs4 import BeautifulSoup
        return BeautifulSoup(response.content, 'html.parser')

    def _handle_text(self, response):
        return response.text

    def _handle_default(self, response):
        return {
            'content': response.content,
            'encoding': response.encoding,
            'content_type': response.headers.get('content-type')
        }
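When no handler matches, or a server labels its payload incorrectly, one possible fallback is to inspect the body itself. The sketch below tries JSON first, then XML, and otherwise returns text. The function name sniff_and_parse and its (kind, parsed) return shape are assumptions for illustration, and the heuristics are intentionally simple.

```python
import json
import xml.etree.ElementTree as ET

def sniff_and_parse(body, encoding='utf-8'):
    """Guess the format of a response body; returns (kind, parsed)."""
    # Strip a leading BOM and whitespace before looking at the first character
    text = body.decode(encoding, errors='replace').lstrip('\ufeff \t\r\n')
    if text[:1] in ('{', '['):
        try:
            return 'json', json.loads(text)
        except ValueError:
            pass  # looked like JSON but was not
    if text.startswith('<'):
        try:
            return 'xml', ET.fromstring(body)
        except ET.ParseError:
            pass  # for example HTML, which is rarely well-formed XML
    return 'text', text


print(sniff_and_parse(b'{"ok": true}'))
print(sniff_and_parse(b'plain words'))
```

Sniffing is a last resort: a declared content type that matches a known handler is a stronger signal than a guess based on the first character.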
Testing Different Content Types
When building applications that handle multiple content types, create comprehensive tests:
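import unittest
from unittest.mock import Mock

# Hypothetical helper: dispatches an already-fetched response to the
# handle_*_response functions defined in the Python example above
def handle_api_response_mock(response):
    content_type = response.headers.get('content-type', '').lower()
    if 'application/json' in content_type:
        return handle_json_response(response)
    if 'application/xml' in content_type or 'text/xml' in content_type:
        return handle_xml_response(response)
    return handle_unknown_response(response)

class TestContentTypeHandling(unittest.TestCase):
    def test_json_response(self):
        mock_response = Mock()
        mock_response.headers = {'content-type': 'application/json'}
        mock_response.json.return_value = {'key': 'value'}
        result = handle_api_response_mock(mock_response)
        self.assertEqual(result['key'], 'value')

    def test_xml_response(self):
        mock_response = Mock()
        mock_response.headers = {'content-type': 'application/xml'}
        mock_response.content = b'<root><item>test</item></root>'
        result = handle_api_response_mock(mock_response)
        self.assertIsNotNone(result)

    def test_unknown_content_type(self):
        mock_response = Mock()
        mock_response.headers = {'content-type': 'application/unknown'}
        mock_response.content = b'unknown data'
        result = handle_api_response_mock(mock_response)
        self.assertEqual(result, b'unknown data')

if __name__ == '__main__':
    unittest.main()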
Console Commands and Tools
Use command-line tools to test API responses:
# Check content type with curl
curl -I https://api.example.com/data
# Request specific content type
curl -H "Accept: application/json" https://api.example.com/data
curl -H "Accept: application/xml" https://api.example.com/data
# Save binary response to file
curl -o image.jpg https://api.example.com/image
# Display response headers and content type
curl -v https://api.example.com/data 2>&1 | grep -i content-type
Conclusion
Handling different API response content types requires a flexible approach that can adapt to various formats while maintaining robust error handling. When working with complex web applications, you might also need to consider monitoring network requests in Puppeteer to understand the full picture of API interactions. By implementing proper content type detection, format-specific parsing, and comprehensive error handling, you can build applications that reliably process diverse API responses and provide a better user experience.
Remember to always validate content types, implement fallback mechanisms for parsing errors, and test your handlers with various response formats to ensure reliability across different API endpoints and scenarios.