How Do I Set a User Agent String for Requests?
Setting a custom user agent string is a fundamental requirement for web scraping and API interactions. User agents identify your application to web servers and can significantly impact whether your requests are accepted or blocked. This guide covers how to set user agent strings across different programming languages and HTTP libraries.
What is a User Agent String?
A user agent string is an HTTP header that identifies the client making the request. It typically contains information about the browser, operating system, and application. Web servers use this information to serve appropriate content, implement rate limiting, or block automated requests.
Common user agent formats include:
- Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
- Mobile: Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1
- Bot: Googlebot/2.1 (+http://www.google.com/bot.html)
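Before overriding anything, it helps to see what your client sends by default. A quick check in Python (a small sketch assuming the requests library and network access to httpbin.org, the echo service used throughout this guide):
import requests
# requests identifies itself as "python-requests/<version>" unless you override it
print(requests.utils.default_user_agent())
# httpbin.org echoes back the headers it received, so you can confirm exactly what was sent
response = requests.get('https://httpbin.org/headers')
print(response.json()['headers']['User-Agent'])
A default like python-requests/2.x immediately marks the request as automated, which is why most of the examples below replace it.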
Python with Requests Library
Basic User Agent Setting
The most straightforward way to set a user agent in Python requests is through the headers parameter:
import requests
# Set user agent in headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json())
Using Sessions for Persistent User Agents
For multiple requests, use a session to maintain the same user agent:
import requests
session = requests.Session()
session.headers.update({
    'User-Agent': 'MyBot/1.0 (+https://example.com/bot)'
})
# All requests through this session will use the same user agent
response1 = session.get('https://api.example.com/data')
response2 = session.get('https://api.example.com/more-data')
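Headers passed to an individual call are merged with the session defaults, and the per-request value wins on a conflict, so you can override the user agent for a single request without touching the session. A small sketch continuing the session above (the URL is a placeholder):
# Per-request headers are merged with session.headers; only this call uses the different value
response3 = session.get(
    'https://api.example.com/special-case',
    headers={'User-Agent': 'MyBot/1.0 (special-case mode)'}
)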
Random User Agent Rotation
For web scraping, rotating user agents can help avoid detection:
import requests
import random
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0'
]
def make_request(url):
    # Pick a different user agent for each request
    headers = {
        'User-Agent': random.choice(user_agents)
    }
    return requests.get(url, headers=headers)
response = make_request('https://example.com')
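If you prefer not to maintain the list by hand, a third-party package such as fake-useragent (installed separately with pip install fake-useragent; shown here as one option, not a requirement) can supply randomized browser strings:
# Requires: pip install fake-useragent
from fake_useragent import UserAgent
import requests

ua = UserAgent()

# ua.random returns a different real-world browser user agent on each access
headers = {'User-Agent': ua.random}
response = requests.get('https://example.com', headers=headers)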
JavaScript with Fetch API
Browser vs. Node.js Environments
In browsers, the Fetch API does not let you modify the User-Agent header directly; the browser controls it for security reasons. In Node.js, however, you can set it yourself:
// This works in Node.js, not in browsers
// (Node 18+ ships a global fetch; on older versions, install the node-fetch package)
const fetch = require('node-fetch');
const headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
};
fetch('https://httpbin.org/headers', { headers })
.then(response => response.json())
.then(data => console.log(data));
Node.js with Axios
Axios provides more flexibility for setting user agents:
const axios = require('axios');
// Method 1: Per request (run inside an async function when using require/CommonJS)
const response = await axios.get('https://api.example.com/data', {
  headers: {
    'User-Agent': 'MyApp/1.0 (Node.js)'
  }
});
// Method 2: Default headers
axios.defaults.headers.common['User-Agent'] = 'MyApp/1.0 (Node.js)';
// Method 3: Create instance with default headers
const apiClient = axios.create({
  headers: {
    'User-Agent': 'MyApp/1.0 (Node.js)'
  }
});
Advanced User Agent Strategies
Mobile User Agents for Responsive Content
Many websites serve different content based on the user agent. To access mobile versions:
import requests
mobile_headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1'
}
response = requests.get('https://mobile-site.com', headers=mobile_headers)
Search Engine Bot User Agents
For SEO testing or accessing content meant for crawlers (note that many sites verify real crawlers by reverse DNS lookup, so a spoofed Googlebot user agent may still be rejected):
import requests
googlebot_headers = {
    'User-Agent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'
}
response = requests.get('https://website.com', headers=googlebot_headers)
Custom Application User Agents
For API access or when identifying your application:
import requests
custom_headers = {
    'User-Agent': 'MyCompany-DataCollector/2.0 (contact@mycompany.com)'
}
response = requests.get('https://api.partner.com/data', headers=custom_headers)
Other Programming Languages
cURL Command Line
# Set user agent with cURL
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://httpbin.org/headers
# Using the -A flag (shorthand)
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://httpbin.org/headers
PHP with cURL
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://httpbin.org/headers');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
echo $response;
?>
Go with net/http
package main
import (
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    client := &http.Client{}

    req, _ := http.NewRequest("GET", "https://httpbin.org/headers", nil)
    req.Header.Set("User-Agent", "MyGoApp/1.0")

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := ioutil.ReadAll(resp.Body)
    fmt.Println(string(body))
}
Best Practices for User Agent Management
1. Use Realistic User Agents
Always use legitimate, realistic user agent strings. Avoid obviously fake or malformed user agents that might trigger security systems.
2. Respect robots.txt
When web scraping, always check and respect the website's robots.txt file, regardless of your user agent.
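Python's standard library can perform this check for you via urllib.robotparser; a minimal sketch (the bot name and URLs are placeholders) that consults robots.txt before fetching a page:
from urllib.robotparser import RobotFileParser
import requests

USER_AGENT = 'MyBot/1.0 (+https://example.com/bot)'

# Download and parse the site's robots.txt once
robots = RobotFileParser()
robots.set_url('https://example.com/robots.txt')
robots.read()

url = 'https://example.com/some-page'
if robots.can_fetch(USER_AGENT, url):
    response = requests.get(url, headers={'User-Agent': USER_AGENT})
else:
    print(f"robots.txt disallows {url} for {USER_AGENT}")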
3. Implement Rate Limiting
Combine user agent rotation with proper rate limiting to avoid overwhelming servers:
import requests
import time
import random
def polite_request(url, headers=None, delay=(1, 3)):
    if headers is None:
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; PoliteBot/1.0)'}
    # Random delay between requests to avoid hammering the server
    time.sleep(random.uniform(*delay))
    return requests.get(url, headers=headers)
4. Monitor for Blocks
Implement monitoring to detect when your requests are being blocked:
import requests
def check_if_blocked(response):
    blocked_indicators = [
        'blocked', 'banned', 'access denied',
        'suspicious activity', 'rate limited'
    ]
    if response.status_code in [403, 429, 503]:
        return True
    content = response.text.lower()
    return any(indicator in content for indicator in blocked_indicators)
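What you do after detecting a block depends on the site, but a common pattern is to back off and retry with a different user agent. A rough sketch that reuses check_if_blocked and the user_agents list from the rotation example above:
import random
import time
import requests

def resilient_get(url, max_attempts=3):
    response = None
    for attempt in range(max_attempts):
        headers = {'User-Agent': random.choice(user_agents)}
        response = requests.get(url, headers=headers)
        if not check_if_blocked(response):
            return response
        # Exponential backoff between attempts: 1s, 2s, 4s, ...
        time.sleep(2 ** attempt)
    return response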
Integration with Web Scraping Tools
When working with browser automation tools, user agent management becomes even more important. While this article focuses on HTTP requests, you might also need to consider how to handle browser sessions in Puppeteer or how to monitor network requests in Puppeteer for more complex scraping scenarios.
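The same principle applies regardless of the automation tool. As one illustration (a sketch in Python using Selenium, not the Puppeteer approach from the linked articles), Chrome accepts the user agent as a command-line switch:
# Requires: pip install selenium (Selenium 4+) and a local Chrome/ChromeDriver installation
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--user-agent=Mozilla/5.0 (compatible; CustomBot/1.0)')

driver = webdriver.Chrome(options=options)
driver.get('https://httpbin.org/headers')
print(driver.page_source)  # httpbin echoes the user agent the browser presented
driver.quit()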
Debugging User Agent Issues
Verify Your User Agent
Use online tools or create a simple endpoint to verify your user agent is being sent correctly:
import requests
# Send a custom user agent and confirm that httpbin echoes it back
headers = {'User-Agent': 'MyApp/1.0 (verification test)'}
response = requests.get('https://httpbin.org/headers', headers=headers)
print("Sent headers:", response.json()['headers'])
Handle User Agent-Based Redirects
Some sites redirect based on user agents. Handle this appropriately:
import requests
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (compatible; CustomBot/1.0)'
})
# Allow redirects and track them
response = session.get('https://example.com', allow_redirects=True)
print(f"Final URL: {response.url}")
print(f"Redirect history: {[r.url for r in response.history]}")
Conclusion
Setting appropriate user agent strings is crucial for successful web scraping and API interactions. Whether you're using Python's requests library, JavaScript's fetch API, or other HTTP clients, the principle remains the same: identify your application appropriately while respecting server policies and rate limits.
Remember to always test your user agent configuration, monitor for blocks or unusual responses, and maintain ethical scraping practices. A well-configured user agent, combined with proper rate limiting and respect for robots.txt, forms the foundation of responsible web scraping.
For more complex scenarios involving browser automation, consider exploring advanced techniques for handling dynamic content and managing browser-based sessions alongside your HTTP request strategies.