How Do I Set a User Agent String for Requests?

Setting a custom user agent string is a fundamental requirement for web scraping and API interactions. User agents identify your application to web servers and can significantly impact whether your requests are accepted or blocked. This guide covers how to set user agent strings across different programming languages and HTTP libraries.

What is a User Agent String?

A user agent string is an HTTP header that identifies the client making the request. It typically contains information about the browser, operating system, and application. Web servers use this information to serve appropriate content, implement rate limiting, or block automated requests.

Common user agent formats include:

- Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
- Mobile: Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1
- Bot: Googlebot/2.1 (+http://www.google.com/bot.html)
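
If you don't set one explicitly, most HTTP clients send their own default. Python's requests library, for example, identifies itself as python-requests/<version>, which some servers flag as an automated client. A quick way to check the default (a minimal sketch; httpbin.org simply echoes back the headers it receives):

import requests

# Inspect the default User-Agent that requests sends when you don't override it
print(requests.utils.default_user_agent())  # e.g. 'python-requests/2.31.0'

# Confirm what a server actually receives (httpbin echoes request headers back)
response = requests.get('https://httpbin.org/headers')
print(response.json()['headers']['User-Agent'])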

Python with Requests Library

Basic User Agent Setting

The most straightforward way to set a user agent in Python requests is through the headers parameter:

import requests

# Set user agent in headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json())

Using Sessions for Persistent User Agents

For multiple requests, use a session to maintain the same user agent:

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'MyBot/1.0 (+https://example.com/bot)'
})

# All requests through this session will use the same user agent
response1 = session.get('https://api.example.com/data')
response2 = session.get('https://api.example.com/more-data')

Random User Agent Rotation

For web scraping, rotating user agents can help avoid detection:

import requests
import random

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0'
]

def make_request(url):
    headers = {
        'User-Agent': random.choice(user_agents)
    }
    return requests.get(url, headers=headers)

response = make_request('https://example.com')

JavaScript with Fetch API

Browser vs. Node.js Environment

In browsers, the Fetch API doesn't allow scripts to override the User-Agent header due to security restrictions, but you can set it freely in Node.js:

// This works in Node.js, not in browsers.
// Node.js 18+ ships a global fetch; the node-fetch package is only needed on older versions.
const fetch = require('node-fetch');

const headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
};

fetch('https://httpbin.org/headers', { headers })
    .then(response => response.json())
    .then(data => console.log(data));

Node.js with Axios

Axios provides more flexibility for setting user agents:

const axios = require('axios');

// Method 1: Per request (run inside an async function, since this uses await)
const response = await axios.get('https://api.example.com/data', {
    headers: {
        'User-Agent': 'MyApp/1.0 (Node.js)'
    }
});

// Method 2: Default headers
axios.defaults.headers.common['User-Agent'] = 'MyApp/1.0 (Node.js)';

// Method 3: Create instance with default headers
const apiClient = axios.create({
    headers: {
        'User-Agent': 'MyApp/1.0 (Node.js)'
    }
});

Advanced User Agent Strategies

Mobile User Agents for Responsive Content

Many websites serve different content based on the user agent. To access mobile versions:

import requests

mobile_headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1'
}

response = requests.get('https://mobile-site.com', headers=mobile_headers)

Search Engine Bot User Agents

For SEO testing or accessing content meant for crawlers:

import requests

googlebot_headers = {
    'User-Agent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'
}

response = requests.get('https://website.com', headers=googlebot_headers)

Custom Application User Agents

For API access or when identifying your application:

import requests

custom_headers = {
    'User-Agent': 'MyCompany-DataCollector/2.0 (contact@mycompany.com)'
}

response = requests.get('https://api.partner.com/data', headers=custom_headers)

Other Programming Languages

cURL Command Line

# Set user agent with cURL
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://httpbin.org/headers

# Using the -A flag (shorthand)
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://httpbin.org/headers

PHP with cURL

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://httpbin.org/headers');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>

Go with net/http

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    client := &http.Client{}
    req, _ := http.NewRequest("GET", "https://httpbin.org/headers", nil)

    req.Header.Set("User-Agent", "MyGoApp/1.0")

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body))
}

Best Practices for User Agent Management

1. Use Realistic User Agents

Always use legitimate, realistic user agent strings. Avoid obviously fake or malformed user agents that might trigger security systems.
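
One way to keep a borrowed browser user agent looking plausible is to send it alongside the kind of companion headers a real browser would include. The sketch below is illustrative only; the header values are examples, not a guarantee against blocking.

import requests

# A realistic browser User-Agent paired with typical browser headers (illustrative values)
browser_like_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
}

response = requests.get('https://httpbin.org/headers', headers=browser_like_headers)
print(response.json()['headers'])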

2. Respect robots.txt

When web scraping, always check and respect the website's robots.txt file, regardless of your user agent.
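
Python's standard library can do this check for you. A minimal sketch using urllib.robotparser (the bot name and URLs are placeholders):

import requests
from urllib.robotparser import RobotFileParser

USER_AGENT = 'MyBot/1.0 (+https://example.com/bot)'

# Download and parse the site's robots.txt
parser = RobotFileParser()
parser.set_url('https://example.com/robots.txt')
parser.read()

url = 'https://example.com/some/page'
if parser.can_fetch(USER_AGENT, url):
    response = requests.get(url, headers={'User-Agent': USER_AGENT})
else:
    print(f"robots.txt disallows fetching {url}")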

3. Implement Rate Limiting

Combine user agent rotation with proper rate limiting to avoid overwhelming servers:

import requests
import time
import random

def polite_request(url, headers=None, delay=(1, 3)):
    if headers is None:
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; PoliteBot/1.0)'}

    # Random delay between requests
    time.sleep(random.uniform(*delay))

    return requests.get(url, headers=headers)

4. Monitor for Blocks

Implement monitoring to detect when your requests are being blocked:

import requests

def check_if_blocked(response):
    blocked_indicators = [
        'blocked', 'banned', 'access denied', 
        'suspicious activity', 'rate limited'
    ]

    if response.status_code in [403, 429, 503]:
        return True

    content = response.text.lower()
    return any(indicator in content for indicator in blocked_indicators)
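
As a usage sketch, this check can be combined with the make_request() helper defined in the rotation section above:

# Example usage, reusing make_request() and check_if_blocked() from earlier in this guide
response = make_request('https://example.com')
if check_if_blocked(response):
    print("Request looks blocked; rotate the user agent or slow down")
else:
    print("Request succeeded")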

Integration with Web Scraping Tools

When working with browser automation tools, user agent management becomes even more important. While this article focuses on HTTP requests, you might also need to consider how to handle browser sessions in Puppeteer or how to monitor network requests in Puppeteer for more complex scraping scenarios.

Debugging User Agent Issues

Verify Your User Agent

Use online tools or create a simple endpoint to verify your user agent is being sent correctly:

import requests

# Send a custom user agent and confirm what the server actually received
headers = {'User-Agent': 'MyApp/1.0 (verification test)'}
response = requests.get('https://httpbin.org/headers', headers=headers)
print("Sent headers:", response.json()['headers'])

Handle User Agent-Based Redirects

Some sites redirect based on user agents. Handle this appropriately:

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (compatible; CustomBot/1.0)'
})

# Allow redirects and track them
response = session.get('https://example.com', allow_redirects=True)
print(f"Final URL: {response.url}")
print(f"Redirect history: {[r.url for r in response.history]}")

Conclusion

Setting appropriate user agent strings is crucial for successful web scraping and API interactions. Whether you're using Python's requests library, JavaScript's fetch API, or other HTTP clients, the principle remains the same: identify your application appropriately while respecting server policies and rate limits.

Remember to always test your user agent configuration, monitor for blocks or unusual responses, and maintain ethical scraping practices. A well-configured user agent, combined with proper rate limiting and respect for robots.txt, forms the foundation of responsible web scraping.

For more complex scenarios involving browser automation, consider exploring advanced techniques for handling dynamic content and managing browser-based sessions alongside your HTTP request strategies.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
