How do I use proxies with the Requests library?

Using proxies with the Python Requests library is essential for web scraping projects that require IP rotation, geographic location changes, or bypassing rate limits. Proxies act as intermediaries between your application and target websites, masking your real IP address and providing additional anonymity.

Understanding Proxy Types

Before diving into implementation, it's important to understand the different types of proxies you can use with Requests (a short sketch after this list shows how each type is expressed in the proxies dictionary):

  • HTTP Proxies: Handle HTTP and HTTPS traffic
  • SOCKS Proxies: More versatile, can handle any type of traffic
  • Transparent Proxies: Don't hide your IP address
  • Anonymous Proxies: Hide your IP but identify themselves as proxies
  • Elite Proxies: Provide complete anonymity
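
The proxy type shows up in the scheme of the proxy URL you hand to Requests: HTTP proxies use http://, SOCKS proxies use socks4:// or socks5://, while the anonymity level (transparent, anonymous, elite) is a property of the proxy server itself and does not change how you configure it. A minimal sketch, where the host names and ports are placeholders:

import requests

# HTTP/HTTPS forward proxy: the proxy URL scheme is usually http://
http_proxies = {
    'http': 'http://proxy-server:8080',
    'https': 'http://proxy-server:8080'
}

# SOCKS proxy: socks4:// or socks5:// (needs the requests[socks] extra, covered below)
socks_proxies = {
    'http': 'socks5://proxy-server:1080',
    'https': 'socks5://proxy-server:1080'
}

response = requests.get('https://httpbin.org/ip', proxies=http_proxies, timeout=10)
print(response.json())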

Basic Proxy Configuration

Single Proxy Setup

The simplest way to use a proxy with Requests is to pass a dictionary that maps URL schemes to proxy URLs via the proxies parameter:

import requests

# HTTP proxy configuration
proxies = {
    'http': 'http://proxy-server:port',
    'https': 'http://proxy-server:port'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())

HTTPS Proxy with Different Endpoints

You can specify different proxies for HTTP and HTTPS traffic:

import requests

proxies = {
    'http': 'http://http-proxy:8080',
    'https': 'https://https-proxy:8443'
}

# This will use the HTTP proxy
response = requests.get('http://httpbin.org/ip', proxies=proxies)

# This will use the HTTPS proxy
response = requests.get('https://httpbin.org/ip', proxies=proxies)

SOCKS Proxy Configuration

SOCKS proxies require an additional dependency (PySocks), which you can install via the socks extra:

pip install requests[socks]

import requests

# SOCKS4 proxy
proxies = {
    'http': 'socks4://proxy-server:1080',
    'https': 'socks4://proxy-server:1080'
}

# SOCKS5 proxy
proxies = {
    'http': 'socks5://proxy-server:1080',
    'https': 'socks5://proxy-server:1080'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
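
One variant worth knowing: with the socks5h:// scheme (note the trailing h), DNS resolution for the target host is performed by the proxy instead of locally, which can matter if local DNS lookups would leak your location or fail. A small sketch, assuming your provider supports it:

import requests

# socks5h:// delegates hostname resolution to the SOCKS proxy
proxies = {
    'http': 'socks5h://proxy-server:1080',
    'https': 'socks5h://proxy-server:1080'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
print(response.json())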

Proxy Authentication

Many proxy services require authentication. Here's how to handle username and password authentication:

import requests

# Method 1: Include credentials in the URL
proxies = {
    'http': 'http://username:password@proxy-server:8080',
    'https': 'http://username:password@proxy-server:8080'
}

# Method 2: Using HTTPProxyAuth, which attaches a Proxy-Authorization header to the
# request; this works for plain-HTTP URLs, but for HTTPS URLs (tunneled via CONNECT)
# putting the credentials in the proxy URL, as in Method 1, is the more reliable option
from requests.auth import HTTPProxyAuth

proxies = {
    'http': 'http://proxy-server:8080',
    'https': 'http://proxy-server:8080'
}

auth = HTTPProxyAuth('username', 'password')
response = requests.get('https://httpbin.org/ip', proxies=proxies, auth=auth)
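
If the username or password contains reserved characters such as @, : or /, percent-encode them before embedding them in the proxy URL, otherwise Requests will mis-parse it. A short sketch using the standard library (the credentials shown are hypothetical):

import requests
from urllib.parse import quote

username = quote('user@example.com', safe='')  # hypothetical credentials
password = quote('p@ss:word', safe='')

proxies = {
    'http': f'http://{username}:{password}@proxy-server:8080',
    'https': f'http://{username}:{password}@proxy-server:8080'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)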

Session-Based Proxy Configuration

For multiple requests, it's more efficient to use a session with proxy configuration:

import requests

session = requests.Session()
session.proxies = {
    'http': 'http://username:password@proxy-server:8080',
    'https': 'http://username:password@proxy-server:8080'
}

# All requests through this session will use the proxy
response1 = session.get('https://httpbin.org/ip')
response2 = session.get('https://httpbin.org/user-agent')
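
Proxies passed to an individual call are merged with the session-level setting, and the per-request value wins for the keys it defines, so a single request can be routed through a different proxy without touching the session. A quick sketch (the proxy addresses are placeholders):

import requests

session = requests.Session()
session.proxies = {
    'http': 'http://proxy-server:8080',
    'https': 'http://proxy-server:8080'
}

# This one request goes through a different proxy; the session default is unchanged
response = session.get(
    'https://httpbin.org/ip',
    proxies={'https': 'http://another-proxy:8080'},
    timeout=10
)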

Proxy Rotation Implementation

For large-scale web scraping, you'll want to rotate between multiple proxies:

import requests
import random
import time

class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.current_proxy = None

    def get_random_proxy(self):
        return random.choice(self.proxy_list)

    def make_request(self, url, max_retries=3):
        for attempt in range(max_retries):
            try:
                proxy = self.get_random_proxy()
                proxies = {
                    'http': proxy,
                    'https': proxy
                }

                response = requests.get(
                    url, 
                    proxies=proxies, 
                    timeout=10,
                    headers={'User-Agent': 'Mozilla/5.0 (compatible; Bot/1.0)'}
                )

                if response.status_code == 200:
                    return response

                print(f"Attempt {attempt + 1} with proxy {proxy} returned status {response.status_code}")

            except requests.exceptions.RequestException as e:
                print(f"Attempt {attempt + 1} failed with proxy {proxy}: {e}")

            if attempt < max_retries - 1:
                time.sleep(2)  # Wait before retrying with a different proxy

        raise Exception("All proxy attempts failed")

# Usage
proxy_list = [
    'http://user1:pass1@proxy1:8080',
    'http://user2:pass2@proxy2:8080',
    'http://user3:pass3@proxy3:8080'
]

rotator = ProxyRotator(proxy_list)
response = rotator.make_request('https://httpbin.org/ip')
print(response.json())

Environment Variables for Proxy Configuration

You can also configure proxies using environment variables:

export HTTP_PROXY=http://proxy-server:8080
export HTTPS_PROXY=http://proxy-server:8080
export NO_PROXY=localhost,127.0.0.1

import requests

# Requests automatically uses environment variables
response = requests.get('https://httpbin.org/ip')

# Note: passing proxies={} is treated the same as passing no proxies at all, so it
# does not bypass these environment variables; to ignore them, disable trust_env
# on a Session as sketched below
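
To make a whole session ignore HTTP_PROXY, HTTPS_PROXY, NO_PROXY (and .netrc) entirely, switch off environment trust on the session. A short sketch:

import requests

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables and .netrc

response = session.get('https://httpbin.org/ip', timeout=10)
print(response.json())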

Advanced Proxy Configuration

Custom Proxy Adapter

For more control over proxy behavior, you can create a custom adapter:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class ProxyAdapter(HTTPAdapter):
    def __init__(self, proxy_url, *args, **kwargs):
        self.proxy_url = proxy_url
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        kwargs['proxies'] = {
            'http': self.proxy_url,
            'https': self.proxy_url
        }
        return super().send(request, **kwargs)

# Usage
session = requests.Session()
# max_retries accepts a urllib3 Retry object, giving finer control over retry behavior
adapter = ProxyAdapter('http://proxy-server:8080', max_retries=Retry(total=3, backoff_factor=0.5))
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('https://httpbin.org/ip')

Proxy Health Checking

Implement proxy health checking to ensure your proxies are working:

import requests
import concurrent.futures

def check_proxy(proxy_url, timeout=10):
    """Check if a proxy is working"""
    try:
        proxies = {
            'http': proxy_url,
            'https': proxy_url
        }

        response = requests.get(
            'https://httpbin.org/ip',
            proxies=proxies,
            timeout=timeout
        )

        if response.status_code == 200:
            return {'proxy': proxy_url, 'status': 'working', 'ip': response.json()['origin']}
        else:
            return {'proxy': proxy_url, 'status': 'failed', 'error': f'Status code: {response.status_code}'}

    except Exception as e:
        return {'proxy': proxy_url, 'status': 'failed', 'error': str(e)}

def check_proxies_concurrent(proxy_list, max_workers=10):
    """Check multiple proxies concurrently"""
    working_proxies = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_proxy = {executor.submit(check_proxy, proxy): proxy for proxy in proxy_list}

        for future in concurrent.futures.as_completed(future_to_proxy):
            result = future.result()
            if result['status'] == 'working':
                working_proxies.append(result)
            else:
                print(f"Proxy {result['proxy']} failed: {result['error']}")

    return working_proxies

# Usage
proxy_list = [
    'http://proxy1:8080',
    'http://proxy2:8080',
    'http://proxy3:8080'
]

working_proxies = check_proxies_concurrent(proxy_list)
print(f"Found {len(working_proxies)} working proxies")

Error Handling and Best Practices

Common Proxy Errors

import time

import requests
from requests.exceptions import ProxyError, ConnectTimeout, ConnectionError

def safe_proxy_request(url, proxies, max_retries=3):
    """Make a request with proper error handling"""
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=(10, 30),  # (connect timeout, read timeout)
                headers={
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                }
            )
            return response

        except ProxyError as e:
            print(f"Proxy error on attempt {attempt + 1}: {e}")
        except ConnectTimeout as e:
            print(f"Connection timeout on attempt {attempt + 1}: {e}")
        except ConnectionError as e:
            print(f"Connection error on attempt {attempt + 1}: {e}")
        except Exception as e:
            print(f"Unexpected error on attempt {attempt + 1}: {e}")

        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff

    raise Exception("All attempts failed")

Best Practices

  1. Always use timeouts to prevent hanging requests
  2. Implement retry logic for failed proxy connections
  3. Rotate proxies to avoid rate limiting
  4. Monitor proxy health regularly
  5. Use appropriate headers to appear more legitimate
  6. Respect robots.txt and website terms of service (a quick robots.txt check is sketched after this list)
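
For the last point, the standard library's robots.txt parser can tell you whether a path is allowed for your user agent before you fetch it. A minimal sketch (the site, user agent, and proxy address are placeholders), fetching robots.txt through the same proxy:

import requests
from urllib.robotparser import RobotFileParser

proxies = {
    'http': 'http://proxy-server:8080',
    'https': 'http://proxy-server:8080'
}

def is_allowed(url, robots_url, user_agent='MyScraper/1.0'):
    """Fetch robots.txt and check whether the given URL may be crawled."""
    robots_response = requests.get(robots_url, proxies=proxies, timeout=10)
    parser = RobotFileParser()
    parser.parse(robots_response.text.splitlines())
    return parser.can_fetch(user_agent, url)

url = 'https://example.com/some-page'
if is_allowed(url, 'https://example.com/robots.txt'):
    response = requests.get(url, proxies=proxies, timeout=10)
else:
    print(f"robots.txt disallows fetching {url}")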

Integration with Web Scraping Workflows

When building larger web scraping applications, you might want to integrate proxy functionality with other tools. For complex scenarios involving JavaScript-heavy websites, consider combining proxy usage with browser automation tools for comprehensive web scraping solutions.

Working with Sessions for Better Performance

Sessions are particularly important when using proxies, as they maintain connection pools and preserve cookies across multiple requests. This is similar to how you might handle browser sessions in web automation, but at the HTTP level:

import requests

# Create a session with persistent proxy configuration
session = requests.Session()
session.proxies.update({
    'http': 'http://proxy-server:8080',
    'https': 'http://proxy-server:8080'
})

# Set persistent headers
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (compatible; WebScraper/1.0)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
})

# Make multiple requests using the same session
# (url_list and process_response are placeholders for your own URL list and handler)
for url in url_list:
    response = session.get(url)
    process_response(response)

Testing Proxy Configuration

Always test your proxy configuration before deploying:

import requests

def test_proxy_configuration():
    """Test proxy setup"""
    proxies = {
        'http': 'http://your-proxy:8080',
        'https': 'http://your-proxy:8080'
    }

    try:
        # Test IP address
        response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
        print(f"Your IP through proxy: {response.json()['origin']}")

        # Test headers
        response = requests.get('https://httpbin.org/headers', proxies=proxies, timeout=10)
        print(f"Headers sent: {response.json()['headers']}")

        # Test different protocols
        response = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)
        print(f"HTTP request successful: {response.status_code}")

        return True

    except Exception as e:
        print(f"Proxy test failed: {e}")
        return False

# Run the test
if test_proxy_configuration():
    print("Proxy configuration is working correctly!")
else:
    print("Proxy configuration needs adjustment.")

Handling Anti-Bot Measures

When using proxies for web scraping, you may encounter various anti-bot measures. While proxies help mask your IP address, you should also consider other detection vectors:

import requests
import random
import time

def create_realistic_headers():
    """Generate realistic browser headers"""
    user_agents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    ]

    return {
        'User-Agent': random.choice(user_agents),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'DNT': '1',
        'Connection': 'keep-alive'
    }

def scrape_with_stealth(url, proxies):
    """Scrape with anti-detection measures"""
    headers = create_realistic_headers()

    # Add random delay
    time.sleep(random.uniform(1, 3))

    response = requests.get(
        url,
        proxies=proxies,
        headers=headers,
        timeout=15
    )

    return response

Conclusion

Using proxies with the Requests library is crucial for professional web scraping operations. By implementing proper proxy rotation, authentication, and error handling, you can build robust and scalable scraping solutions. Remember to always respect website terms of service and implement appropriate delays between requests to avoid overwhelming target servers.

For production environments, consider using proxy services that provide high-quality, rotating IP addresses with good geographic distribution and reliable uptime. This approach, combined with proper implementation techniques, will ensure your web scraping projects run smoothly and efficiently.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
