How do I make asynchronous requests with Requests?

The Python Requests library is synchronous: each call blocks execution until the request completes. You can still achieve asynchronous behavior with several approaches that dramatically improve performance when making many HTTP requests. This guide covers the main methods for implementing asynchronous requests for faster web scraping and API interactions.

Understanding Synchronous vs Asynchronous Requests

Synchronous requests process one at a time:

  • Each request waits for the previous one to complete
  • Total time = sum of all individual request times
  • Simple but inefficient for multiple requests

Asynchronous requests process concurrently:

  • Multiple requests can run simultaneously
  • Total time ≈ time of the slowest request
  • More complex but significantly faster
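
To make the difference concrete, here is a minimal illustration that uses sleeps to simulate network latency rather than real requests:

import time
import asyncio

def sync_wait(delays):
    """Simulated synchronous requests: total time is the sum of all delays"""
    for d in delays:
        time.sleep(d)

async def async_wait(delays):
    """Simulated asynchronous requests: total time is roughly the longest delay"""
    await asyncio.gather(*(asyncio.sleep(d) for d in delays))

delays = [1, 2, 3]

start = time.time()
sync_wait(delays)
print(f"Synchronous: {time.time() - start:.1f}s")   # ~6s

start = time.time()
asyncio.run(async_wait(delays))
print(f"Asynchronous: {time.time() - start:.1f}s")  # ~3s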

Method 1: Using ThreadPoolExecutor with Requests

The concurrent.futures.ThreadPoolExecutor allows you to run multiple synchronous requests in separate threads:

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_url(url):
    """Fetch a single URL and return response data"""
    try:
        response = requests.get(url, timeout=10)
        return {
            'url': url,
            'status_code': response.status_code,
            'content_length': len(response.content),
            'response_time': response.elapsed.total_seconds()
        }
    except requests.RequestException as e:
        return {
            'url': url,
            'error': str(e)
        }

def fetch_urls_async(urls, max_workers=5):
    """Fetch multiple URLs asynchronously using ThreadPoolExecutor"""
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all requests
        future_to_url = {executor.submit(fetch_url, url): url for url in urls}

        # Collect results as they complete
        for future in as_completed(future_to_url):
            result = future.result()
            results.append(result)
            print(f"Completed: {result.get('url', 'Unknown')}")

    return results

# Example usage
urls = [
    'https://httpbin.org/delay/1',
    'https://httpbin.org/delay/2',
    'https://httpbin.org/delay/3',
    'https://jsonplaceholder.typicode.com/posts/1',
    'https://jsonplaceholder.typicode.com/posts/2'
]

start_time = time.time()
results = fetch_urls_async(urls, max_workers=3)
end_time = time.time()

print(f"Total time: {end_time - start_time:.2f} seconds")
print(f"Successful requests: {len([r for r in results if 'error' not in r])}")

Method 2: Using asyncio with aiohttp

For true asynchronous requests, use aiohttp instead of requests:

import asyncio
import aiohttp
import time

async def fetch_url_async(session, url):
    """Fetch a single URL asynchronously"""
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
            content = await response.text()
            return {
                'url': url,
                'status_code': response.status,
                'content_length': len(content),
                'headers': dict(response.headers)
            }
    except Exception as e:
        return {
            'url': url,
            'error': str(e)
        }

async def fetch_urls_aiohttp(urls, max_concurrent=5):
    """Fetch multiple URLs using aiohttp with concurrency control"""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_with_semaphore(session, url):
        async with semaphore:
            return await fetch_url_async(session, url)

    async with aiohttp.ClientSession() as session:
        tasks = [fetch_with_semaphore(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

# Example usage
async def main():
    urls = [
        'https://httpbin.org/delay/1',
        'https://httpbin.org/delay/2',
        'https://httpbin.org/json',
        'https://jsonplaceholder.typicode.com/posts/1',
        'https://jsonplaceholder.typicode.com/users/1'
    ]

    start_time = time.time()
    results = await fetch_urls_aiohttp(urls, max_concurrent=3)
    end_time = time.time()

    print(f"Total time: {end_time - start_time:.2f} seconds")
    for result in results:
        if 'error' not in result:
            print(f"✓ {result['url']} - Status: {result['status_code']}")
        else:
            print(f"✗ {result['url']} - Error: {result['error']}")

# Run the async function
asyncio.run(main())
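
Because fetch_url_async catches its own exceptions, asyncio.gather always receives a normal result for every URL. If you would rather let errors propagate, gather's return_exceptions=True flag keeps a single failure from cancelling the rest of the batch. A minimal sketch:

import asyncio
import aiohttp

async def fetch_urls_tolerant(urls):
    """Collect successes and exceptions side by side instead of failing the whole batch"""
    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                return response.status

        outcomes = await asyncio.gather(*(fetch(url) for url in urls), return_exceptions=True)

    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, Exception):
            print(f"✗ {url} - {outcome!r}")
        else:
            print(f"✓ {url} - Status: {outcome}")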

Method 3: Using requests-futures

The requests-futures library provides a simple wrapper around requests:

pip install requests-futures

from requests_futures.sessions import FuturesSession
import time

def fetch_urls_with_futures(urls, max_workers=5):
    """Fetch URLs using requests-futures"""
    session = FuturesSession(max_workers=max_workers)

    # Submit all requests
    futures = []
    for url in urls:
        future = session.get(url)
        futures.append((url, future))

    # Collect results
    results = []
    for url, future in futures:
        try:
            response = future.result(timeout=10)
            results.append({
                'url': url,
                'status_code': response.status_code,
                'content_length': len(response.content)
            })
        except Exception as e:
            results.append({
                'url': url,
                'error': str(e)
            })

    return results

# Example usage
urls = [
    'https://httpbin.org/json',
    'https://jsonplaceholder.typicode.com/posts/1',
    'https://jsonplaceholder.typicode.com/users/1'
]

start_time = time.time()
results = fetch_urls_with_futures(urls)
end_time = time.time()

print(f"Total time: {end_time - start_time:.2f} seconds")

Advanced Patterns for Web Scraping

Rate-Limited Async Requests

When scraping websites, implement rate limiting to avoid overwhelming servers:

import asyncio
import aiohttp
from asyncio import Semaphore

class RateLimitedScraper:
    def __init__(self, rate_limit=2, max_concurrent=5):
        self.rate_limit = rate_limit  # requests per second
        self.max_concurrent = max_concurrent
        self.semaphore = Semaphore(max_concurrent)
        self.rate_lock = asyncio.Lock()  # serializes rate-limit bookkeeping
        self.last_request_time = 0

    async def _rate_limit(self):
        """Enforce a minimum interval between requests"""
        async with self.rate_lock:
            current_time = asyncio.get_running_loop().time()
            time_since_last = current_time - self.last_request_time
            min_interval = 1.0 / self.rate_limit

            if time_since_last < min_interval:
                await asyncio.sleep(min_interval - time_since_last)

            self.last_request_time = asyncio.get_running_loop().time()

    async def fetch_url(self, session, url):
        """Fetch URL with rate limiting and concurrency control"""
        async with self.semaphore:
            await self._rate_limit()

            try:
                async with session.get(url) as response:
                    content = await response.text()
                    return {
                        'url': url,
                        'status_code': response.status,
                        'content': content[:200] + '...' if len(content) > 200 else content
                    }
            except Exception as e:
                return {'url': url, 'error': str(e)}

    async def scrape_urls(self, urls):
        """Scrape multiple URLs with rate limiting"""
        async with aiohttp.ClientSession() as session:
            tasks = [self.fetch_url(session, url) for url in urls]
            return await asyncio.gather(*tasks)

# Usage example
async def scrape_example():
    scraper = RateLimitedScraper(rate_limit=1, max_concurrent=3)

    urls = [
        'https://httpbin.org/delay/1',
        'https://httpbin.org/json',
        'https://httpbin.org/user-agent'
    ]

    results = await scraper.scrape_urls(urls)

    for result in results:
        print(f"URL: {result['url']}")
        if 'error' not in result:
            print(f"Status: {result['status_code']}")
        else:
            print(f"Error: {result['error']}")
        print("-" * 50)

asyncio.run(scrape_example())

Session Management for Authentication

Maintain sessions across multiple async requests:

import asyncio
import aiohttp

class AsyncSessionScraper:
    def __init__(self):
        self.session = None

    async def __aenter__(self):
        self.session = aiohttp.ClientSession()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.session.close()

    async def login(self, login_url, username, password):
        """Perform login and maintain session"""
        login_data = {
            'username': username,
            'password': password
        }

        async with self.session.post(login_url, data=login_data) as response:
            return response.status == 200

    async def fetch_protected_urls(self, urls):
        """Fetch URLs that require authentication using the logged-in session"""
        async def fetch(url):
            async with self.session.get(url) as response:
                content = await response.text()
                return {
                    'url': str(response.url),
                    'status_code': response.status,
                    'content_length': len(content)
                }

        return await asyncio.gather(*(fetch(url) for url in urls))

# Usage
async def scrape_with_auth():
    async with AsyncSessionScraper() as scraper:
        # Login first
        success = await scraper.login(
            'https://httpbin.org/post',
            'username',
            'password'
        )

        if success:
            protected_urls = [
                'https://httpbin.org/cookies',
                'https://httpbin.org/headers'
            ]
            results = await scraper.fetch_protected_urls(protected_urls)

            for result in results:
                print(f"Fetched {result['url']} - Status: {result['status_code']}")

asyncio.run(scrape_with_auth())

Performance Comparison

Here's a benchmark comparing different approaches:

import time
import requests
import asyncio
import aiohttp
from concurrent.futures import ThreadPoolExecutor

def sync_requests(urls):
    """Synchronous requests baseline"""
    results = []
    for url in urls:
        try:
            response = requests.get(url, timeout=10)
            results.append(response.status_code)
        except requests.RequestException:
            results.append(None)
    return results

def thread_pool_requests(urls, max_workers=5):
    """ThreadPoolExecutor approach"""
    def fetch(url):
        try:
            response = requests.get(url, timeout=10)
            return response.status_code
        except requests.RequestException:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))

async def aiohttp_requests(urls):
    """Pure async with aiohttp"""
    async def fetch(session, url):
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                return response.status
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return None

    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Benchmark
test_urls = ['https://httpbin.org/delay/1'] * 10

print("Benchmarking different approaches...")

# Synchronous
start = time.time()
sync_results = sync_requests(test_urls)
print(f"Synchronous: {time.time() - start:.2f}s")

# ThreadPool
start = time.time()
thread_results = thread_pool_requests(test_urls)
print(f"ThreadPool: {time.time() - start:.2f}s")

# Async
start = time.time()
async_results = asyncio.run(aiohttp_requests(test_urls))
print(f"Async (aiohttp): {time.time() - start:.2f}s")

Best Practices and Considerations

1. Choose the Right Approach

  • ThreadPoolExecutor: Good for existing requests-based code, I/O-bound tasks
  • aiohttp: Best performance for new projects, true async/await support
  • requests-futures: Simple migration from synchronous requests

2. Error Handling

Always implement proper error handling for network failures:

async def robust_fetch(session, url, max_retries=3):
    """Fetch with retry logic and exponential backoff"""
    for attempt in range(max_retries):
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                if response.status < 500:  # Don't retry client errors
                    # Read the body before the connection is released by the context manager
                    return {'url': url, 'status_code': response.status, 'content': await response.text()}
                raise aiohttp.ClientResponseError(
                    request_info=response.request_info,
                    history=response.history,
                    status=response.status
                )
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

3. Resource Management

  • Close sessions properly to avoid resource leaks
  • Use context managers (async with) when possible
  • Set appropriate timeouts for all requests
  • Limit concurrent connections to avoid overwhelming servers
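
The sketch below ties these points together for aiohttp: a session-wide timeout, a connector that caps total and per-host connections, and a context manager that guarantees cleanup. The specific limits are illustrative, not recommendations:

import asyncio
import aiohttp

async def fetch_all(urls):
    """Session with explicit timeouts and connection limits, cleaned up by the context manager"""
    timeout = aiohttp.ClientTimeout(total=15, connect=5)
    connector = aiohttp.TCPConnector(limit=10, limit_per_host=4)  # cap open connections

    async with aiohttp.ClientSession(timeout=timeout, connector=connector) as session:

        async def fetch(url):
            async with session.get(url) as response:
                return url, response.status

        return await asyncio.gather(*(fetch(url) for url in urls))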

4. Memory Considerations

For large-scale scraping, consider streaming responses:

async def stream_large_file(session, url):
    """Stream a large response in chunks instead of loading it all into memory"""
    total_bytes = 0
    async with session.get(url) as response:
        async for chunk in response.content.iter_chunked(8192):
            # Process each chunk here (or write it to a file) instead of accumulating it
            total_bytes += len(chunk)
    return total_bytes

Integration with Web Scraping Workflows

When building comprehensive web scraping solutions, asynchronous requests work well with other tools. For JavaScript-heavy sites that require browser automation, you might need to combine async HTTP requests for initial data gathering with browser automation tools for dynamic content handling.

For complex scraping scenarios involving multiple pages and dynamic content, consider implementing parallel processing strategies alongside your async request patterns.
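
One way to combine the two, assuming Python 3.9+ for asyncio.to_thread, is to keep plain HTTP work on the event loop and push blocking browser automation into threads. In the sketch below, render_with_browser is a hypothetical placeholder for whatever Selenium- or Playwright-style call you use:

import asyncio
import aiohttp

def render_with_browser(url):
    """Hypothetical blocking call into a browser automation tool"""
    ...

async def scrape_mixed(api_urls, js_heavy_urls):
    """Fetch simple pages asynchronously while rendering JS-heavy pages in threads"""
    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()

        http_batch = asyncio.gather(*(fetch(u) for u in api_urls))
        browser_batch = asyncio.gather(*(asyncio.to_thread(render_with_browser, u) for u in js_heavy_urls))
        return await asyncio.gather(http_batch, browser_batch)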

Conclusion

Asynchronous requests dramatically improve the performance of web scraping and API interaction tasks. Choose ThreadPoolExecutor for simple parallelization of existing requests code, or aiohttp for new projects requiring maximum performance. Always implement proper error handling, rate limiting, and resource management to build robust, scalable scraping solutions.

The key to successful async scraping is balancing speed with respect for target servers: implement appropriate delays and concurrency limits to avoid being blocked while still maximizing throughput.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
