How do I debug issues with urllib3?

Debugging urllib3 problems is much easier with a systematic approach. This guide covers the essential techniques, tools, and best practices for identifying and resolving the most common issues efficiently.

Common urllib3 Issues

Before diving into debugging techniques, understand the most common urllib3 problems:

  • Connection errors: Network connectivity, DNS resolution, or firewall issues
  • SSL/TLS issues: Certificate verification failures or outdated certificates
  • Timeout problems: Slow responses or network latency
  • HTTP errors: 4xx/5xx status codes from servers
  • Encoding issues: Character encoding problems in responses
  • Pool management: Connection pool exhaustion or configuration issues

1. Enable Comprehensive Logging

Logging is your first line of defense for debugging urllib3 issues. Configure detailed logging to capture all HTTP activity:

import logging
import urllib3

# Configure comprehensive logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Enable urllib3 debug logging
urllib3_logger = logging.getLogger('urllib3')
urllib3_logger.setLevel(logging.DEBUG)

# Disable SSL warnings if needed (for testing only)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Create pool manager and make request
http = urllib3.PoolManager()
response = http.request('GET', 'https://httpbin.org/get')

For more granular control, enable specific logger components:

# Enable specific urllib3 loggers
logging.getLogger('urllib3.connectionpool').setLevel(logging.DEBUG)
logging.getLogger('urllib3.util.retry').setLevel(logging.DEBUG)
logging.getLogger('urllib3.poolmanager').setLevel(logging.DEBUG)
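
For long-running jobs or intermittent failures it can help to keep the debug output around for later analysis. A minimal sketch that attaches a file handler to the urllib3 logger (the urllib3-debug.log filename is just an example):

import logging

# Send urllib3's debug output to a file so it can be reviewed after a failure
handler = logging.FileHandler('urllib3-debug.log')
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))

urllib3_logger = logging.getLogger('urllib3')
urllib3_logger.setLevel(logging.DEBUG)
urllib3_logger.addHandler(handler)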

2. Comprehensive Exception Handling

Handle all urllib3 exceptions systematically to understand what's going wrong:

import urllib3
from urllib3.exceptions import (
    HTTPError, MaxRetryError, SSLError,
    ConnectTimeoutError, ReadTimeoutError
)

def debug_request(url, method='GET', **kwargs):
    http = urllib3.PoolManager()

    try:
        response = http.request(method, url, **kwargs)
        print(f"✓ Success: {response.status} {response.reason}")
        return response

    except MaxRetryError as e:
        print(f"✗ Max retries exceeded: {e}")
        print(f"  Reason: {e.reason}")

    except ConnectTimeoutError as e:
        print(f"✗ Connection timeout: {e}")

    except ReadTimeoutError as e:
        print(f"✗ Read timeout: {e}")

    except SSLError as e:
        print(f"✗ SSL Error: {e}")

    except HTTPError as e:
        print(f"✗ HTTP Error: {e}")

    except Exception as e:
        print(f"✗ Unexpected error: {type(e).__name__}: {e}")

    return None

# Test the function (the 3-second timeout against a 5-second delay deliberately triggers a timeout)
response = debug_request('https://httpbin.org/delay/5', timeout=3.0)

3. Detailed Response Inspection

Examine responses thoroughly to identify issues:

def inspect_response(response):
    if response is None:
        return

    print(f"Status: {response.status} {response.reason}")
    print(f"Version: HTTP/{response.version}")
    print(f"Headers ({len(response.headers)}):")

    for name, value in response.headers.items():
        print(f"  {name}: {value}")

    # Check content encoding
    content_encoding = response.headers.get('content-encoding', 'none')
    print(f"Content-Encoding: {content_encoding}")

    # Check content length
    content_length = response.headers.get('content-length', 'unknown')
    print(f"Content-Length: {content_length}")

    # Sample response data
    data = response.data
    print(f"Response size: {len(data)} bytes")

    if len(data) > 0:
        try:
            # Try to decode as text
            text = data.decode('utf-8')[:200]
            print(f"Response preview: {text}...")
        except UnicodeDecodeError:
            print("Response contains binary data")

# Example usage
http = urllib3.PoolManager()
response = http.request('GET', 'https://httpbin.org/gzip')
inspect_response(response)
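
The utf-8 preview above is a simplification; servers can declare other charsets. Here is a hedged sketch that reads the charset from the Content-Type header and falls back gracefully (the httpbin endpoint is only an example target):

import urllib3

def decode_response(response):
    """Decode response.data using the charset declared in Content-Type, if any."""
    content_type = response.headers.get('content-type', '')
    charset = 'utf-8'
    for part in content_type.split(';'):
        part = part.strip()
        if part.lower().startswith('charset='):
            charset = part.split('=', 1)[1].strip('"\'')

    try:
        # errors='replace' keeps the preview readable even if the declaration is wrong
        return response.data.decode(charset, errors='replace')
    except LookupError:
        # Unknown codec name in the header; fall back to utf-8
        return response.data.decode('utf-8', errors='replace')

response = urllib3.PoolManager().request('GET', 'https://httpbin.org/encoding/utf8')
print(decode_response(response)[:200])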

4. Connection and Pool Debugging

Debug connection pool issues and configuration problems:

def debug_pool_manager():
    # Create pool manager with debug configuration
    http = urllib3.PoolManager(
        num_pools=10,
        maxsize=10,
        block=False,
        timeout=urllib3.Timeout(connect=5.0, read=10.0),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=[502, 503, 504]
        )
    )

    # Make multiple requests to test pooling
    urls = [
        'https://httpbin.org/get',
        'https://httpbin.org/headers',
        'https://httpbin.org/user-agent'
    ]

    for url in urls:
        try:
            response = http.request('GET', url)
            print(f"✓ {url}: {response.status}")
        except Exception as e:
            print(f"✗ {url}: {e}")

    # Check how many per-host pools the manager is currently caching
    print(f"Active connection pools: {len(http.pools)}")

debug_pool_manager()
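
For a closer look at a single host's pool, you can inspect the pool object itself. The num_connections and num_requests counters are internal attributes, so treat this as a sketch that assumes they are present in your urllib3 version:

import urllib3

def inspect_pool_usage(url, requests=5):
    http = urllib3.PoolManager(maxsize=2)

    for _ in range(requests):
        http.request('GET', url)

    # Look up the per-host pool that served these requests (internal counters, may change between versions)
    pool = http.connection_from_url(url)
    print(f"Connections opened: {pool.num_connections}")
    print(f"Requests served: {pool.num_requests}")

inspect_pool_usage('https://httpbin.org/get')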

5. SSL/TLS Debugging

Troubleshoot SSL certificate and TLS configuration issues:

import ssl
import certifi
import urllib3

def debug_ssl_connection(url):
    print(f"Debugging SSL connection to: {url}")

    # Check system SSL configuration
    print(f"OpenSSL version: {ssl.OPENSSL_VERSION}")
    print(f"Default CA bundle: {ssl.get_default_verify_paths()}")
    print(f"Certifi CA bundle: {certifi.where()}")

    # Test with different SSL configurations
    configs = [
        ("Default", {}),
        ("No verification", {"cert_reqs": "CERT_NONE"}),
        ("With certifi", {"ca_certs": certifi.where()}),
        ("Custom context", {"ssl_context": ssl.create_default_context()})
    ]

    for name, config in configs:
        try:
            http = urllib3.PoolManager(**config)
            response = http.request('GET', url, timeout=5.0)
            print(f"✓ {name}: {response.status}")
        except Exception as e:
            print(f"✗ {name}: {e}")

# Test SSL debugging
debug_ssl_connection('https://httpbin.org/get')
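
When verification fails, it often helps to look at the certificate the server actually presents. This sketch uses only the standard library's ssl and socket modules, independently of urllib3:

import socket
import ssl

def inspect_server_certificate(hostname, port=443):
    # Open a verified TLS connection and read the peer certificate
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5.0) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()

    print(f"Subject: {cert.get('subject')}")
    print(f"Issuer: {cert.get('issuer')}")
    print(f"Valid until: {cert.get('notAfter')}")

inspect_server_certificate('httpbin.org')

If the handshake itself fails, the ssl.SSLCertVerificationError raised here usually names the exact problem, such as an expired certificate, a hostname mismatch, or an unknown CA.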

6. Network Connectivity Testing

Verify network connectivity and DNS resolution:

import socket
from urllib.parse import urlparse

def test_network_connectivity(url):
    parsed = urlparse(url)
    hostname = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == 'https' else 80)

    print(f"Testing connectivity to {hostname}:{port}")

    # DNS resolution test
    try:
        ip_addresses = socket.getaddrinfo(hostname, port)
        print(f"✓ DNS resolution successful:")
        for addr in ip_addresses[:3]:  # Show first 3 addresses
            print(f"  {addr[4][0]}")
    except socket.gaierror as e:
        print(f"✗ DNS resolution failed: {e}")
        return False

    # TCP connection test
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(5.0)
        result = sock.connect_ex((hostname, port))
        sock.close()

        if result == 0:
            print(f"✓ TCP connection successful")
            return True
        else:
            print(f"✗ TCP connection failed: {result}")
            return False
    except Exception as e:
        print(f"✗ TCP connection error: {e}")
        return False

# Test network connectivity
test_network_connectivity('https://httpbin.org')

7. Advanced Debugging with Proxy Inspection

Use debugging proxies to inspect HTTP traffic:

def debug_with_proxy(url, proxy_url='http://localhost:8080'):
    """Debug requests through a proxy like mitmproxy or Charles"""

    print(f"Routing traffic through proxy: {proxy_url}")

    try:
        # Configure proxy
        http = urllib3.ProxyManager(
            proxy_url,
            timeout=10.0,
            retries=False
        )

        # Make request through proxy
        response = http.request('GET', url)
        print(f"✓ Request successful: {response.status}")

        # Display key information
        print(f"Response headers: {dict(response.headers)}")

    except Exception as e:
        print(f"✗ Proxy request failed: {e}")
        print("Make sure your proxy is running and accessible")

# Example: Using mitmproxy (start with: mitmdump -p 8080)
# debug_with_proxy('https://httpbin.org/get', 'http://localhost:8080')
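
Note that an intercepting proxy such as mitmproxy re-signs HTTPS traffic with its own CA, so requests through it will fail certificate verification unless that CA is trusted. A sketch assuming mitmproxy's default CA certificate path (adjust the path for your proxy):

import os
import urllib3

# Assumption: mitmproxy's default CA certificate location; other proxies store theirs elsewhere
proxy_ca = os.path.expanduser('~/.mitmproxy/mitmproxy-ca-cert.pem')

http = urllib3.ProxyManager(
    'http://localhost:8080',
    ca_certs=proxy_ca,   # verify against the proxy's CA instead of disabling verification
    timeout=10.0
)
# response = http.request('GET', 'https://httpbin.org/get')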

8. Performance and Timeout Debugging

Debug performance issues and timeout problems:

import time
import urllib3

def debug_performance(url, iterations=3):
    """Measure request performance and identify bottlenecks"""

    http = urllib3.PoolManager(
        timeout=urllib3.Timeout(connect=5.0, read=30.0)
    )

    times = []

    for i in range(iterations):
        start_time = time.time()

        try:
            response = http.request('GET', url)
            end_time = time.time()

            duration = end_time - start_time
            times.append(duration)

            print(f"Request {i+1}: {response.status} in {duration:.3f}s")

        except Exception as e:
            print(f"Request {i+1} failed: {e}")

    if times:
        avg_time = sum(times) / len(times)
        print(f"Average response time: {avg_time:.3f}s")
        print(f"Min: {min(times):.3f}s, Max: {max(times):.3f}s")

# Test performance
debug_performance('https://httpbin.org/delay/1', iterations=3)
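
If you need to know whether the time goes into waiting for the server or into downloading the body, you can split the measurement by streaming the response. A minimal sketch using preload_content=False:

import time
import urllib3

def debug_timing_breakdown(url):
    http = urllib3.PoolManager()

    start = time.perf_counter()
    response = http.request('GET', url, preload_content=False)
    first_byte = time.perf_counter()          # headers have been received at this point

    body = response.read()                    # download the body
    finished = time.perf_counter()
    response.release_conn()                   # return the connection to the pool

    print(f"Time to headers: {first_byte - start:.3f}s")
    print(f"Body download ({len(body)} bytes): {finished - first_byte:.3f}s")

debug_timing_breakdown('https://httpbin.org/delay/1')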

9. Debugging Common Issues

URL Encoding Problems

from urllib.parse import quote, unquote

def debug_url_encoding(url):
    print(f"Original URL: {url}")
    print(f"Encoded URL: {quote(url, safe=':/?#[]@!$&\'()*+,;=')}")

    # Test the request
    http = urllib3.PoolManager()
    try:
        response = http.request('GET', url)
        print(f"✓ Request successful: {response.status}")
    except Exception as e:
        print(f"✗ Request failed: {e}")
        # Retry with the encoded URL
        encoded_url = quote(url, safe=':/?#[]@!$&\'()*+,;=')
        print(f"Trying encoded URL: {encoded_url}")
        try:
            response = http.request('GET', encoded_url)
            print(f"✓ Encoded request successful: {response.status}")
        except Exception as retry_error:
            print(f"✗ Encoded request also failed: {retry_error}")

Headers and User-Agent Issues

def debug_headers(url):
    """Debug common header-related issues"""

    headers_tests = [
        ("No headers", {}),
        ("Basic headers", {
            'User-Agent': 'Mozilla/5.0 (urllib3-debug)',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
        }),
        ("Full browser headers", {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1'
        })
    ]

    http = urllib3.PoolManager()

    for name, headers in headers_tests:
        try:
            response = http.request('GET', url, headers=headers)
            print(f"✓ {name}: {response.status}")
        except Exception as e:
            print(f"✗ {name}: {e}")

# Test headers
debug_headers('https://httpbin.org/headers')

10. Complete Debugging Workflow

Here's a complete debugging function that combines all techniques:

def complete_debug(url, method='GET', **kwargs):
    """Comprehensive urllib3 debugging function"""

    print(f"=== Debugging {method} {url} ===")

    # 1. Test network connectivity
    print("\n1. Network Connectivity:")
    if not test_network_connectivity(url):
        return None

    # 2. Configure detailed logging
    print("\n2. Enabling detailed logging...")
    logging.basicConfig(level=logging.DEBUG)
    urllib3.disable_warnings()

    # 3. Create pool manager with debug config
    http = urllib3.PoolManager(
        timeout=urllib3.Timeout(connect=5.0, read=30.0),
        retries=urllib3.Retry(total=3, backoff_factor=0.5)
    )

    # 4. Make request with comprehensive error handling
    print("\n3. Making request...")
    response = debug_request(url, method, **kwargs)

    # 5. Inspect response if successful
    if response:
        print("\n4. Response inspection:")
        inspect_response(response)

    return response

# Example usage
response = complete_debug('https://httpbin.org/get')

Best Practices for urllib3 Debugging

  1. Always enable logging during development and testing
  2. Use specific exception handling rather than broad try-except blocks
  3. Test network connectivity before assuming code issues
  4. Verify SSL certificates and update certificate bundles regularly
  5. Configure appropriate timeouts for your use case (see the configuration sketch after this list)
  6. Monitor connection pool usage for high-traffic applications
  7. Use debugging proxies for complex request/response analysis
  8. Keep urllib3 updated to benefit from bug fixes and improvements
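
As a starting point that ties several of these practices together, here is a sketch of a reusable pool manager configuration; the specific values are illustrative assumptions, not recommendations for every workload:

import urllib3

def make_pool_manager():
    # Baseline configuration: explicit timeouts, bounded retries, and a sized pool
    return urllib3.PoolManager(
        num_pools=10,                                    # distinct hosts to cache pools for
        maxsize=10,                                      # connections kept per host
        timeout=urllib3.Timeout(connect=5.0, read=30.0),
        retries=urllib3.Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=[429, 502, 503, 504]
        )
    )

http = make_pool_manager()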

By following these debugging techniques systematically, you can identify and resolve most urllib3 issues efficiently. Remember that debugging is often an iterative process—start with basic checks and gradually apply more advanced techniques as needed.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
