What are the main features of urllib3?

urllib3 is a powerful, user-friendly HTTP client for Python. It is a third-party package that must be installed separately, and it is unrelated to the standard library's urllib and to Python 2's urllib2 (whose functionality was split into urllib.request and urllib.error in Python 3). urllib3 provides many features that the standard library's HTTP clients lack. The main ones are:

Connection Pooling

urllib3 uses connection pooling to reuse connections to a host, which improves the efficiency of network operations by reducing the number of connections that need to be opened and subsequently closed.

import urllib3

http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/robots.txt')
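To make the reuse visible, here is a minimal sketch that sends three requests and then reads the counters the pool keeps. It assumes a throwaway local server (the KeepAliveHandler class is made up for this example) so that it runs without internet access; num_connections and num_requests are attributes of the underlying connection pool and are best treated as diagnostics.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that keeps connections open between requests."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 allows persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = urllib3.PoolManager()
for _ in range(3):
    # The body is read eagerly, so the connection goes back to the pool.
    response = http.request("GET", url)

# Look up the pool that served this host and read its counters.
pool = http.connection_from_url(url)
connections_opened = pool.num_connections
requests_sent = pool.num_requests
print(connections_opened, requests_sent)

http.clear()
server.shutdown()
server.server_close()
```

With a server that supports keep-alive, the three requests should share a single connection.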

Thread Safety

Connection pools in urllib3 are thread-safe, allowing you to use the same PoolManager or ConnectionPool across threads without any additional locking.
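A minimal sketch of sharing one PoolManager across worker threads follows. It assumes a local test server (HelloHandler is a name invented for this example) so that it can run offline; in real code the URL would point at whatever service you are calling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that answers every GET with 'hello'."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# One PoolManager shared by all workers; maxsize is the number of
# connections kept per host for reuse.
http = urllib3.PoolManager(maxsize=4)


def fetch_status(_):
    # No explicit lock is needed around the shared PoolManager.
    return http.request("GET", url).status


with ThreadPoolExecutor(max_workers=4) as executor:
    statuses = list(executor.map(fetch_status, range(8)))

print(statuses)

http.clear()
server.shutdown()
server.server_close()
```

Matching maxsize to the number of worker threads is a common choice, since it lets each thread keep a connection open for reuse.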

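Blocking I/O and Concurrency

urllib3 performs blocking (synchronous) I/O and does not ship a native asyncio interface. Concurrency is usually achieved with threads, which works well because the connection pools are thread-safe. Asynchronous applications can still use urllib3 by running its calls in a thread executor.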
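One way to call urllib3 from asyncio code is to hand the request to the event loop's default thread executor, as sketched below. This is one possible approach and not an official integration. The local server and the names OkHandler and fetch_status are assumptions made for the example.

```python
import asyncio
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical test server used only to make the sketch runnable offline."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = urllib3.PoolManager()


def fetch_status(target):
    # Ordinary blocking urllib3 call; it runs in a worker thread.
    return http.request("GET", target).status


async def main():
    loop = asyncio.get_running_loop()
    # Each call runs in the default executor, so the event loop stays free.
    jobs = [loop.run_in_executor(None, fetch_status, url) for _ in range(3)]
    return await asyncio.gather(*jobs)


statuses = asyncio.run(main())
print(statuses)

http.clear()
server.shutdown()
server.server_close()
```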

SSL/TLS Verification

urllib3 verifies the server's TLS certificate for HTTPS requests, which protects against connecting to an impostor. Verification has been enabled by default since version 1.25, and you can supply your own CA certificate bundle.

http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs='/path/to/your/certificate_bundle'
)
r = http.request('GET', 'https://example.com/')

Client-Side SSL/TLS Support

urllib3 also supports client certificate authentication (mutual TLS): you pass the paths of the client's private key and certificate.

http = urllib3.PoolManager(
    key_file='/path/to/key.pem',
    cert_file='/path/to/cert.pem'
)

Automatic Content Decoding

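urllib3 automatically decodes gzip and deflate content encodings when the server sends them (that is, responses carrying a Content-Encoding header). Brotli and Zstandard can also be decoded when the corresponding optional packages are installed.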
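The sketch below shows the decoding in action against a local test server that sends a gzip-compressed body with a Content-Encoding: gzip header. The server and the names GzipHandler and PAYLOAD are assumptions made for the example. Passing decode_content=False returns the bytes exactly as they arrived.

```python
import gzip
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3

PAYLOAD = b"hello from a compressed body"


class GzipHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that always replies with a gzip-encoded body."""

    def do_GET(self):
        body = gzip.compress(PAYLOAD)
        self.send_response(200)
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = urllib3.PoolManager()

# Default behaviour: the body is decompressed for you.
decoded = http.request("GET", url).data

# Opting out: the body is returned still compressed.
raw = http.request("GET", url, decode_content=False).data

print(decoded)
print(len(raw), "compressed bytes")

http.clear()
server.shutdown()
server.server_close()
```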

Retry Logic

urllib3 can automatically retry idempotent requests for intermittent failures, which is configurable via the Retry class.

retries = urllib3.Retry(connect=5, read=2, redirect=5)
http = urllib3.PoolManager(retries=retries)

Redirect Handling

urllib3 can automatically follow redirects, or it can be configured to handle redirects manually.
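Both modes can be sketched against a local test server that redirects /old to /new. The server and the RedirectHandler name are assumptions made for the example. With redirect=False the 3xx response is returned as is, and get_redirect_location() reports where it points.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final destination"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

http = urllib3.PoolManager()

# Automatic: urllib3 follows the 302 and returns the final response.
followed = http.request("GET", base + "/old")

# Manual: the 302 itself is returned and the caller decides what to do.
manual = http.request("GET", base + "/old", redirect=False)
next_hop = manual.get_redirect_location()

print(followed.status, followed.data)
print(manual.status, next_hop)

http.clear()
server.shutdown()
server.server_close()
```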

Support for Chunked Requests

urllib3 supports chunked transfer encoding for both requests and responses, allowing for streaming uploads and downloads.
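The response side can be sketched as follows: a local test server sends a chunked body, and the client reads it piece by piece with preload_content=False and stream(). The server and the names ChunkedHandler and CHUNKS are assumptions made for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3

CHUNKS = [b"first,", b"second,", b"third"]


class ChunkedHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that replies with a chunked body."""

    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for chunk in CHUNKS:
            # Each chunk is: size in hex, CRLF, data, CRLF.
            self.wfile.write(b"%X\r\n%s\r\n" % (len(chunk), chunk))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = urllib3.PoolManager()

# preload_content=False defers reading the body until we ask for it.
response = http.request("GET", url, preload_content=False)
pieces = []
for piece in response.stream(1024):
    pieces.append(piece)  # a real program might write each piece to disk
response.release_conn()  # hand the connection back to the pool

body = b"".join(pieces)
print(body)

http.clear()
server.shutdown()
server.server_close()
```

On the request side, request() accepts chunked=True together with an iterable body, which sends the upload with chunked transfer encoding.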

HTTP and SOCKS Proxy Support

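urllib3 can route requests through HTTP proxies with ProxyManager (or the proxy_from_url helper, which returns a ProxyManager). SOCKS proxies are handled by SOCKSProxyManager in urllib3.contrib.socks, which requires the PySocks package.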

http = urllib3.ProxyManager('http://localhost:8080/')

Headers, Query Parameters, and Form Fields

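urllib3 makes it easy to add HTTP headers and query parameters and to send form data. For GET requests, fields are encoded into the query string; for POST and PUT requests, they are encoded into the request body.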

r = http.request(
    'GET',
    'http://httpbin.org/get',
    fields={'hello': 'world'},
    headers={'X-Something': 'value'}
)

Streaming and Large File Uploads

urllib3 supports streaming uploads, which is great for large files because they don’t need to be loaded into memory.
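A minimal sketch of a streamed upload: the request body is an open file object, so the data is read from disk in blocks while it is being sent. The local test server, the UploadHandler name, and the 512 KiB size are assumptions made for the example. Content-Length is set explicitly here so that the test server can read the body without decoding chunks.

```python
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3

SIZE = 512 * 1024  # illustrative upload size


class UploadHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that counts the bytes it receives."""

    def do_PUT(self):
        remaining = int(self.headers["Content-Length"])
        received = 0
        while remaining > 0:
            block = self.rfile.read(min(remaining, 64 * 1024))
            if not block:
                break
            received += len(block)
            remaining -= len(block)
        body = str(received).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), UploadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/upload"

http = urllib3.PoolManager()

with tempfile.TemporaryFile() as source:
    source.write(b"x" * SIZE)
    source.seek(0)
    # Passing the file object as the body avoids an explicit read() into memory.
    response = http.request(
        "PUT",
        url,
        body=source,
        headers={"Content-Length": str(SIZE)},
    )

uploaded_bytes = int(response.data)
print(uploaded_bytes)

http.clear()
server.shutdown()
server.server_close()
```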

JSON Content

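urllib3 1.x has no built-in JSON helper, but it is easy to send JSON requests with the standard json module, as shown here. urllib3 2.x additionally accepts a json argument to request() and provides a json() method on responses.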

import json

http = urllib3.PoolManager()
encoded_data = json.dumps({"attribute": "value"}).encode('utf-8')
r = http.request(
    'POST',
    'http://httpbin.org/post',
    body=encoded_data,
    headers={'Content-Type': 'application/json'}
)

Extensibility

urllib3 is designed with extensibility in mind, allowing you to implement custom connection types, request/response handling, etc.
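As one illustration, the sketch below subclasses PoolManager and overrides urlopen() to record every request before delegating to the normal implementation. The class name LoggingPoolManager, its seen attribute, and the local test server are all made up for this example. It shows one possible approach and is not an officially prescribed extension point.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class LoggingPoolManager(urllib3.PoolManager):
    """Hypothetical subclass that remembers each (method, url) it sends."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def urlopen(self, method, url, redirect=True, **kw):
        self.seen.append((method, url))
        return super().urlopen(method, url, redirect=redirect, **kw)


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical test server used only to make the sketch runnable offline."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = LoggingPoolManager()
response = http.request("GET", url)  # request() goes through urlopen()
print(http.seen)

http.clear()
server.shutdown()
server.server_close()
```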

These features make urllib3 a comprehensive and flexible solution for HTTP networking in Python. It is, however, a lower-level library: you manage request encoding, response decoding, and error handling yourself. For a higher-level HTTP client that abstracts these tasks, consider requests, which is built on top of urllib3.
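What "on your own" can look like in practice is sketched below, assuming a helper named fetch_text (invented for this example) that catches urllib3's exceptions, checks the status code, and decodes the body as UTF-8. By default urllib3 returns 4xx and 5xx responses like any other response, so the status check is explicit. The request here targets a local port with nothing listening, so the failure path runs without internet access.

```python
import socket

import urllib3


def fetch_text(http, url):
    """Return the body as text, or None if the request fails or is not a 200."""
    try:
        response = http.request(
            "GET",
            url,
            retries=urllib3.Retry(total=1, backoff_factor=0),
            timeout=2.0,
        )
    except urllib3.exceptions.HTTPError as exc:
        # Base class for urllib3 errors, including MaxRetryError.
        print(f"request failed: {exc!r}")
        return None
    if response.status != 200:
        print(f"unexpected status: {response.status}")
        return None
    # response.data is bytes; the encoding is the caller's choice (UTF-8 assumed).
    return response.data.decode("utf-8")


# Find a local port that nothing is listening on, to trigger a connection error.
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    closed_port = probe.getsockname()[1]

result = fetch_text(urllib3.PoolManager(), f"http://127.0.0.1:{closed_port}/")
print(result)
```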
