When scraping web pages that require authentication, you may need to provide credentials to access the content. The Python requests library allows you to handle various types of authentication with ease. Below are some methods to add authentication credentials to a request using the requests library:
Basic Authentication
For basic HTTP authentication, you can use the auth parameter of the request methods (for example, requests.get) to provide a username and password:
import requests
from requests.auth import HTTPBasicAuth
url = 'https://example.com/api'
username = 'your_username'
password = 'your_password'
response = requests.get(url, auth=HTTPBasicAuth(username, password))
# Or you can simply pass the credentials as a tuple
response = requests.get(url, auth=(username, password))
print(response.text)
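Authentication failures usually show up as an HTTP status code rather than an exception, so it helps to check the response before using it. A minimal sketch (the URL is a placeholder):
import requests
url = 'https://example.com/api'
response = requests.get(url, auth=('your_username', 'your_password'))
if response.status_code == 401:
    # The server rejected the credentials
    print('Authentication failed: check the username and password')
elif response.ok:
    # Any 2xx or 3xx status
    print(response.text)
else:
    print(f'Request failed with status {response.status_code}')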
Digest Authentication
If the server uses digest authentication, use the HTTPDigestAuth class:
import requests
from requests.auth import HTTPDigestAuth
url = 'https://example.com/api'
username = 'your_username'
password = 'your_password'
response = requests.get(url, auth=HTTPDigestAuth(username, password))
print(response.text)
OAuth
For services that use OAuth for authentication, you will need to obtain an access token and include it in the Authorization header of your request, typically as a Bearer token. Here's a basic example of using a token you already have (a sketch of obtaining one follows the example):
import requests
url = 'https://example.com/api'
access_token = 'your_access_token'
headers = {
    'Authorization': f'Bearer {access_token}'
}
response = requests.get(url, headers=headers)
print(response.text)
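How you obtain the access token depends on the provider. Many APIs support the OAuth 2.0 client credentials flow, where you exchange a client ID and secret for a token; the token URL and response fields below are hypothetical and will differ between providers:
import requests
# Hypothetical token endpoint; consult your provider's documentation
token_url = 'https://example.com/oauth/token'
client_id = 'your_client_id'
client_secret = 'your_client_secret'
token_response = requests.post(token_url, data={
    'grant_type': 'client_credentials',
    'client_id': client_id,
    'client_secret': client_secret,
})
# Most providers return the token in a JSON field named access_token
access_token = token_response.json().get('access_token')
headers = {'Authorization': f'Bearer {access_token}'}
response = requests.get('https://example.com/api', headers=headers)
print(response.text)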
Custom Authentication
If you have a custom authentication scheme, you can define your own authentication class by inheriting from requests.auth.AuthBase:
import requests
from requests.auth import AuthBase
class CustomAuth(AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Modify and return the request
        r.headers['Authorization'] = f'Token {self.token}'
        return r

url = 'https://example.com/api'
token = 'your_custom_token'
response = requests.get(url, auth=CustomAuth(token))
print(response.text)
Session Objects
If you need to persist certain parameters across requests, use a session object. This is especially useful for cookies, as they will be managed automatically between requests made using the session:
import requests
from requests.auth import HTTPBasicAuth
url = 'https://example.com/api'
username = 'your_username'
password = 'your_password'
# Create a session object
session = requests.Session()
# Set up authentication
session.auth = (username, password)
# Make a request
response = session.get(url)
print(response.text)
# The session object will persist the authentication for future requests
another_response = session.get('https://example.com/another_api')
print(another_response.text)
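Sessions are also the usual way to handle cookie-based logins: posting credentials to a login form once stores the session cookie, which is then sent automatically on later requests. A minimal sketch, assuming a hypothetical login endpoint and form field names:
import requests
session = requests.Session()
# Hypothetical login form; the URL and field names depend on the site
login_url = 'https://example.com/login'
payload = {'username': 'your_username', 'password': 'your_password'}
session.post(login_url, data=payload)
# Cookies set by the server are stored on the session and reused automatically
protected = session.get('https://example.com/members_only')
print(protected.text)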
Finally, make sure you comply with the terms of service of the website you are scraping and with any applicable laws or regulations. Handle your credentials securely and never hardcode them directly into your scripts; consider using environment variables or a secure credential store instead, as sketched below.
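For example, with the username and password exported as environment variables (the variable names here are arbitrary):
import os
import requests
# Read credentials from the environment instead of hardcoding them,
# e.g. export SCRAPER_USERNAME=... and SCRAPER_PASSWORD=... before running
username = os.environ['SCRAPER_USERNAME']
password = os.environ['SCRAPER_PASSWORD']
response = requests.get('https://example.com/api', auth=(username, password))
print(response.text)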