MechanicalSoup is a Python library for automating interaction with websites. It provides a simple API for navigating pages, submitting forms, and scraping web content. MechanicalSoup is built on top of the `requests` library and BeautifulSoup.
When dealing with redirects in MechanicalSoup, it is actually the underlying `requests` session that handles them. When you make a request to a URL that responds with a redirect status code (such as 301 or 302), `requests` automatically follows the redirect unless told otherwise.
Here's how you can use MechanicalSoup with redirects:
```python
import mechanicalsoup

# Create a browser object
browser = mechanicalsoup.Browser()

# By default, redirects are followed. GitHub redirects http:// to https://
response = browser.get("http://github.com")

# The final URL after following redirects
print(response.url)  # typically "https://github.com/"
# The intermediate redirect responses, oldest first
print([r.status_code for r in response.history])
```

Redirect handling cannot be switched off globally through an attribute. Neither MechanicalSoup's `Browser` and `StatefulBrowser` objects nor the underlying `requests.Session` have a `redirect` or `allow_redirects` setting, so code such as `browser.session.redirect = False` or `browser.session.allow_redirects = False` merely creates an unused attribute, and redirects are still followed. Instead, `allow_redirects` is a keyword argument of each request. `Browser.get` forwards its keyword arguments to `requests.Session.get` (and `StatefulBrowser.open` forwards them to `Browser.get`), so pass it on every call that should stop at the redirect:

```python
# Do not follow redirects for this request only
response = browser.get("http://github.com", allow_redirects=False)
# The status code is the redirect code itself (e.g. 301 or 302)
print(response.status_code)
```

Requests made without `allow_redirects=False` keep the default behavior and follow redirects, so there is nothing to re-enable. You can also pass `allow_redirects=True` explicitly:

```python
response = browser.get("http://github.com", allow_redirects=True)
```
When redirects are disabled, you can handle them manually by inspecting the response headers. For example, the `Location` header holds the URL the server is redirecting to:
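```python
from urllib.parse import urljoin

location = response.headers.get("Location")  # None if the header is absent
if 300 <= response.status_code < 400 and location:
    # Location may be relative (e.g. "/login"), so resolve it against the current URL
    redirect_url = urljoin(response.url, location)
    print(f"Redirect to: {redirect_url}")
    # You can then use the browser to go to the redirect URL
    response = browser.get(redirect_url)
```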
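Always check the status code before relying on the `Location` header: only redirect responses normally carry it, and some 3xx responses, such as 304 Not Modified, do not. Reading it with `response.headers.get("Location")` instead of indexing avoids a `KeyError` when it is missing. `requests` also offers `response.is_redirect`, which is true only when the status is a redirect code and a `Location` header is present.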
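The manual snippet follows only a single hop. Below is a minimal sketch of a loop that keeps following redirects, resolves relative `Location` values with `urllib.parse.urljoin`, and gives up after a fixed number of hops. The function name `follow_redirects_manually` and the limit of 10 hops are arbitrary choices for illustration; the set of status codes mirrors the ones `requests` treats as redirects. `fetch` is assumed to be any callable that returns a response-like object with `status_code` and a `headers` mapping.

```python
from urllib.parse import urljoin

# The status codes that requests itself treats as redirects.
# 304 Not Modified is deliberately absent: it is 3xx but has no Location.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects_manually(fetch, url, max_hops=10):
    """Follow redirects one hop at a time (hypothetical helper).

    fetch: callable taking a URL and returning an object with
           `status_code` and a `headers` mapping, e.g.
           lambda u: browser.get(u, allow_redirects=False)
    Returns (final_response, visited_urls).
    """
    visited = [url]
    response = fetch(url)
    hops = 0
    while response.status_code in REDIRECT_CODES and response.headers.get("Location"):
        if hops >= max_hops:
            raise RuntimeError(f"gave up after {max_hops} redirects at {url}")
        # Resolve relative Location values ("/login") against the current URL
        url = urljoin(url, response.headers["Location"])
        visited.append(url)
        response = fetch(url)
        hops += 1
    return response, visited
```

With MechanicalSoup, `fetch` could be `lambda u: browser.get(u, allow_redirects=False)`. This sketch always issues a GET for the next hop, which matches what `requests` does for redirected GET requests but not every case it handles for other methods.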
Using MechanicalSoup in this way lets you handle redirects manually when necessary, but in most cases you can rely on the automatic redirect handling provided by the `requests` library.