MechanicalSoup is a Python library that provides a simple API for automating interaction with websites. It builds on top of libraries like requests for HTTP and BeautifulSoup for parsing HTML. MechanicalSoup can indeed handle multi-part form data for file uploads.
When you need to upload files through a form, you typically encounter a multi-part form that uses the multipart/form-data encoding type. MechanicalSoup can submit such forms by constructing the appropriate requests data structures.
Here is an example of how you can use MechanicalSoup to upload a file:

    import mechanicalsoup

    # Create a browser object
    browser = mechanicalsoup.Browser()

    # Navigate to the page with the form you want to submit
    page = browser.get("http://example.com/upload")

    # Select the first form on the page
    form = page.soup.select("form")[0]

    # Fill out other form fields if needed
    # mechanicalsoup.Form(form).input({"name_of_other_field": "value"})

    # Prepare the file payload inside a with block so the file is always closed.
    # "file" is the name of the form field that accepts the file upload
    with open("local_file.txt", "rb") as fh:
        files = {"file": ("filename.txt", fh, "text/plain")}
        # Submit the form with the file attached
        response = browser.submit(form, page.url, files=files)

    # Check the response
    print(response.text)

In this code snippet:
- We create a Browser object to interact with the web.
- We navigate to the page containing the upload form.
- We select the form we want to submit.
- We prepare the file payload using a dictionary. The key should match the name attribute of the <input type="file"> field in the form. The value is a tuple with the filename, a file-like object (opened in binary read mode), and the MIME type of the file.
- We call browser.submit(), passing the form, the form's URL, and the files dictionary we created.
- We can then inspect the response to check if the upload was successful.
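
Because the dictionary key has to match the file input's name attribute, it can help to inspect the form's markup before uploading. The following is a minimal sketch of that check. It uses the standard library's html.parser instead of BeautifulSoup, purely so that it runs without third-party packages. The class name FileFieldFinder and the sample HTML are hypothetical and not part of MechanicalSoup.

```python
from html.parser import HTMLParser


class FileFieldFinder(HTMLParser):
    """Collect each form's enctype and the names of its file inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> start tag encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "enctype": (attrs.get("enctype") or "").lower(),
                "file_fields": [],
            })
        elif tag == "input" and self.forms:
            # Inputs are attributed to the most recently opened form.
            is_file = (attrs.get("type") or "").lower() == "file"
            if is_file and attrs.get("name"):
                self.forms[-1]["file_fields"].append(attrs["name"])


# Hypothetical upload page markup, for illustration only.
SAMPLE_HTML = """
<form action="/upload" method="post" enctype="multipart/form-data">
  <input type="text" name="title">
  <input type="file" name="file">
  <input type="submit" value="Send">
</form>
"""

finder = FileFieldFinder()
finder.feed(SAMPLE_HTML)
print(finder.forms)
```

In this sample, the name to use as the dictionary key would be "file", and the form declares the multipart/form-data encoding.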
Remember to handle the file object appropriately to avoid resource leaks, for example by using a with statement when opening the file.
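
One way to combine the tuple layout and the with statement is a small context manager that builds the payload and closes the file afterwards. This is only a sketch under stated assumptions: the helper name file_payload is hypothetical, the MIME type is guessed from the file extension with the standard mimetypes module, and application/octet-stream is an assumed fallback when no type can be guessed.

```python
import mimetypes
import os
from contextlib import contextmanager


@contextmanager
def file_payload(field_name, path):
    """Yield a requests-style files dict and close the file afterwards."""
    # Guess the MIME type from the extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path)
    mime_type = mime_type or "application/octet-stream"
    with open(path, "rb") as fh:
        # (filename, file-like object, MIME type), keyed by the field name
        yield {field_name: (os.path.basename(path), fh, mime_type)}


# Possible usage with the browser, form, and page objects from the example:
#
# with file_payload("file", "local_file.txt") as files:
#     response = browser.submit(form, page.url, files=files)
```

The file stays open only while the with block runs, so the submission has to happen inside it.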
Please note that while MechanicalSoup is suitable for simple web automation tasks, it does not execute JavaScript, so it cannot handle the dynamic interactions found on many modern websites. In such cases, you might need tools like Selenium or Playwright that can automate a real web browser.