MechanicalSoup is a Python library for automating interaction with websites. It provides a simple API for navigating and manipulating web pages, making it useful for web scraping tasks. While MechanicalSoup does not provide a built-in method to save scraped data directly to a file, you can easily do this using Python's file handling capabilities.
Here's a step-by-step guide on how to scrape data using MechanicalSoup and then save that data to a file:
- Install MechanicalSoup if you haven't already:
pip install MechanicalSoup
- Import MechanicalSoup and other necessary libraries:
import mechanicalsoup
- Create a browser object with mechanicalsoup.StatefulBrowser:
browser = mechanicalsoup.StatefulBrowser()
- Navigate to the page you want to scrape:
browser.open("http://example.com")
- Interact with the page as needed (e.g., select forms, fill fields, submit) and scrape the desired data.
- Save the scraped data to a file. Here's an example where we scrape the contents of a webpage and save it as a text file:
# Open the page
browser.open("http://example.com")
# Get the page's HTML content
page_html = browser.page.prettify()
# Specify the file name
file_name = "scraped_data.txt"
# Open a file with write permission and save the content
with open(file_name, "w", encoding="utf-8") as file:
    file.write(page_html)
# Don't forget to close the browser session
browser.close()
If you want to save the data in a structured format like CSV or JSON, you would first need to parse the scraped data accordingly. For example:
import csv
# Assume you have a list of dictionaries with the scraped data
scraped_data = [
    {"name": "Alice", "age": "30"},
    {"name": "Bob", "age": "25"}
]
# Specify the CSV file name
csv_file_name = "scraped_data.csv"
# Save the data to a CSV file
with open(csv_file_name, "w", newline='', encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=scraped_data[0].keys())
    writer.writeheader()
    for data in scraped_data:
        writer.writerow(data)
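The parsing step that produces such a list of dictionaries could look like the following sketch. The HTML structure (a table of people) is invented for illustration; since MechanicalSoup exposes pages as BeautifulSoup objects, the same calls work directly on browser.page:

```python
from bs4 import BeautifulSoup

# Stand-in for browser.page; the table layout is hypothetical
html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

scraped_data = []
for row in soup.select("tr")[1:]:  # skip the header row
    name_cell, age_cell = row.find_all("td")
    scraped_data.append({"name": name_cell.get_text(), "age": age_cell.get_text()})
```

The resulting list of dictionaries can then be passed straight to csv.DictWriter as shown above.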
To save as JSON:
import json
# Assume scraped_data is the data you want to save
scraped_data = {
    "title": "Example Domain",
    "url": "http://example.com"
}
# Specify the JSON file name
json_file_name = "scraped_data.json"
# Save the data to a JSON file
with open(json_file_name, "w", encoding="utf-8") as jsonfile:
    json.dump(scraped_data, jsonfile, indent=4)
Keep in mind that when scraping websites, it's important to respect the site's robots.txt
file and its terms of service. Always ensure that your scraping activities are legal and ethical.
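Python's built-in urllib.robotparser offers a quick way to check robots.txt rules before fetching a URL. In this sketch the rules are supplied inline for illustration; against a real site you would instead call set_url() followed by read() to fetch the actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Inline rules for illustration; normally you would use:
#   rp.set_url("http://example.com/robots.txt"); rp.read()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "http://example.com/index.html"))  # allowed path
print(rp.can_fetch("*", "http://example.com/private/x"))   # disallowed path
```

Calling can_fetch() before each browser.open() is a simple way to keep a scraper within the rules the site publishes.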