Selenium is a handy tool for scraping websites that load their content dynamically, and you can use it to extract image URLs from a page as well. Here's how to do it in both Python and JavaScript.
Python
In Python, you can use Selenium along with BeautifulSoup to extract images from a website. Here is a simple example:
First, install the necessary libraries if you haven't already done so:
pip install selenium beautifulsoup4
Then, you can use the following code:
from selenium import webdriver
from bs4 import BeautifulSoup

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
try:
    # Go to the page that we want to scrape
    driver.get("http://www.example.com")
    # Parse the rendered page source with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Find all img tags
    img_tags = soup.find_all('img')
    # Extract URLs from the 'src' attribute, skipping tags that have none
    img_urls = [img['src'] for img in img_tags if img.get('src')]
    for url in img_urls:
        print(url)
finally:
    # Close the browser even if an error occurs
    driver.quit()
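This code prints the src value of the img tags found on the page. BeautifulSoup returns each value exactly as it appears in the HTML, so some of them may be relative paths rather than full URLs. You can modify the code according to your needs.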
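One modification you may want is turning the collected src values into absolute URLs before downloading anything. Here is a minimal sketch that uses only the standard library. The helper name resolve_image_urls is hypothetical (it is not part of Selenium or BeautifulSoup), and skipping inline data: URIs is an assumption about what you want to keep:

```python
from urllib.parse import urljoin


def resolve_image_urls(srcs, base_url):
    """Return absolute, de-duplicated URLs for the given src values (hypothetical helper)."""
    resolved = []
    for src in srcs:
        # Skip missing values and inline data: URIs, which are not downloadable links
        if not src or src.startswith("data:"):
            continue
        # urljoin leaves absolute URLs untouched and resolves relative ones against base_url
        url = urljoin(base_url, src)
        if url not in resolved:
            resolved.append(url)
    return resolved
```

With the script above, you could call resolve_image_urls(img_urls, driver.current_url) before driver.quit() and print the result instead of the raw values.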
JavaScript
In JavaScript, you can use Selenium WebDriver for Node.js. Before using it, make sure to install it:
npm install selenium-webdriver
Then, you can use the following code:
const {Builder, By} = require('selenium-webdriver');

async function extractImages() {
  let driver = await new Builder().forBrowser('firefox').build();
  try {
    // Navigate to the page
    await driver.get('http://www.example.com');
    // Find all img elements
    let elements = await driver.findElements(By.css('img'));
    // Extract 'src' attributes
    for (let element of elements) {
      let src = await element.getAttribute('src');
      console.log(src);
    }
  } finally {
    await driver.quit();
  }
}

extractImages();
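This JavaScript code does the same thing as the Python code, printing the src of every img element found on the page. Because it asks the browser for the value instead of parsing the HTML, the URLs it prints are typically already resolved to absolute form.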
Remember to respect the terms of service of any website you scrape. It's also good practice to download images asynchronously and with a delay between requests so you don't overload the server.
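As a rough illustration of that last point, here is a minimal sketch in Python that fetches images one at a time with a pause between requests, running each blocking download in a worker thread so the event loop stays free. The names fetch_bytes and download_images are hypothetical, the one-second default delay is an arbitrary assumption, and asyncio.to_thread requires Python 3.9 or later:

```python
import asyncio
import urllib.request


def fetch_bytes(url):
    """Blocking download of one URL using only the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


async def download_images(urls, delay_seconds=1.0, fetch=fetch_bytes):
    """Download each URL in turn, pausing between requests (hypothetical helper)."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause between requests so the server is not flooded
            await asyncio.sleep(delay_seconds)
        try:
            # Run the blocking fetch in a worker thread so the event loop stays free
            results[url] = await asyncio.to_thread(fetch, url)
        except OSError as error:
            # URLError and socket timeouts are OSError subclasses; record the failure and continue
            results[url] = error
    return results
```

You could run it with asyncio.run(download_images(img_urls)) and then write each bytes value to a file. Entries whose value is an exception mark downloads that failed.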