How do I use Selenium to extract images from a website?

Selenium is a handy tool for extracting data from websites that render their content dynamically, and you can use it to extract images as well. Here's how to do it in both Python and JavaScript.

Python

In Python, you can use Selenium along with BeautifulSoup to extract images from a website. Here is a simple example:

First, install the necessary libraries if you haven't already done so:

pip install selenium beautifulsoup4

Then, you can use the following code:

from selenium import webdriver
from bs4 import BeautifulSoup

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the page that we want to scrape
driver.get("http://www.example.com")

# Parse the page with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Find all img tags
img_tags = soup.find_all('img')

# Extract URLs from the 'src' attribute, skipping img tags that have none
img_urls = [img['src'] for img in img_tags if img.get('src')]

for url in img_urls:
    print(url)

# Close the browser
driver.quit()

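This code prints the src value of every img tag found on the page. The values are printed as they appear in the HTML, so some of them may be relative paths rather than full URLs. You can modify the code according to your needs.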
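Because src values taken from the parsed HTML can be relative paths, protocol-relative URLs, or inline data: URIs, it often helps to normalize them before doing anything else. The following is a minimal sketch using only the standard library; the helper name absolute_image_urls and the sample values are assumptions made for illustration, not part of Selenium or BeautifulSoup.

```python
from urllib.parse import urljoin, urlparse


def absolute_image_urls(page_url, srcs):
    """Resolve raw src values against page_url, skipping unusable entries.

    Hypothetical helper: keeps only http(s) URLs and removes duplicates
    while preserving the original order.
    """
    seen = set()
    result = []
    for src in srcs:
        if not src:
            continue  # missing or empty src attribute
        absolute = urljoin(page_url, src.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # e.g. inline data: URIs
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Sample values standing in for the img_urls list built above
    sample = ["/static/logo.png", "photo.jpg", None, "data:image/png;base64,AAAA"]
    for url in absolute_image_urls("http://www.example.com/gallery/", sample):
        print(url)
```

In the scraping script, you could pass the page address and the img_urls list to this helper before printing or downloading anything.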

JavaScript

In JavaScript, you can use Selenium WebDriver for Node.js. Before using it, make sure to install it:

npm install selenium-webdriver

Then, you can use the following code:

const {Builder, By} = require('selenium-webdriver');

async function extractImages() {
    let driver = await new Builder().forBrowser('firefox').build();
    try {
        // Navigate to the page
        await driver.get('http://www.example.com');

        // Find all img elements
        let elements = await driver.findElements(By.css('img'));

        // Extract 'src' attributes
        for (let element of elements) {
            let src = await element.getAttribute('src');
            console.log(src);
        }
    }
    finally {
        await driver.quit();
    }
}

extractImages();

This JavaScript code does the same thing as the Python code: it prints the src URL of every img element found on the page.

Remember to respect the terms of service of the website you are scraping. It is also good practice to download images asynchronously and with a delay between requests, so that you do not overload the server.
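As a rough illustration of the delay part of that advice, here is a minimal Python sketch that saves a list of image URLs one at a time and pauses between requests. It is deliberately sequential rather than asynchronous to keep it short. The names download_images and fetch_bytes, the one-second default delay, and the file-naming scheme are all assumptions chosen for the example.

```python
import os
import time
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher (hypothetical helper): download the raw bytes at url."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def download_images(urls, dest_dir, delay_seconds=1.0, fetch=fetch_bytes):
    """Save each URL into dest_dir, pausing between requests.

    Returns the list of file paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # space out requests to spare the server
        data = fetch(url)
        # Use the last path segment as the file name, with a fallback,
        # and prefix the index so that equal names do not collide
        name = os.path.basename(urlparse(url).path) or f"image_{index}"
        path = os.path.join(dest_dir, f"{index}_{name}")
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved
```

The fetch parameter is there so that the download step can be swapped out, for example for a function that sets custom headers or for a stub when trying the code without network access.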
