How do I filter Trustpilot reviews by star rating during scraping?

To filter Trustpilot reviews by star rating during scraping, you'll need to follow these general steps:

  1. Access the Trustpilot Review Page: Start by navigating to the page containing the reviews you want to scrape.

  2. Inspect the Page Structure: Understand the HTML structure and how the reviews are organized, particularly how star ratings are represented in the HTML.

  3. Scrape the Data: Write a script that scrapes the review content along with the star ratings.

  4. Filter by Star Rating: After scraping, filter the reviews based on the star rating criteria you're interested in.
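
As a minimal sketch of step 4, the filter can be kept separate from the scraping code so that it works on any already-parsed records. The `Review` class and the `filter_by_rating` function are hypothetical names introduced here for illustration, and the sample data is made up. The sketch assumes each review has already been reduced to an integer rating and its text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Review:
    """Hypothetical container for one scraped review."""
    rating: int  # star rating, 1 to 5
    text: str


def filter_by_rating(reviews: Iterable[Review], allowed: Set[int]) -> List[Review]:
    """Return only the reviews whose rating is in `allowed`, preserving order."""
    return [review for review in reviews if review.rating in allowed]


# Made-up sample data, for illustration only
sample = [
    Review(5, "Great service"),
    Review(1, "Never arrived"),
    Review(4, "Good overall"),
    Review(2, "Slow support"),
]

# Keep only the negative (1- and 2-star) reviews
negative = filter_by_rating(sample, {1, 2})
for review in negative:
    print(review.rating, review.text)
```

Passing a set of allowed ratings rather than a single number lets the same function cover "exactly 5 stars" (`{5}`) as well as ranges such as "2 stars or fewer" (`{1, 2}`).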

Please note that scraping Trustpilot may violate its terms of service. Make sure you comply with Trustpilot's terms and conditions and any relevant laws before proceeding.

Here's a hypothetical example using Python with requests to fetch the content and BeautifulSoup to parse the HTML. The actual structure of Trustpilot's website may differ, and you might need to adjust the code accordingly.

import requests
from bs4 import BeautifulSoup

# Define the URL of the Trustpilot review page
url = 'https://www.trustpilot.com/review/example.com'

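# Make a request to get the page content
# A timeout avoids hanging forever; raise_for_status() stops on HTTP errors (e.g. 403, 404)
response = requests.get(url, timeout=10)
response.raise_for_status()
html = response.content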

# Parse the HTML content
soup = BeautifulSoup(html, 'html.parser')

# Find all review elements - you'd need to inspect the page to know the correct class or structure
reviews = soup.find_all('div', class_='review')

# Filter reviews by star rating
desired_rating = 5  # For example, to filter 5-star reviews
filtered_reviews = []

for review in reviews:
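    # Find the star rating element within each review
    # The class name 'star-rating' and the 'data-rating' attribute are hypothetical;
    # you'll need to check the actual markup
    rating_element = review.find('div', class_='star-rating')
    if rating_element is None:
        continue  # Skip reviews that have no rating element

    rating = rating_element.get('data-rating')
    if rating is None or not rating.isdigit():
        continue  # Skip missing or non-numeric ratings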

    # Compare and filter the reviews based on the star rating
    if int(rating) == desired_rating:
        filtered_reviews.append(review.text.strip())

# Do something with the filtered reviews (e.g., print, save to a file, etc.)
for review in filtered_reviews:
    print(review)

Remember to respect the robots.txt file of Trustpilot and avoid making too many requests in a short period to prevent getting IP banned.
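
One way to check robots.txt rules before requesting a page is Python's standard-library `urllib.robotparser`. The sketch below parses robots.txt text that you have already downloaded. The `is_allowed` helper and the sample rules are assumptions made for illustration and are not Trustpilot's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration; fetch the site's real robots.txt in practice
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://www.example.com/review/example.com"))
print(is_allowed(sample_rules, "https://www.example.com/private/data"))
```

To check the live file instead, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads and parses the robots.txt at the given URL.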

If you need to scrape a lot of data or handle pagination (since Trustpilot will have many pages of reviews), you'll need to write additional code to navigate through pages and handle rate limiting.
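
A pagination loop with a fixed delay between requests might look like the sketch below. It assumes pages are selected with a `page` query parameter, which you should confirm against the real URLs. `page_url` and `crawl_pages` are hypothetical helpers, and the `fetch` and `extract` callables stand in for your own download and parsing code.

```python
import time
from typing import Callable, List
from urllib.parse import urlencode


def page_url(base_url: str, page: int) -> str:
    """Build the URL for one results page.

    ASSUMPTION: the site selects pages with a "page" query parameter.
    """
    return f"{base_url}?{urlencode({'page': page})}"


def crawl_pages(
    base_url: str,
    fetch: Callable[[str], str],
    extract: Callable[[str], List[str]],
    max_pages: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch up to `max_pages` pages, pausing between requests.

    Stops early when a page yields no items, which this sketch treats
    as having run past the last page.
    """
    results: List[str] = []
    for page in range(1, max_pages + 1):
        html = fetch(page_url(base_url, page))
        items = extract(html)
        if not items:
            break
        results.extend(items)
        if page < max_pages:
            sleep(delay_seconds)  # simple rate limiting between requests
    return results
```

In practice, `fetch` could wrap `requests.get(url, timeout=10).text` and `extract` could wrap the BeautifulSoup filtering logic. Passing them in as arguments keeps the loop testable without network access.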

For JavaScript, you can use puppeteer (which drives a headless browser and can render JavaScript-generated content) or cheerio (a server-side HTML parser for Node.js that does not run a browser). Here is a basic example using puppeteer:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.trustpilot.com/review/example.com');

    // Evaluate the page and filter reviews within the browser context
    const filteredReviews = await page.evaluate((desiredRating) => {
        const reviewElements = Array.from(document.querySelectorAll('.review'));
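        return reviewElements
            .filter(review => {
                // '.star-rating' and 'data-rating' are hypothetical; check the actual markup
                const ratingElement = review.querySelector('.star-rating');
                // Skip reviews that have no rating element instead of throwing
                return ratingElement !== null &&
                    Number(ratingElement.getAttribute('data-rating')) === desiredRating;
            })
            .map(review => review.innerText.trim());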
    }, 5); // Desired star rating

    // Output the filtered reviews
    console.log(filteredReviews);

    await browser.close();
})();

Be sure to install puppeteer via npm (npm install puppeteer) before running the JavaScript code example.
