How do I save the scraped data into a database using Goutte?

Goutte is a screen scraping and web crawling library for PHP. It has no database features of its own, so saving scraped data into a database takes three steps:

  1. Use Goutte to scrape the data from the web page.
  2. Process the data as needed (clean up, format, etc.).
  3. Use a PHP database library or extension (like PDO, mysqli, or a framework's ORM) to insert the data into the database.

Here is a step-by-step example using Goutte to scrape data and PDO to insert it into a MySQL database:

Step 1: Install Goutte

If you haven't already installed Goutte, you can do so using Composer:

composer require fabpot/goutte

Step 2: Set Up Database Connection

Use PDO to connect to your MySQL database:

<?php

$host = '127.0.0.1';
$db   = 'your_database';
$user = 'your_username';
$pass = 'your_password';
$charset = 'utf8mb4';

$dsn = "mysql:host=$host;dbname=$db;charset=$charset";
$options = [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    PDO::ATTR_EMULATE_PREPARES   => false,
];

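try {
    $pdo = new PDO($dsn, $user, $pass, $options);
} catch (\PDOException $e) {
    // Rethrow with only the message and code, so the new stack trace
    // does not carry the constructor arguments (including the password)
    throw new \PDOException($e->getMessage(), (int) $e->getCode());
}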
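
Hardcoded credentials are fine for a quick test, but you may prefer to read them from the environment. The following is a minimal sketch, not part of the original example: the function name buildDsnFromEnv and the variable names DB_HOST, DB_NAME and DB_CHARSET are assumptions chosen for illustration, and the fallbacks are the placeholder values from Step 2.

```php
<?php

/**
 * Hypothetical helper: build a MySQL DSN from environment variables,
 * falling back to the placeholder values used in Step 2.
 */
function buildDsnFromEnv(): string
{
    // DB_HOST, DB_NAME and DB_CHARSET are assumed variable names.
    // getenv() returns false when a variable is not set.
    $host    = getenv('DB_HOST') ?: '127.0.0.1';
    $db      = getenv('DB_NAME') ?: 'your_database';
    $charset = getenv('DB_CHARSET') ?: 'utf8mb4';

    return "mysql:host=$host;dbname=$db;charset=$charset";
}

$dsn = buildDsnFromEnv();
```

The resulting $dsn can be passed to new PDO() exactly as in the snippet above; the username and password could be read the same way.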

Step 3: Scrape Data Using Goutte

Here's an example of how you might use Goutte to scrape data from a web page:

require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://example.com');

// Scrape data, for example, all 'h2' elements
$titles = $crawler->filter('h2')->each(function ($node) {
    return $node->text();
});

// Process your data if necessary
// ...

// Now $titles is an array of the text content of each 'h2' on the page
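
The "process your data" step depends on the page, but a typical cleanup might look like the sketch below. It assumes $titles is a plain array of strings, as produced above; the helper name cleanTitles is hypothetical and the sample input is made up for illustration.

```php
<?php

/**
 * Hypothetical helper: normalise scraped strings before storing them.
 */
function cleanTitles(array $titles): array
{
    $cleaned = [];
    foreach ($titles as $title) {
        // Collapse runs of whitespace (including newlines) into one space.
        // preg_replace() returns null on invalid UTF-8, hence the fallback.
        $title = trim(preg_replace('/\s+/u', ' ', (string) $title) ?? '');
        if ($title !== '') {
            $cleaned[] = $title;
        }
    }

    // Drop duplicates and reindex the array from 0.
    return array_values(array_unique($cleaned));
}

// Made-up sample input standing in for scraped headings.
$titles = cleanTitles(["  First   post\n", '', 'Second post', 'First post']);
// $titles is now ['First post', 'Second post']
```

Cleaning before the insert keeps empty or duplicated headings out of the table.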

Step 4: Insert Data into the Database

Now that you have your scraped data, you can insert it into the database using PDO:

// Prepare the statement once, then execute it for each title
$stmt = $pdo->prepare("INSERT INTO your_table (column_name) VALUES (:title)");
foreach ($titles as $title) {
    $stmt->execute(['title' => $title]);
}

Make sure to replace your_table and column_name with the actual table and column names in your database where you want to store the scraped data.
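
If the table does not exist yet, something like the following sketch could create it and confirm that the rows arrive. To keep it runnable without a database server, it assumes an in-memory SQLite database (which requires the pdo_sqlite extension) instead of MySQL. The schema, including the id column, is an assumption, and the MySQL DDL would differ slightly (for example AUTO_INCREMENT and VARCHAR types).

```php
<?php

// Assumption: in-memory SQLite so the sketch runs without a MySQL server.
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Placeholder schema matching the names used in the INSERT statement.
$pdo->exec('CREATE TABLE your_table (
    id INTEGER PRIMARY KEY,
    column_name TEXT NOT NULL
)');

// Made-up sample data standing in for the scraped titles.
$titles = ['First post', 'Second post'];

// Prepare once, execute once per value.
$stmt = $pdo->prepare('INSERT INTO your_table (column_name) VALUES (:title)');
foreach ($titles as $title) {
    $stmt->execute(['title' => $title]);
}

// Read the row count back to confirm the inserts.
$count = (int) $pdo->query('SELECT COUNT(*) FROM your_table')->fetchColumn();
echo $count, " rows stored\n";
```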

Here is the complete code with database and scraping combined:

<?php

require 'vendor/autoload.php';

use Goutte\Client;

// Database connection setup (creates $pdo as in Step 2)
// ...

// Initialize Goutte client
$client = new Client();

// Request the page you want to scrape
$crawler = $client->request('GET', 'http://example.com');

// Scrape data
$titles = $crawler->filter('h2')->each(function ($node) {
    return $node->text();
});

// Insert data into the database, reusing one prepared statement
$stmt = $pdo->prepare("INSERT INTO your_table (column_name) VALUES (:title)");
foreach ($titles as $title) {
    $stmt->execute(['title' => $title]);
}

Important Notes:

  • Always respect the robots.txt file of the website and the website's Terms of Service when scraping.
  • Be aware of the legal implications of web scraping and ensure that you are in compliance with local laws and regulations.
  • Use prepared statements for every query that includes scraped values, to prevent SQL injection attacks.
  • Use appropriate error handling for both the scraping and database insertion processes to ensure the stability and robustness of your application.
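
As one way to approach the error-handling note on the database side, the sketch below wraps the inserts in a transaction and rolls back if any of them fails, so a batch is stored completely or not at all. The function name storeTitles is hypothetical, and the demo again assumes an in-memory SQLite database with a placeholder schema rather than MySQL. Errors from the scraping request itself would need their own try/catch and are not covered here.

```php
<?php

/**
 * Hypothetical helper: insert all titles in one transaction.
 * Returns true on success, false if the batch was rolled back.
 */
function storeTitles(PDO $pdo, array $titles): bool
{
    try {
        $pdo->beginTransaction();
        $stmt = $pdo->prepare('INSERT INTO your_table (column_name) VALUES (:title)');
        foreach ($titles as $title) {
            $stmt->execute(['title' => $title]);
        }
        return $pdo->commit();
    } catch (PDOException $e) {
        // Undo the partial batch and log the reason.
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        error_log('Insert failed: ' . $e->getMessage());
        return false;
    }
}

// Assumption: in-memory SQLite with a placeholder schema, for demonstration.
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('CREATE TABLE your_table (id INTEGER PRIMARY KEY, column_name TEXT NOT NULL)');

$ok     = storeTitles($pdo, ['First post', 'Second post']); // succeeds
$failed = storeTitles($pdo, ['Third post', null]);          // null violates NOT NULL
```

This relies on PDO::ERRMODE_EXCEPTION being set, as in the connection options from Step 2; without it, a failed execute() would not throw.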
