Goutte is a screen scraping and web crawling library for PHP. To save scraped data into a database using Goutte, you would need to:
- Use Goutte to scrape the data from the web page.
- Process the data as needed (clean up, format, etc.).
- Use a PHP database library or extension (like PDO, mysqli, or a framework's ORM) to insert the data into the database.
Here is a step-by-step example using Goutte to scrape data and PDO to insert it into a MySQL database:
Step 1: Install Goutte
If you haven't already installed Goutte, you can do so using Composer:
composer require fabpot/goutte
Step 2: Set Up Database Connection
Use PDO to connect to your MySQL database:
<?php
$host = '127.0.0.1';
$db = 'your_database';
$user = 'your_username';
$pass = 'your_password';
$charset = 'utf8mb4';
$dsn = "mysql:host=$host;dbname=$db;charset=$charset";
$options = [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    PDO::ATTR_EMULATE_PREPARES => false,
];
try {
    $pdo = new PDO($dsn, $user, $pass, $options);
} catch (\PDOException $e) {
    throw new \PDOException($e->getMessage(), (int) $e->getCode());
}
Step 3: Scrape Data Using Goutte
Here's an example of how you might use Goutte to scrape data from a web page:
require 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'http://example.com');
// Scrape data, for example, all 'h2' elements
$titles = $crawler->filter('h2')->each(function ($node) {
    return $node->text();
});
// Process your data if necessary
// ...
// Now $titles is an array of the text content of each 'h2' on the page
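The "process your data" step is left open above. As a minimal sketch of what it could look like, the helper below collapses whitespace, trims each string, and drops empty and duplicate entries. The name cleanTitles() is a hypothetical helper for illustration, not part of Goutte, and the right cleanup rules depend on the site being scraped.

```php
<?php
declare(strict_types=1);

/**
 * Normalize scraped strings: collapse whitespace, trim, drop empties and duplicates.
 * cleanTitles() is a hypothetical helper, not part of Goutte.
 */
function cleanTitles(array $titles): array
{
    $cleaned = [];
    foreach ($titles as $title) {
        // Collapse runs of whitespace (including newlines and tabs) into one space.
        // preg_replace() returns null on invalid UTF-8, so fall back to an empty string.
        $collapsed = preg_replace('/\s+/u', ' ', (string) $title) ?? '';
        $collapsed = trim($collapsed);
        if ($collapsed !== '') {
            $cleaned[] = $collapsed;
        }
    }

    // Remove exact duplicates and reindex from 0
    return array_values(array_unique($cleaned));
}
```

Under these assumptions, calling `$titles = cleanTitles($titles);` right after the scrape would hand Step 4 a tidy list.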
Step 4: Insert Data into the Database
Now that you have your scraped data, you can insert it into the database using PDO:
// Prepare the statement once, then execute it for each title
$stmt = $pdo->prepare("INSERT INTO your_table (column_name) VALUES (:title)");
foreach ($titles as $title) {
    $stmt->execute(['title' => $title]);
}
Make sure to replace your_table and column_name with the actual table and column names in your database where you want to store the scraped data.
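When many rows are inserted at once, wrapping the loop in a transaction lets a failed batch be rolled back as a unit. The sketch below shows one way to do that. The function name insertTitles() and the table and column names (scraped_titles, title) are hypothetical placeholders, and the demo uses an in-memory SQLite database (which assumes the pdo_sqlite extension is available) only so that it runs without a MySQL server.

```php
<?php
declare(strict_types=1);

/**
 * Insert all titles in a single transaction and return how many rows were written.
 * The table and column names (scraped_titles, title) are hypothetical placeholders.
 */
function insertTitles(PDO $pdo, array $titles): int
{
    // Prepare once; the same statement is reused for every row
    $stmt = $pdo->prepare('INSERT INTO scraped_titles (title) VALUES (:title)');
    $inserted = 0;

    $pdo->beginTransaction();
    try {
        foreach ($titles as $title) {
            $stmt->execute(['title' => $title]);
            $inserted += $stmt->rowCount();
        }
        $pdo->commit();
    } catch (\Throwable $e) {
        // Undo the partial batch, then let the caller decide how to report it
        $pdo->rollBack();
        throw $e;
    }

    return $inserted;
}

// Demo against in-memory SQLite; with MySQL, pass the $pdo from Step 2 instead.
$demo = new PDO('sqlite::memory:');
$demo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$demo->exec('CREATE TABLE scraped_titles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');
$count = insertTitles($demo, ['First heading', 'Second heading']);
echo "Inserted {$count} rows\n";
```

On MySQL the rollback only takes effect for transactional storage engines such as InnoDB.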
Here is the scraping and insertion code combined (the connection setup from Step 2 goes where indicated):
<?php
require 'vendor/autoload.php';
use Goutte\Client;
// Database connection setup
// ...
// Initialize Goutte client
$client = new Client();
// Request the page you want to scrape
$crawler = $client->request('GET', 'http://example.com');
// Scrape data
$titles = $crawler->filter('h2')->each(function ($node) {
    return $node->text();
});
// Insert data into the database
$stmt = $pdo->prepare("INSERT INTO your_table (column_name) VALUES (:title)");
foreach ($titles as $title) {
    $stmt->execute(['title' => $title]);
}
Important Notes:
- Always respect the website's robots.txt file and Terms of Service when scraping.
- Be aware of the legal implications of web scraping and ensure that you are in compliance with local laws and regulations.
- Use prepared statements with bound parameters, as in Step 4, so that scraped text is never concatenated into SQL; this prevents SQL injection attacks.
- Use appropriate error handling for both the scraping and database insertion processes to ensure the stability and robustness of your application.
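As a rough illustration of the error-handling note, the sketch below separates scraping failures from database failures so a caller can log or retry each differently. Everything named here is an assumption for illustration: runScrapeJob() is a hypothetical wrapper, and the two callables stand in for the Goutte calls from Step 3 and the PDO insert from Step 4.

```php
<?php
declare(strict_types=1);

/**
 * Run a scrape-then-store job and report which stage failed, if any.
 * $fetchTitles and $storeTitles are hypothetical callables: the first would wrap
 * the Goutte calls from Step 3, the second the PDO insert from Step 4.
 */
function runScrapeJob(callable $fetchTitles, callable $storeTitles): array
{
    try {
        $titles = $fetchTitles();
    } catch (\Throwable $e) {
        // Any exception thrown while fetching or parsing the page surfaces here
        return ['ok' => false, 'stage' => 'scrape', 'message' => $e->getMessage()];
    }

    if ($titles === []) {
        // An empty result often means the selector no longer matches the page
        return ['ok' => false, 'stage' => 'scrape', 'message' => 'no matching elements'];
    }

    try {
        $stored = $storeTitles($titles);
    } catch (\PDOException $e) {
        return ['ok' => false, 'stage' => 'store', 'message' => $e->getMessage()];
    }

    return ['ok' => true, 'stage' => 'done', 'message' => "stored {$stored} rows"];
}

// Demo with stand-in closures; real code would call Goutte and PDO inside them.
$failed = runScrapeJob(
    function (): array { throw new \RuntimeException('connection timed out'); },
    function (array $titles): int { return count($titles); }
);
$succeeded = runScrapeJob(
    function (): array { return ['First heading', 'Second heading']; },
    function (array $titles): int { return count($titles); }
);
```

Returning a small status array rather than throwing is only one possible design; a long-running crawler might prefer to log the exception and continue with the next URL.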