How Do I Store Web Scraping Results in a Database Using n8n?
Storing web scraping results in a database is essential for building scalable data collection pipelines. n8n provides native database nodes that make it easy to save scraped data to popular databases like PostgreSQL, MySQL, MongoDB, and more. This guide will walk you through the complete process of storing web scraping results in various databases using n8n workflows.
Overview of Database Storage in n8n
n8n supports multiple database integrations out of the box, allowing you to:
- Store scraped data in relational databases (PostgreSQL, MySQL, MariaDB)
- Save unstructured data in NoSQL databases (MongoDB, Redis)
- Use cloud database services (Supabase, Firebase, Airtable)
- Execute custom SQL queries for complex data operations
- Batch insert multiple records for better performance
Setting Up Database Connections in n8n
Before you can store data, you need to configure database credentials in n8n:
- Navigate to Settings → Credentials
- Click Add Credential and select your database type
- Enter connection details (host, port, database name, username, password)
- Test the connection to ensure it works
- Save the credential for use in your workflows
Storing Data in PostgreSQL
PostgreSQL is one of the most popular choices for storing structured web scraping data. Here's how to set up a complete workflow:
Step 1: Create Your Database Table
First, create a table to store your scraped data:
CREATE TABLE scraped_products (
id SERIAL PRIMARY KEY,
product_name VARCHAR(255),
price DECIMAL(10, 2),
description TEXT,
url VARCHAR(500),
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Step 2: Build the n8n Workflow
- HTTP Request Node and HTML Extract Node: Fetch the target page and extract the fields you need
- Code Node: Transform and clean the scraped data
- Postgres Node: Insert data into your database
Here's an example Code Node that prepares data for PostgreSQL:
// Transform scraped data for database insertion
const items = $input.all();
const results = [];
for (const item of items) {
const data = item.json;
results.push({
product_name: data.title || 'N/A',
price: parseFloat(data.price?.replace(/[^0-9.]/g, '')) || 0,
description: data.description?.substring(0, 1000) || '',
url: data.url || '',
scraped_at: new Date().toISOString()
});
}
return results.map(result => ({ json: result }));
Step 3: Configure the Postgres Node
In the Postgres node:
- Operation: Insert
- Table: scraped_products
- Columns: Map your JSON fields to database columns
- Return Fields: Select which fields to return after insertion
For batch inserts, enable Insert Multiple Rows and pass an array of objects (see Batch Processing for Performance below).
Storing Data in MySQL
MySQL setup is similar to PostgreSQL. Here's a complete example:
Create MySQL Table
CREATE TABLE scraped_articles (
id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
author VARCHAR(100),
content TEXT,
published_date DATE,
source_url VARCHAR(500),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_published_date (published_date),
UNIQUE KEY idx_source_url (source_url(255))
);
MySQL Workflow Configuration
// Code Node: Prepare data for MySQL
const scrapedData = $input.all();
return scrapedData.map(item => {
const data = item.json;
return {
json: {
title: data.title?.trim() || '',
author: data.author?.trim() || 'Unknown',
content: data.content || '',
published_date: data.date && !isNaN(Date.parse(data.date)) ? new Date(data.date).toISOString().split('T')[0] : null,
source_url: data.url || ''
}
};
});
In the MySQL node, use the Execute Query operation for more control. Because source_url has a unique key, the following query updates the existing row instead of inserting a duplicate:
INSERT INTO scraped_articles (title, author, content, published_date, source_url)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
content = VALUES(content),
updated_at = CURRENT_TIMESTAMP;
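Before the insert runs, it can also help to enforce the column length limits defined in the table above, so MySQL does not reject oversized values (or silently truncate them, depending on the SQL mode). A minimal Code Node sketch:
// Code Node: trim values to the scraped_articles column sizes
// (title 255, author 100, source_url 500) before inserting
const items = $input.all();
return items.map(item => ({
  json: {
    ...item.json,
    title: (item.json.title || '').slice(0, 255),
    author: (item.json.author || 'Unknown').slice(0, 100),
    source_url: (item.json.source_url || '').slice(0, 500)
  }
}));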
Storing Data in MongoDB
MongoDB is excellent for storing unstructured or semi-structured scraped data:
MongoDB Workflow Example
// Code Node: Prepare data for MongoDB
const items = $input.all();
return items.map(item => {
const data = item.json;
return {
json: {
title: data.title,
metadata: {
price: data.price,
availability: data.availability,
rating: data.rating
},
images: data.images || [],
tags: data.tags || [],
scrapedAt: new Date(),
source: {
url: data.url || '',
domain: data.url ? new URL(data.url).hostname : ''
}
}
};
});
In the MongoDB node:
- Operation: Insert Many
- Collection: your_collection_name
- Options: Ordered inserts stop at the first failed document; disable them if you want the remaining documents inserted anyway
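To make repeated scrapes easier to reconcile later, you can also store a stable key with each document before inserting it. A minimal Code Node sketch (the dedupeKey field name is illustrative, not something the MongoDB node requires):
// Code Node: add a stable key per document so later runs can match
// existing documents (for example via an Update operation or a unique index)
const items = $input.all();
return items.map(item => ({
  json: {
    ...item.json,
    dedupeKey: (item.json.source?.url || '').toLowerCase()
  }
}));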
Advanced Database Operations
Handling Duplicate Data
Use upsert operations to avoid duplicate entries. The example below assumes a unique constraint on url (see Troubleshooting Common Issues) and an updated_at column on the table:
PostgreSQL Upsert Example:
INSERT INTO scraped_products (url, product_name, price, description)
VALUES ($1, $2, $3, $4)
ON CONFLICT (url)
DO UPDATE SET
product_name = EXCLUDED.product_name,
price = EXCLUDED.price,
description = EXCLUDED.description,
updated_at = CURRENT_TIMESTAMP;
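If the same URL can appear more than once in a single run, deduplicate within the workflow before the upsert, because a single multi-row INSERT ... ON CONFLICT statement cannot update the same row twice. A minimal Code Node sketch that keeps the last item per URL:
// Code Node: keep only the most recent item per URL before upserting
const byUrl = new Map();
for (const item of $input.all()) {
  if (item.json.url) {
    byUrl.set(item.json.url, item);
  }
}
return Array.from(byUrl.values());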
Batch Processing for Performance
When scraping large datasets, use batch inserts:
// Code Node: Batch data into groups of 100
const items = $input.all();
const batchSize = 100;
const batches = [];
for (let i = 0; i < items.length; i += batchSize) {
batches.push({
json: {
records: items.slice(i, i + batchSize).map(item => item.json)
}
});
}
return batches;
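One way to consume these batches is the Execute Query operation with a multi-row, parameterized INSERT built in the same Code Node. The sketch below assumes the scraped_products columns from earlier; how you wire the resulting query and values fields into the Postgres node depends on your node version, so treat it as a starting point rather than a drop-in configuration.
// Code Node: build one parameterized multi-row INSERT per batch
const columns = ['product_name', 'price', 'description', 'url', 'scraped_at'];
return $input.all().map(batch => {
  const records = batch.json.records;
  const placeholders = records
    .map((_, row) => `(${columns.map((_, col) => `$${row * columns.length + col + 1}`).join(', ')})`)
    .join(', ');
  return {
    json: {
      query: `INSERT INTO scraped_products (${columns.join(', ')}) VALUES ${placeholders}`,
      values: records.flatMap(record => columns.map(column => record[column]))
    }
  };
});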
Error Handling and Logging
Implement proper error handling to avoid data loss:
// Code Node: Error handling wrapper
try {
const items = $input.all();
const validItems = [];
const errors = [];
for (const item of items) {
try {
// Validate required fields
if (!item.json.url || !item.json.title) {
throw new Error('Missing required fields');
}
validItems.push(item);
} catch (error) {
errors.push({
data: item.json,
error: error.message,
timestamp: new Date().toISOString()
});
}
}
// Log errors to a separate table or file
if (errors.length > 0) {
console.error('Validation errors:', errors);
}
return validItems;
} catch (error) {
throw new Error(`Data processing failed: ${error.message}`);
}
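Instead of only logging to the console, you can keep the invalid items in the stream and route them with an IF node into a separate error-logging branch (for example, inserting them into an error table). A minimal Code Node sketch; the _valid and _validationError field names are illustrative:
// Code Node: flag invalid items instead of dropping them, so an IF node
// can route them to an error-logging branch
const items = $input.all();
return items.map(item => {
  const valid = Boolean(item.json.url && item.json.title);
  return {
    json: {
      ...item.json,
      _valid: valid,
      _validationError: valid ? null : 'Missing required fields'
    }
  };
});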
Using Cloud Database Services
Storing Data in Supabase
Supabase provides a PostgreSQL database with a REST API:
- Add the Supabase node to your workflow
- Select Insert operation
- Choose your table
- Map scraped fields to table columns
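Because Supabase tables are regular PostgreSQL tables, the data preparation mirrors the earlier Postgres example. A minimal Code Node sketch, assuming the scraped_products columns created above:
// Code Node: shape scraped items to match the scraped_products columns
// before the Supabase node's Insert operation
const items = $input.all();
return items.map(item => ({
  json: {
    product_name: item.json.title || 'N/A',
    price: parseFloat(item.json.price) || 0,
    description: item.json.description || '',
    url: item.json.url || ''
  }
}));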
Storing Data in Airtable
Airtable is great for non-technical users who need to view scraped data:
// Code Node: Format data for Airtable
const items = $input.all();
return items.map(item => {
return {
json: {
fields: {
'Product Name': item.json.title,
'Price': item.json.price,
'URL': item.json.url,
'Status': 'New',
'Scraped Date': new Date().toISOString()
}
}
};
});
Complete Workflow Example
Here's a full n8n workflow that scrapes product data and stores it in PostgreSQL:
// Node 1: Schedule Trigger (runs daily at 9 AM)
// Node 2: HTTP Request
// URL: https://example.com/products
// Method: GET
// Node 3: HTML Extract
// Extraction Rules:
// - title: .product-title
// - price: .product-price
// - description: .product-description
// - image: .product-image@src
// Node 4: Code Node (Data Transformation)
const items = $input.all();
return items.map(item => {
const data = item.json;
return {
json: {
product_name: data.title?.trim() || '',
price: parseFloat(data.price?.replace(/[^0-9.]/g, '')) || 0,
description: data.description?.trim() || '',
image_url: data.image || '',
source_url: 'https://example.com/products',
scraped_at: new Date().toISOString()
}
};
});
// Node 5: Postgres Node
// Operation: Insert Multiple
// Table: scraped_products
// Data Mode: Auto-map Input Data
Best Practices for Database Storage
- Create Indexes: Index frequently queried columns (URLs, dates) for faster lookups
- Use Timestamps: Always store when data was scraped for tracking freshness
- Validate Data: Check for required fields before insertion to avoid database errors
- Handle Duplicates: Implement upsert logic to update existing records
- Batch Operations: Insert multiple records at once for better performance
- Error Logging: Store failed inserts in a separate error table
- Data Normalization: Keep your database schema normalized for relational data
- Backup Regularly: Schedule automated backups of your scraped data
- Monitor Performance: Track insertion times and optimize slow queries
- Clean Old Data: Implement data retention policies to manage database size (see the sketch after this list)
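Several of these practices can run inside the same n8n workflows. For data retention, for example, a scheduled workflow can compute a cutoff date in a Code Node and pass it to a parameterized DELETE query; a minimal sketch (the 90-day window is purely illustrative):
// Code Node: compute a retention cutoff for a query such as
// DELETE FROM scraped_products WHERE scraped_at < $1
const retentionDays = 90; // illustrative retention policy
const cutoff = new Date(Date.now() - retentionDays * 24 * 60 * 60 * 1000);
return [{ json: { cutoff: cutoff.toISOString() } }];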
Integrating with Web Scraping APIs
For more reliable scraping, consider using specialized web scraping APIs that handle anti-bot measures and provide structured data. When handling authentication in web scraping workflows, you'll need to properly manage sessions before storing data.
You can also enhance your n8n workflows by monitoring network requests during scraping to ensure you're capturing all necessary data before database insertion.
Troubleshooting Common Issues
Connection Timeouts
If database inserts are timing out:
- Reduce batch size
- Increase connection timeout in credentials
- Check database server load
Duplicate Key Errors
Implement proper upsert logic or use unique constraints:
ALTER TABLE scraped_products
ADD CONSTRAINT unique_url UNIQUE (url);
Data Type Mismatches
Ensure your data transformation matches database column types:
// Convert strings to appropriate types
price: parseFloat(data.price) || 0,
quantity: parseInt(data.quantity, 10) || 0,
is_available: data.availability === 'In Stock',
scraped_at: new Date().toISOString()
Conclusion
Storing web scraping results in a database using n8n is straightforward with the right workflow design. Choose the database that best fits your needs—PostgreSQL or MySQL for structured data, MongoDB for flexible schemas, or cloud services like Supabase for managed hosting. Always implement error handling, data validation, and batch processing to build robust and scalable scraping pipelines.
By following these best practices and examples, you can create reliable data collection systems that automatically scrape websites and store results in your database of choice, making the data easily accessible for analysis, reporting, and application integration.