How Do I Store Web Scraping Results in a Database Using n8n?

Storing web scraping results in a database is essential for building scalable data collection pipelines. n8n provides native database nodes that make it easy to save scraped data to popular databases like PostgreSQL, MySQL, MongoDB, and more. This guide will walk you through the complete process of storing web scraping results in various databases using n8n workflows.

Overview of Database Storage in n8n

n8n supports multiple database integrations out of the box, allowing you to:

  • Store scraped data in relational databases (PostgreSQL, MySQL, MariaDB)
  • Save unstructured data in NoSQL databases (MongoDB, Redis)
  • Use cloud database services (Supabase, Firebase, Airtable)
  • Execute custom SQL queries for complex data operations
  • Batch insert multiple records for better performance

Setting Up Database Connections in n8n

Before you can store data, you need to configure database credentials in n8n:

  1. Navigate to Settings → Credentials
  2. Click Add Credential and select your database type
  3. Enter connection details (host, port, database name, username, password)
  4. Test the connection to ensure it works
  5. Save the credential for use in your workflows
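
As a point of reference, a PostgreSQL credential typically needs values along these lines (all entries here are illustrative placeholders, not real settings):

Host:     db.example.com
Port:     5432
Database: scraping
User:     n8n_user
Password: ********
SSL/TLS:  enable if your database requires encrypted connections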

Storing Data in PostgreSQL

PostgreSQL is one of the most popular choices for storing structured web scraping data. Here's how to set up a complete workflow:

Step 1: Create Your Database Table

First, create a table to store your scraped data. The unique constraint on url and the updated_at column support the upsert pattern shown later in this guide:

CREATE TABLE scraped_products (
    id SERIAL PRIMARY KEY,
    product_name VARCHAR(255),
    price DECIMAL(10, 2),
    description TEXT,
    url VARCHAR(500) UNIQUE,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 2: Build the n8n Workflow

  1. HTTP Request Node or HTML Extract Node: Scrape your target website
  2. Code Node: Transform and clean the scraped data
  3. Postgres Node: Insert data into your database

Here's an example Code Node that prepares data for PostgreSQL:

// Transform scraped data for database insertion
const items = $input.all();
const results = [];

for (const item of items) {
  const data = item.json;

  results.push({
    product_name: data.title || 'N/A',
    price: parseFloat(data.price?.replace(/[^0-9.]/g, '')) || 0,
    description: data.description?.substring(0, 1000) || '',
    url: data.url || '',
    scraped_at: new Date().toISOString()
  });
}

return results.map(result => ({ json: result }));

Step 3: Configure the Postgres Node

In the Postgres node:

  • Operation: Insert
  • Table: scraped_products
  • Columns: Map your JSON fields to database columns
  • Return Fields: Select which fields to return after insertion

For batch inserts, enable Insert Multiple Rows and pass an array of objects.
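
With auto-mapping, each item arriving at the Postgres node should already use the table's column names as its JSON keys. A minimal sketch of the expected item shape (values are illustrative):

[
  {
    "json": {
      "product_name": "Example Widget",
      "price": 19.99,
      "description": "Short product description",
      "url": "https://example.com/products/widget",
      "scraped_at": "2024-01-01T09:00:00.000Z"
    }
  }
]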

Storing Data in MySQL

MySQL setup is similar to PostgreSQL. Here's a complete example:

Create MySQL Table

CREATE TABLE scraped_articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(100),
    content TEXT,
    published_date DATE,
    source_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_published_date (published_date),
    UNIQUE KEY uq_source_url (source_url(255))
);

MySQL Workflow Configuration

// Code Node: Prepare data for MySQL
const scrapedData = $input.all();

return scrapedData.map(item => {
  const data = item.json;

  return {
    json: {
      title: data.title?.trim() || '',
      author: data.author?.trim() || 'Unknown',
      content: data.content || '',
      published_date: data.date ? new Date(data.date).toISOString().split('T')[0] : null,
      source_url: data.url || ''
    }
  };
});

In the MySQL node, use the Execute Query operation for more control. The ON DUPLICATE KEY UPDATE clause below relies on the unique key on source_url and the updated_at column defined above:

INSERT INTO scraped_articles (title, author, content, published_date, source_url)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    content = VALUES(content),
    updated_at = CURRENT_TIMESTAMP;

Storing Data in MongoDB

MongoDB is excellent for storing unstructured or semi-structured scraped data:

MongoDB Workflow Example

// Code Node: Prepare data for MongoDB
const items = $input.all();

return items.map(item => {
  const data = item.json;

  return {
    json: {
      title: data.title,
      metadata: {
        price: data.price,
        availability: data.availability,
        rating: data.rating
      },
      images: data.images || [],
      tags: data.tags || [],
      scrapedAt: new Date(),
      source: {
        url: data.url,
        domain: new URL(data.url).hostname
      }
    }
  };
});

In the MongoDB node:

  • Operation: Insert Many
  • Collection: your_collection_name
  • Options: Enable ordered inserts for better error handling
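
If you also want MongoDB itself to reject duplicate documents, one option is a unique index on the source URL. A minimal sketch to run once in mongosh, using the placeholder collection name from above:

// Create a unique index so repeated scrapes of the same URL are rejected
db.your_collection_name.createIndex(
  { "source.url": 1 },
  { unique: true }
);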

Advanced Database Operations

Handling Duplicate Data

Use upsert operations to avoid duplicate entries. PostgreSQL's ON CONFLICT clause requires a unique constraint or index on the conflict column (url in the scraped_products table above):

PostgreSQL Upsert Example:

INSERT INTO scraped_products (url, product_name, price, description)
VALUES ($1, $2, $3, $4)
ON CONFLICT (url)
DO UPDATE SET
    product_name = EXCLUDED.product_name,
    price = EXCLUDED.price,
    description = EXCLUDED.description,
    updated_at = CURRENT_TIMESTAMP;
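
Upserts handle duplicates across runs, but a single scrape can also return the same URL twice, and PostgreSQL rejects an INSERT ... ON CONFLICT DO UPDATE that touches the same conflict key more than once in one statement. A minimal Code Node sketch that keeps only the first item per URL before the database node:

// Code Node: drop in-batch duplicates, keeping the first item seen for each URL
const seen = new Set();

return $input.all().filter(item => {
  const url = item.json.url;
  if (!url || seen.has(url)) return false;
  seen.add(url);
  return true;
});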

Batch Processing for Performance

When scraping large datasets, use batch inserts:

// Code Node: Batch data into groups of 100
const items = $input.all();
const batchSize = 100;
const batches = [];

for (let i = 0; i < items.length; i += batchSize) {
  batches.push({
    json: {
      records: items.slice(i, i + batchSize).map(item => item.json)
    }
  });
}

return batches;
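
Note that n8n's database nodes typically process all incoming items in a single execution, so explicit batching like this is mainly useful for looping over groups or throttling very large runs. If a downstream node expects one item per row again, a second Code Node can flatten the batches back out; a minimal sketch assuming the records field produced above:

// Code Node: flatten batched items ({ json: { records: [...] } }) into one item per record
return $input.all().flatMap(item =>
  (item.json.records || []).map(record => ({ json: record }))
);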

Error Handling and Logging

Implement proper error handling to avoid data loss:

// Code Node: Error handling wrapper
try {
  const items = $input.all();
  const validItems = [];
  const errors = [];

  for (const item of items) {
    try {
      // Validate required fields
      if (!item.json.url || !item.json.title) {
        throw new Error('Missing required fields');
      }

      validItems.push(item);
    } catch (error) {
      errors.push({
        data: item.json,
        error: error.message,
        timestamp: new Date().toISOString()
      });
    }
  }

  // Log errors to a separate table or file
  if (errors.length > 0) {
    console.error('Validation errors:', errors);
  }

  return validItems;
} catch (error) {
  throw new Error(`Data processing failed: ${error.message}`);
}
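
If you prefer to persist failures rather than only logging them to the console, a simple error table works well. A minimal PostgreSQL sketch (table and column names are illustrative); failed items can then be routed to a second Postgres node that inserts into it:

CREATE TABLE scraping_errors (
    id SERIAL PRIMARY KEY,
    payload JSONB,
    error_message TEXT,
    occurred_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);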

Using Cloud Database Services

Storing Data in Supabase

Supabase provides a PostgreSQL database with a REST API:

  1. Add the Supabase node to your workflow
  2. Select Insert operation
  3. Choose your table
  4. Map scraped fields to table columns

Storing Data in Airtable

Airtable is great for non-technical users who need to view scraped data:

// Code Node: Format data for Airtable
const items = $input.all();

return items.map(item => {
  return {
    json: {
      fields: {
        'Product Name': item.json.title,
        'Price': item.json.price,
        'URL': item.json.url,
        'Status': 'New',
        'Scraped Date': new Date().toISOString()
      }
    }
  };
});

Complete Workflow Example

Here's a full n8n workflow that scrapes product data and stores it in PostgreSQL:

// Node 1: Schedule Trigger (runs daily at 9 AM)

// Node 2: HTTP Request
// URL: https://example.com/products
// Method: GET

// Node 3: HTML Extract
// Extraction Rules:
// - title: .product-title
// - price: .product-price
// - description: .product-description
// - image: .product-image@src

// Node 4: Code Node (Data Transformation)
const items = $input.all();

return items.map(item => {
  const data = item.json;

  return {
    json: {
      product_name: data.title?.trim() || '',
      price: parseFloat(data.price?.replace(/[^0-9.]/g, '')) || 0,
      description: data.description?.trim() || '',
      image_url: data.image || '',
      source_url: 'https://example.com/products',
      scraped_at: new Date().toISOString()
    }
  };
});

// Node 5: Postgres Node
// Operation: Insert Multiple
// Table: scraped_products
// Data Mode: Auto-map Input Data

Best Practices for Database Storage

  1. Create Indexes: Index frequently queried columns (URLs, dates) for faster lookups
  2. Use Timestamps: Always store when data was scraped for tracking freshness
  3. Validate Data: Check for required fields before insertion to avoid database errors
  4. Handle Duplicates: Implement upsert logic to update existing records
  5. Batch Operations: Insert multiple records at once for better performance
  6. Error Logging: Store failed inserts in a separate error table
  7. Data Normalization: Keep your database schema normalized for relational data
  8. Backup Regularly: Schedule automated backups of your scraped data
  9. Monitor Performance: Track insertion times and optimize slow queries
  10. Clean Old Data: Implement data retention policies to manage database size (see the sketch below)
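
As a sketch of the retention policy in item 10, a scheduled workflow could run a cleanup query like the following against PostgreSQL (the 90-day window is an arbitrary example):

-- Delete scraped rows older than 90 days
DELETE FROM scraped_products
WHERE scraped_at < NOW() - INTERVAL '90 days';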

Integrating with Web Scraping APIs

For more reliable scraping, consider using specialized web scraping APIs that handle anti-bot measures and provide structured data. When handling authentication in web scraping workflows, you'll need to properly manage sessions before storing data.

You can also enhance your n8n workflows by monitoring network requests during scraping to ensure you're capturing all necessary data before database insertion.

Troubleshooting Common Issues

Connection Timeouts

If database inserts are timing out:

  • Reduce batch size
  • Increase connection timeout in credentials
  • Check database server load

Duplicate Key Errors

Implement proper upsert logic or use unique constraints:

ALTER TABLE scraped_products
ADD CONSTRAINT unique_url UNIQUE (url);
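
Adding this constraint will fail if duplicate URLs already exist, so you may need to clean them up first. A minimal PostgreSQL sketch that keeps the earliest row for each URL:

-- Remove duplicate URLs, keeping the row with the lowest id
DELETE FROM scraped_products a
USING scraped_products b
WHERE a.url = b.url
  AND a.id > b.id;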

Data Type Mismatches

Ensure your data transformation matches database column types:

// Code Node: convert strings to the types expected by the database columns
return $input.all().map(({ json: data }) => ({
  json: {
    price: parseFloat(data.price) || 0,
    quantity: parseInt(data.quantity, 10) || 0,
    is_available: data.availability === 'In Stock',
    scraped_at: new Date().toISOString()
  }
}));

Conclusion

Storing web scraping results in a database using n8n is straightforward with the right workflow design. Choose the database that best fits your needs—PostgreSQL or MySQL for structured data, MongoDB for flexible schemas, or cloud services like Supabase for managed hosting. Always implement error handling, data validation, and batch processing to build robust and scalable scraping pipelines.

By following these best practices and examples, you can create reliable data collection systems that automatically scrape websites and store results in your database of choice, making the data easily accessible for analysis, reporting, and application integration.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
