Table of contents

How do I handle file uploads with Headless Chromium?

File uploads in headless browsers require special handling since there's no user interface to interact with file dialogs. Headless Chromium provides several methods to programmatically handle file uploads through various automation libraries like Puppeteer, Playwright, and direct Chrome DevTools Protocol (CDP) usage.

Understanding File Upload Mechanisms

When working with headless Chromium, file uploads can be handled in three main ways:

  1. Input Element Approach: Directly setting files on <input type="file"> elements
  2. File Chooser Interception: Intercepting file chooser dialogs
  3. Drag and Drop: Simulating drag-and-drop file operations

Using Puppeteer for File Uploads

Basic File Upload with Input Elements

The most straightforward method is to use the uploadFile() method on file input elements:

const puppeteer = require('puppeteer');
const path = require('path');

async function uploadFileWithPuppeteer() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://example.com/upload');

  // Select the file input element
  const fileInput = await page.$('input[type="file"]');

  // Upload file using absolute path
  const filePath = path.resolve(__dirname, 'test-file.pdf');
  await fileInput.uploadFile(filePath);

  // Submit the form
  await page.click('button[type="submit"]');

  // Wait for upload completion
  await page.waitForSelector('.upload-success', { timeout: 30000 });

  await browser.close();
}

Multiple File Upload

For multiple file uploads, pass an array of file paths:

async function uploadMultipleFiles() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://example.com/multi-upload');

  const fileInput = await page.$('input[type="file"][multiple]');

  const filePaths = [
    path.resolve(__dirname, 'document1.pdf'),
    path.resolve(__dirname, 'image1.jpg'),
    path.resolve(__dirname, 'spreadsheet1.xlsx')
  ];

  await fileInput.uploadFile(...filePaths);

  // Process upload
  await page.click('#upload-button');

  await browser.close();
}

Handling File Chooser Dialogs

Some websites trigger file chooser dialogs that need to be intercepted:

async function handleFileChooserDialog() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Set up file chooser handler before navigating
  page.on('filechooser', async (fileChooser) => {
    const filePath = path.resolve(__dirname, 'upload-file.txt');
    await fileChooser.accept([filePath]);
  });

  await page.goto('https://example.com/upload');

  // Click button that triggers file chooser
  await page.click('#browse-files');

  // Continue with form submission
  await page.click('#submit-upload');

  await browser.close();
}

Using Playwright for File Uploads

Playwright offers similar functionality with slightly different syntax:

const { chromium } = require('playwright');

async function uploadWithPlaywright() {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://example.com/upload');

  // Set files on input element
  await page.setInputFiles('input[type="file"]', [
    path.resolve(__dirname, 'file1.pdf'),
    path.resolve(__dirname, 'file2.jpg')
  ]);

  // Submit form
  await page.click('button[type="submit"]');

  // Wait for upload completion
  await page.waitForSelector('.success-message');

  await browser.close();
}

Handling Dynamic File Choosers in Playwright

async function handleDynamicFileChooser() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com/dynamic-upload');

  // Start waiting for file chooser before clicking
  const fileChooserPromise = page.waitForEvent('filechooser');

  await page.click('#upload-trigger');

  const fileChooser = await fileChooserPromise;
  await fileChooser.setFiles([
    path.resolve(__dirname, 'document.pdf')
  ]);

  await browser.close();
}

Advanced File Upload Scenarios

Validating File Upload Progress

Monitor upload progress and handle potential errors:

async function monitorUploadProgress() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Enable request interception to monitor upload
  await page.setRequestInterception(true);

  page.on('request', request => {
    if (request.url().includes('/upload')) {
      console.log('Upload request initiated:', request.method());
    }
    request.continue();
  });

  page.on('response', response => {
    if (response.url().includes('/upload')) {
      console.log('Upload response:', response.status());
    }
  });

  await page.goto('https://example.com/upload');

  const fileInput = await page.$('input[type="file"]');
  await fileInput.uploadFile(path.resolve(__dirname, 'large-file.zip'));

  await page.click('#submit-button');

  // Wait for progress bar completion
  await page.waitForFunction(() => {
    const progressBar = document.querySelector('.progress-bar');
    return progressBar && progressBar.style.width === '100%';
  }, { timeout: 60000 });

  await browser.close();
}

Handling Different File Types and Validation

async function handleFileTypeValidation() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com/restricted-upload');

  try {
    // Attempt to upload restricted file type
    const fileInput = await page.$('input[type="file"]');
    await fileInput.uploadFile(path.resolve(__dirname, 'script.exe'));

    await page.click('#upload-button');

    // Check for validation error
    const errorMessage = await page.waitForSelector('.error-message', { 
      timeout: 5000 
    });

    if (errorMessage) {
      const errorText = await errorMessage.textContent();
      console.log('Upload validation error:', errorText);
    }

  } catch (error) {
    console.log('File upload failed:', error.message);
  }

  await browser.close();
}

Working with Cloud Storage Uploads

Many modern applications upload directly to cloud storage. Here's how to handle those scenarios:

async function handleCloudStorageUpload() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Monitor network requests to cloud storage
  const uploadRequests = [];

  page.on('request', request => {
    if (request.url().includes('s3.amazonaws.com') || 
        request.url().includes('storage.googleapis.com')) {
      uploadRequests.push(request.url());
    }
  });

  await page.goto('https://example.com/cloud-upload');

  const fileInput = await page.$('input[type="file"]');
  await fileInput.uploadFile(path.resolve(__dirname, 'document.pdf'));

  // Trigger cloud upload
  await page.click('#cloud-upload-button');

  // Wait for cloud upload completion
  await page.waitForFunction(() => {
    return document.querySelector('.upload-complete');
  }, { timeout: 30000 });

  console.log('Cloud storage requests:', uploadRequests);

  await browser.close();
}

Error Handling and Best Practices

Robust Error Handling

async function robustFileUpload() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  try {
    await page.goto('https://example.com/upload', { 
      waitUntil: 'networkidle2',
      timeout: 30000 
    });

    // Check if file input exists
    const fileInput = await page.$('input[type="file"]');
    if (!fileInput) {
      throw new Error('File input element not found');
    }

    // Verify file exists before uploading
    const filePath = path.resolve(__dirname, 'upload-file.pdf');
    if (!require('fs').existsSync(filePath)) {
      throw new Error(`File not found: ${filePath}`);
    }

    await fileInput.uploadFile(filePath);

    // Wait for any file validation
    await page.waitForTimeout(1000);

    // Check for client-side validation errors
    const validationError = await page.$('.file-error');
    if (validationError) {
      const errorText = await validationError.textContent();
      throw new Error(`File validation failed: ${errorText}`);
    }

    await page.click('#submit-button');

    // Wait for upload success or error
    await Promise.race([
      page.waitForSelector('.upload-success', { timeout: 30000 }),
      page.waitForSelector('.upload-error', { timeout: 30000 })
    ]);

    const success = await page.$('.upload-success');
    if (success) {
      console.log('File uploaded successfully');
    } else {
      const error = await page.$('.upload-error');
      const errorText = await error.textContent();
      throw new Error(`Upload failed: ${errorText}`);
    }

  } catch (error) {
    console.error('Upload process failed:', error.message);
    throw error;
  } finally {
    await browser.close();
  }
}

Performance Optimization

For large file uploads or multiple concurrent uploads:

async function optimizedBulkUpload() {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-dev-shm-usage',
      '--disable-background-networking',
      '--disable-background-timer-throttling'
    ]
  });

  const files = [
    'file1.pdf',
    'file2.jpg', 
    'file3.docx'
  ];

  const uploadPromises = files.map(async (filename) => {
    const page = await browser.newPage();

    try {
      await page.goto('https://example.com/upload');

      const fileInput = await page.$('input[type="file"]');
      await fileInput.uploadFile(path.resolve(__dirname, filename));

      await page.click('#submit-button');
      await page.waitForSelector('.upload-success', { timeout: 30000 });

      console.log(`${filename} uploaded successfully`);
      return { success: true, file: filename };

    } catch (error) {
      console.error(`Failed to upload ${filename}:`, error.message);
      return { success: false, file: filename, error: error.message };
    } finally {
      await page.close();
    }
  });

  const results = await Promise.all(uploadPromises);
  await browser.close();

  return results;
}

Integration with Testing Frameworks

File upload testing can be integrated with popular testing frameworks. When working with complex upload workflows, you might also need to handle authentication in Puppeteer for secured upload endpoints.

// Jest test example
describe('File Upload Tests', () => {
  let browser, page;

  beforeAll(async () => {
    browser = await puppeteer.launch();
  });

  beforeEach(async () => {
    page = await browser.newPage();
  });

  afterEach(async () => {
    await page.close();
  });

  afterAll(async () => {
    await browser.close();
  });

  test('should upload PDF file successfully', async () => {
    await page.goto('https://example.com/upload');

    const fileInput = await page.$('input[type="file"]');
    await fileInput.uploadFile(path.resolve(__dirname, 'test.pdf'));

    await page.click('#upload-button');

    const successMessage = await page.waitForSelector('.success');
    expect(successMessage).toBeTruthy();
  });
});

Command Line Tools and Scripts

Create reusable upload scripts for different scenarios:

#!/bin/bash
# upload-script.sh

# Set file path
FILE_PATH="${1:-./default-file.pdf}"
UPLOAD_URL="${2:-https://example.com/upload}"

# Run headless upload
node -e "
const puppeteer = require('puppeteer');
const path = require('path');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('$UPLOAD_URL');

  const fileInput = await page.$('input[type=\"file\"]');
  await fileInput.uploadFile(path.resolve('$FILE_PATH'));

  await page.click('button[type=\"submit\"]');
  await page.waitForSelector('.upload-success');

  console.log('Upload completed successfully');
  await browser.close();
})();
"

Python Implementation with Selenium

For Python developers, Selenium provides similar functionality:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import os

def upload_file_with_selenium():
    # Configure Chrome options for headless mode
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")

    driver = webdriver.Chrome(options=chrome_options)

    try:
        driver.get("https://example.com/upload")

        # Find file input element
        file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")

        # Get absolute file path
        file_path = os.path.abspath("test-file.pdf")

        # Upload file
        file_input.send_keys(file_path)

        # Submit form
        submit_button = driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
        submit_button.click()

        # Wait for success message
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.CLASS_NAME, "upload-success"))
        )

        print("File uploaded successfully")

    except Exception as e:
        print(f"Upload failed: {e}")

    finally:
        driver.quit()

# Multiple file upload in Python
def upload_multiple_files_selenium():
    chrome_options = Options()
    chrome_options.add_argument("--headless")

    driver = webdriver.Chrome(options=chrome_options)

    try:
        driver.get("https://example.com/multi-upload")

        file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file'][multiple]")

        # Multiple files separated by newlines
        files = [
            os.path.abspath("file1.pdf"),
            os.path.abspath("file2.jpg"),
            os.path.abspath("file3.docx")
        ]

        file_input.send_keys("\n".join(files))

        # Submit upload
        driver.find_element(By.ID, "upload-button").click()

        # Wait for completion
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "upload-complete"))
        )

    finally:
        driver.quit()

Troubleshooting Common Issues

File Path Resolution

Always use absolute paths and verify file existence:

const fs = require('fs');
const path = require('path');

function validateFilePath(relativePath) {
  const absolutePath = path.resolve(__dirname, relativePath);

  if (!fs.existsSync(absolutePath)) {
    throw new Error(`File not found: ${absolutePath}`);
  }

  const stats = fs.statSync(absolutePath);
  if (!stats.isFile()) {
    throw new Error(`Path is not a file: ${absolutePath}`);
  }

  return absolutePath;
}

// Usage
try {
  const filePath = validateFilePath('./upload-file.pdf');
  await fileInput.uploadFile(filePath);
} catch (error) {
  console.error('File validation failed:', error.message);
}

Memory Management for Large Files

For large file uploads, monitor memory usage and implement appropriate timeouts. You may also need to handle timeouts in Puppeteer specifically for upload operations.

async function handleLargeFileUpload() {
  const browser = await puppeteer.launch({
    args: ['--max-old-space-size=4096'] // Increase memory limit
  });

  const page = await browser.newPage();

  // Set longer timeouts for large files
  page.setDefaultTimeout(120000); // 2 minutes

  // Monitor memory usage
  const client = await page.target().createCDPSession();
  await client.send('Runtime.enable');

  // Your upload logic here

  await browser.close();
}

Handling Upload Progress Indicators

Many file upload interfaces show progress bars. Wait for these to complete:

async function waitForUploadProgress() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com/upload');

  const fileInput = await page.$('input[type="file"]');
  await fileInput.uploadFile(path.resolve(__dirname, 'large-file.zip'));

  await page.click('#upload-button');

  // Wait for progress to reach 100%
  await page.waitForFunction(
    () => {
      const progress = document.querySelector('.progress-percentage');
      return progress && progress.textContent.includes('100%');
    },
    { timeout: 120000 }
  );

  // Additional wait for final processing
  await page.waitForSelector('.upload-complete', { timeout: 30000 });

  await browser.close();
}

Handling file uploads with headless Chromium requires careful consideration of the upload mechanism, proper error handling, and performance optimization. By following these patterns and best practices, you can build robust automation scripts that handle various file upload scenarios reliably. Whether you're using Puppeteer, Playwright, or Selenium, the key is understanding the specific upload flow of your target website and implementing appropriate wait conditions and error handling.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Related Questions

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon