How do I handle file uploads and downloads with Playwright?

Handling File Uploads and Downloads in Playwright

Playwright provides APIs for uploading and downloading files in both Python and JavaScript. This guide covers the core patterns: setting files on an input element, waiting for a download to start, saving it to disk, and handling common errors.

File Uploads

Python Examples

Basic File Upload (Sync API):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com/upload')

    # Upload single file
    page.set_input_files('input[type="file"]', 'path/to/file.pdf')

    # Upload multiple files
    page.set_input_files('input[type="file"]', ['file1.jpg', 'file2.png'])

    browser.close()

Async File Upload:

import asyncio
from playwright.async_api import async_playwright

async def upload_file():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto('https://example.com/upload')

        # Set files and submit form
        await page.set_input_files('#file-input', 'document.pdf')
        await page.click('button[type="submit"]')

        # Wait for upload confirmation
        await page.wait_for_selector('.upload-success')

        await browser.close()

asyncio.run(upload_file())

Advanced Upload with Validation:

from playwright.sync_api import sync_playwright
import os

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com/upload')

    file_path = 'path/to/file.pdf'

    # Verify file exists before upload
    if os.path.exists(file_path):
        page.set_input_files('input[name="document"]', file_path)

        # Clear files if needed
        page.set_input_files('input[name="document"]', [])

        # Re-upload
        page.set_input_files('input[name="document"]', file_path)

    browser.close()
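
Uploads do not have to come from a file on disk. Playwright's set_input_files also accepts a payload dict with name, mimeType, and buffer keys, which is handy for generated test data. The sketch below builds such a payload; make_file_payload is a hypothetical helper name, and the fallback MIME type is an assumption.

```python
import mimetypes


def make_file_payload(name: str, data: bytes, mime_type=None) -> dict:
    """Build an in-memory file payload in the shape set_input_files accepts.

    make_file_payload is a hypothetical helper, not part of Playwright.
    """
    guessed, _ = mimetypes.guess_type(name)
    return {
        "name": name,
        # Fall back to a generic binary type when the extension is unknown
        "mimeType": mime_type or guessed or "application/octet-stream",
        "buffer": data,
    }


payload = make_file_payload("report.csv", b"id,name\n1,Ada\n")
# With a live page: page.set_input_files('input[type="file"]', payload)
```

Because nothing is written to disk, this pattern also sidesteps the relative-path problems that file-based uploads can run into.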

JavaScript Examples

Basic File Upload:

const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/upload');

    // Upload single file
    await page.setInputFiles('input[type="file"]', '/path/to/file.pdf');

    // Upload multiple files
    await page.setInputFiles('input[type="file"]', [
        '/path/to/file1.jpg',
        '/path/to/file2.png'
    ]);

    await browser.close();
})();

Upload with Form Submission:

const { chromium } = require('playwright');
const path = require('path');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/upload');

    const filePath = path.join(__dirname, 'test-file.pdf');

    await page.setInputFiles('#file-upload', filePath);
    await page.fill('#file-description', 'Test document');
    await page.click('button[type="submit"]');

    // Wait for upload completion
    await page.waitForSelector('.success-message');

    await browser.close();
})();

File Downloads

Python Examples

Basic Download (Sync API):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com/download')

    # Start download and wait for it
    with page.expect_download() as download_info:
        page.click('a[href="/download/file.pdf"]')

    download = download_info.value

    # Save to specific location
    download.save_as('/path/to/downloaded-file.pdf')

    # Or get temporary path
    temp_path = download.path()
    print(f"Downloaded to: {temp_path}")

    browser.close()
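
A download can fail or be canceled after it starts, and the server-suggested filename is untrusted input. The following is a minimal sketch of a save step that checks download.failure() and keeps only the final path component of the suggested name. save_download and the "download.bin" fallback are hypothetical names chosen for illustration; the sketch assumes the sync API.

```python
from pathlib import Path


def save_download(download, dest_dir) -> Path:
    """Save a Playwright download (sync API) into dest_dir and return its path.

    save_download is a hypothetical helper, not part of Playwright.
    """
    error = download.failure()  # None when the download succeeded
    if error:
        raise RuntimeError(f"Download failed: {error}")
    # Keep only the last path component of the server-suggested name
    name = Path(download.suggested_filename).name or "download.bin"
    target = Path(dest_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    download.save_as(target)
    return target
```

With the example above, this would be called as save_download(download_info.value, './downloads').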

Async Download with Custom Path:

import asyncio
from playwright.async_api import async_playwright

async def download_file():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto('https://example.com/download')

        # Start download
        async with page.expect_download() as download_info:
            await page.click('#download-button')

        download = await download_info.value

        # Get download info
        print(f"Filename: {download.suggested_filename}")
        print(f"Temp path: {await download.path()}")

        # Save to custom location
        await download.save_as('./downloads/my-file.pdf')

        await browser.close()

asyncio.run(download_file())

Multiple Downloads:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com/downloads')

    download_links = page.locator('a.download-link')

    for i in range(download_links.count()):
        with page.expect_download() as download_info:
            download_links.nth(i).click()

        download = download_info.value
        filename = download.suggested_filename or f"file_{i}.pdf"
        download.save_as(f"./downloads/{filename}")

    browser.close()
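
When several links suggest the same filename, each save_as call in a loop like the one above overwrites the previous file. One way to avoid that, sketched here with a hypothetical unique_path helper, is to append a counter until the name is free.

```python
from pathlib import Path


def unique_path(directory, filename: str) -> Path:
    """Return a path inside directory that does not collide with an existing file.

    unique_path is a hypothetical helper, not part of Playwright.
    """
    base = Path(directory) / Path(filename).name
    candidate = base
    counter = 1
    while candidate.exists():
        # report.pdf -> report_1.pdf -> report_2.pdf ...
        candidate = base.with_name(f"{base.stem}_{counter}{base.suffix}")
        counter += 1
    return candidate
```

Inside the loop, the save step would become download.save_as(unique_path('./downloads', filename)).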

JavaScript Examples

Basic Download:

const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/download');

    // Handle download
    const [download] = await Promise.all([
        page.waitForEvent('download'),
        page.click('a#download-link')
    ]);

    // Save to specific path
    await download.saveAs('./downloads/file.pdf');

    // Get download info
    console.log('Filename:', download.suggestedFilename());
    console.log('Temp path:', await download.path());

    await browser.close();
})();

Streaming a Download to a File:

const { chromium } = require('playwright');
const fs = require('fs');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/download');

    const [download] = await Promise.all([
        page.waitForEvent('download'),
        page.click('button.download-btn')
    ]);

    // Stream download to file (create the target directory first)
    fs.mkdirSync('./downloads', { recursive: true });
    const stream = await download.createReadStream();
    const writeStream = fs.createWriteStream('./downloads/large-file.zip');

    await new Promise((resolve, reject) => {
        stream.on('error', reject);
        writeStream.on('finish', resolve);
        writeStream.on('error', reject);
        stream.pipe(writeStream);
    });

    console.log('Download completed');
    await browser.close();
})();

Best Practices and Tips

Error Handling

from playwright.sync_api import sync_playwright
import os

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    try:
        page.goto('https://example.com/upload')

        file_path = 'path/to/file.pdf'
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"File not found: {file_path}")

        # Check file size before upload
        file_size = os.path.getsize(file_path)
        if file_size > 10 * 1024 * 1024:  # 10MB limit
            raise ValueError("File too large")

        page.set_input_files('input[type="file"]', file_path)

    except Exception as e:
        print(f"Upload failed: {e}")
    finally:
        browser.close()
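
The same checks can be factored into a reusable function that handles several files and returns absolute paths ready for set_input_files. This is a sketch only: resolve_upload_paths and MAX_UPLOAD_BYTES are hypothetical names, and the 10 MB limit simply mirrors the example above.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit, mirrors the 10MB check above


def resolve_upload_paths(paths, max_bytes=MAX_UPLOAD_BYTES):
    """Return absolute path strings, raising on the first missing or oversized file.

    resolve_upload_paths is a hypothetical helper, not part of Playwright.
    """
    resolved = []
    for raw in paths:
        path = Path(raw).expanduser().resolve()
        if not path.is_file():
            raise FileNotFoundError(f"File not found: {path}")
        if path.stat().st_size > max_bytes:
            raise ValueError(f"File too large: {path}")
        resolved.append(str(path))
    return resolved
```

The result can be passed straight through, for example page.set_input_files('input[type="file"]', resolve_upload_paths(['file1.pdf', 'file2.jpg'])).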

Common Selectors and Patterns

# Different input file selectors
page.set_input_files('input[type="file"]', file_path)           # Basic
page.set_input_files('input[accept=".pdf,.doc"]', file_path)    # With accept
page.set_input_files('#file-upload', file_path)                # By ID
page.set_input_files('[data-testid="file-input"]', file_path)  # By test ID

# Clear file input
page.set_input_files('input[type="file"]', [])

# Multiple file upload
page.set_input_files('input[multiple]', ['file1.pdf', 'file2.jpg'])

Download Directory Configuration

const { chromium } = require('playwright');

(async () => {
    // downloadsPath is a launch option, not a context option
    const browser = await chromium.launch({
        downloadsPath: './downloads'
    });
    const context = await browser.newContext({
        acceptDownloads: true
    });

    const page = await context.newPage();
    await page.goto('https://example.com/download');

    const [download] = await Promise.all([
        page.waitForEvent('download'),
        page.click('a.download-link')
    ]);

    // Files in downloadsPath are deleted when the context closes,
    // so copy the download to a permanent path under its suggested name
    await download.saveAs(`./downloads/${download.suggestedFilename()}`);

    await browser.close();
})();

Common Issues and Solutions

  1. File Path Errors: Use absolute paths, or verify that relative paths resolve against the script's working directory.
  2. Download Timeouts: The default wait is 30 seconds. Raise it when a download is slow to start: page.expect_download(timeout=60000)
  3. Multiple Files: Pass a list to upload several files at once: ['file1.pdf', 'file2.jpg']
  4. File Permissions: Ensure the process can read the upload files and write to the download directory.
  5. Browser Security: Some browsers block certain file types; check the browser settings if a download never starts.
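
The timeout advice from the list can be wrapped in a small helper. The sketch below assumes the sync API and a page object obtained as in the earlier examples; download_with_timeout is a hypothetical name. Note that the timeout covers the wait for the download to start, while save_as waits for the file to finish.

```python
def download_with_timeout(page, selector: str, dest: str, timeout_ms: float = 60_000):
    """Click selector, wait up to timeout_ms for a download to start, save it to dest.

    download_with_timeout is a hypothetical helper, not part of Playwright.
    """
    with page.expect_download(timeout=timeout_ms) as download_info:
        page.click(selector)
    download = download_info.value
    download.save_as(dest)  # blocks until the download has completed
    return download.suggested_filename
```

A call such as download_with_timeout(page, '#download-button', './downloads/file.pdf') would then replace the inline expect_download block.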
