
How do I Execute Custom JavaScript Code on a Page with Puppeteer-Sharp?

Executing custom JavaScript code on web pages is one of the most powerful features of Puppeteer-Sharp. This capability allows you to manipulate DOM elements, extract data, simulate user interactions, and perform complex operations directly within the browser context. In this comprehensive guide, we'll explore various methods to execute JavaScript code using Puppeteer-Sharp.

Overview of JavaScript Execution Methods

Puppeteer-Sharp provides several methods for executing JavaScript code on a page:

  • EvaluateExpressionAsync() - Evaluate a JavaScript expression and return its result
  • EvaluateFunctionAsync() - Call a JavaScript function, passing arguments from C#
  • QuerySelectorAsync() and QuerySelectorAllAsync() - Return element handles for the DOM elements that match a CSS selector
  • EvaluateFunctionAsync() on an element handle - Run a function that receives a specific element as its first argument
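
To see the difference between page-level and element-level evaluation, here is a minimal sketch. It assumes a recent Puppeteer-Sharp release (where `BrowserFetcher.DownloadAsync()` takes no arguments and browsers and pages support `await using`) and uses a small inline HTML fixture, loaded with `SetContentAsync()`, in place of a real site:

```csharp
using System;
using PuppeteerSharp;

// Download a compatible browser build on first run.
await new BrowserFetcher().DownloadAsync();

await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Hypothetical inline fixture so the example does not depend on network access.
await page.SetContentAsync("<h1 class='headline'>Hello</h1><p>First</p><p>Second</p>");

// Page-level evaluation: the expression runs in the page's global scope.
var paragraphCount = await page.EvaluateExpressionAsync<int>(
    "document.querySelectorAll('p').length");

// Element-level evaluation: the handle is passed to the function as its first argument.
var heading = await page.QuerySelectorAsync("h1.headline");
var headingText = await heading.EvaluateFunctionAsync<string>("el => el.textContent");

Console.WriteLine($"{paragraphCount} paragraphs, heading: {headingText}");
```

Element handles are convenient when you already hold a reference to one node; when you need data from many elements, a single page-level function is usually cheaper.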

Basic JavaScript Execution with EvaluateExpressionAsync

The EvaluateExpressionAsync method evaluates a JavaScript expression in the page and converts the result to the .NET type you request, which makes it a good fit for quick reads of page state:

using PuppeteerSharp;

// Launch browser and create page
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});
var page = await browser.NewPageAsync();

// Navigate to a webpage
await page.GoToAsync("https://example.com");

// Execute simple JavaScript expressions
var title = await page.EvaluateExpressionAsync<string>("document.title");
var url = await page.EvaluateExpressionAsync<string>("window.location.href");
var userAgent = await page.EvaluateExpressionAsync<string>("navigator.userAgent");

Console.WriteLine($"Title: {title}");
Console.WriteLine($"URL: {url}");
Console.WriteLine($"User Agent: {userAgent}");

await browser.CloseAsync();
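
The generic type argument decides how the result is converted, so expressions can return numbers, booleans, and arrays as well as strings. A minimal sketch, again assuming a recent Puppeteer-Sharp release and an inline fixture invented for the example:

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Illustrative fixture; any page would do.
await page.SetContentAsync("<title>Demo</title><ul><li>a</li><li>b</li><li>c</li></ul>");

// The JSON-serialized result is converted to the requested .NET type.
var title = await page.EvaluateExpressionAsync<string>("document.title");
var itemCount = await page.EvaluateExpressionAsync<int>("document.querySelectorAll('li').length");
var hasList = await page.EvaluateExpressionAsync<bool>("!!document.querySelector('ul')");
var items = await page.EvaluateExpressionAsync<string[]>(
    "Array.from(document.querySelectorAll('li'), li => li.textContent)");

Console.WriteLine($"{title}: {itemCount} items ({string.Join(", ", items)}), list present: {hasList}");
```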

Advanced JavaScript Execution with EvaluateFunctionAsync

For more complex operations, use EvaluateFunctionAsync to execute JavaScript functions with parameters:

// Execute a JavaScript function with parameters
var result = await page.EvaluateFunctionAsync<string>(@"
    (selector, attribute) => {
        const element = document.querySelector(selector);
        return element ? element.getAttribute(attribute) : null;
    }
", "meta[name='description']", "content");

Console.WriteLine($"Meta description: {result}");
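
Passing values as arguments, rather than splicing them into the script text, lets Puppeteer-Sharp serialize them for you, so quotes and markup in the data cannot break or alter the script. A small sketch of that behavior, assuming a recent Puppeteer-Sharp release; the `tricky` string is just an illustrative value:

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// A value containing quotes and markup that would break naive string concatenation.
var tricky = "O'Brien said \"hi\" </script>";

// Arguments are serialized by Puppeteer-Sharp, so no manual escaping is needed.
var echoed = await page.EvaluateFunctionAsync<string>("value => value", tricky);

// Multiple arguments map to the function's parameters in order.
var sum = await page.EvaluateFunctionAsync<int>("(a, b) => a + b", 2, 3);

Console.WriteLine(echoed == tricky); // True
Console.WriteLine(sum);              // 5
```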

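// Execute a function that returns structured data and map it onto C# classes
var pageInfo = await page.EvaluateFunctionAsync<PageInfo>(@"
    () => {
        return {
            title: document.title,
            url: window.location.href,
            linkCount: document.querySelectorAll('a').length,
            imageCount: document.querySelectorAll('img').length,
            viewport: {
                width: window.innerWidth,
                height: window.innerHeight
            }
        };
    }
");

Console.WriteLine($"Page has {pageInfo.LinkCount} links and {pageInfo.ImageCount} images");

// Result types: PascalCase properties corresponding to the camelCase JavaScript keys
// (declare them after your top-level statements or in a separate file)
public class PageInfo
{
    public string Title { get; set; }
    public string Url { get; set; }
    public int LinkCount { get; set; }
    public int ImageCount { get; set; }
    public ViewportInfo Viewport { get; set; }
}

public class ViewportInfo
{
    public int Width { get; set; }
    public int Height { get; set; }
}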

DOM Manipulation and Data Extraction

One of the most common use cases is manipulating the DOM and extracting data. Here are practical examples:

// Extract all links from the page; each JavaScript object becomes a dictionary
var links = await page.EvaluateFunctionAsync<Dictionary<string, string>[]>(@"
    () => {
        return Array.from(document.querySelectorAll('a[href]'))
                   .map(link => ({
                       text: link.textContent.trim(),
                       href: link.href,
                       target: link.target || '_self'
                   }));
    }
");

// Extract table data
var tableData = await page.EvaluateFunctionAsync<string[][]>(@"
    (tableSelector) => {
        const table = document.querySelector(tableSelector);
        if (!table) return [];

        const rows = Array.from(table.querySelectorAll('tr'));
        return rows.map(row => {
            const cells = Array.from(row.querySelectorAll('td, th'));
            return cells.map(cell => cell.textContent.trim());
        });
    }
", "table.data-table");

// Modify page content
await page.EvaluateFunctionAsync(@"
    (message) => {
        const banner = document.createElement('div');
        banner.style.cssText = `
            position: fixed;
            top: 0;
            left: 0;
            right: 0;
            background: #007bff;
            color: white;
            padding: 10px;
            text-align: center;
            z-index: 9999;
        `;
        banner.textContent = message;
        document.body.prepend(banner);
    }
", "This page is being automated!");

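The sketch below runs a table extraction and a DOM change against an inline fixture, then reads the change back in a second call, since DOM modifications persist for the life of the current document. It assumes a recent Puppeteer-Sharp release; the `table.data-table` markup and the `automation-banner` id are hypothetical:

```csharp
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Hypothetical fixture standing in for a real table.data-table.
await page.SetContentAsync(@"
    <table class='data-table'>
      <tr><th>Name</th><th>Price</th></tr>
      <tr><td>Widget</td><td>9.99</td></tr>
    </table>");

// Each row becomes a string[]; the whole table deserializes as string[][].
var rows = await page.EvaluateFunctionAsync<string[][]>(@"
    (selector) => Array.from(document.querySelectorAll(`${selector} tr`),
        row => Array.from(row.querySelectorAll('td, th'), cell => cell.textContent.trim()))
", "table.data-table");

// Modify the DOM in one call...
await page.EvaluateFunctionAsync(@"
    (message) => {
        const banner = document.createElement('div');
        banner.id = 'automation-banner';
        banner.textContent = message;
        document.body.prepend(banner);
    }
", "Automated");

// ...and read the change back in a later call.
var bannerText = await page.EvaluateExpressionAsync<string>(
    "document.getElementById('automation-banner').textContent");
```
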
Handling Asynchronous Operations

When the JavaScript you run is asynchronous, pass an async function (or any function that returns a Promise). Puppeteer-Sharp waits for the Promise to settle and returns its resolved value:

// Execute async JavaScript function
var data = await page.EvaluateFunctionAsync<dynamic>(@"
    async () => {
        // Wait for an element to appear
        const waitForElement = (selector, timeout = 5000) => {
            return new Promise((resolve, reject) => {
                const element = document.querySelector(selector);
                if (element) {
                    resolve(element);
                    return;
                }

                const observer = new MutationObserver(() => {
                    const element = document.querySelector(selector);
                    if (element) {
                        observer.disconnect();
                        resolve(element);
                    }
                });

                observer.observe(document.body, {
                    childList: true,
                    subtree: true
                });

                setTimeout(() => {
                    observer.disconnect();
                    reject(new Error('Timeout waiting for element'));
                }, timeout);
            });
        };

        try {
            await waitForElement('.dynamic-content');
            return {
                success: true,
                content: document.querySelector('.dynamic-content').textContent
            };
        } catch (error) {
            return {
                success: false,
                error: error.message
            };
        }
    }
");

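Puppeteer-Sharp also ships its own waiting helper, `WaitForSelectorAsync()`, which can replace a hand-written MutationObserver. This sketch, assuming a recent Puppeteer-Sharp release and a hypothetical fixture that adds `.dynamic-content` after a short delay, shows both a returned Promise being awaited and the built-in wait:

```csharp
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Fixture: the .dynamic-content element is appended roughly 300 ms after the page loads.
await page.SetContentAsync(@"
    <script>
      setTimeout(() => {
        const div = document.createElement('div');
        div.className = 'dynamic-content';
        div.textContent = 'Loaded later';
        document.body.appendChild(div);
      }, 300);
    </script>");

// Returning a Promise from the evaluated function makes Puppeteer-Sharp await it.
var delayed = await page.EvaluateFunctionAsync<int>(
    "async () => { await new Promise(r => setTimeout(r, 100)); return 42; }");

// The built-in wait resolves with the element, or throws once the timeout expires.
var element = await page.WaitForSelectorAsync(
    ".dynamic-content", new WaitForSelectorOptions { Timeout = 5000 });
var content = await element.EvaluateFunctionAsync<string>("el => el.textContent");
```
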
Working with Forms and User Input

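You can also run JavaScript that fills in and checks forms. Setting field values from a script is not the same as real typing, so dispatch the events the page listens for: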

// Fill out a form (this sets field values directly; it does not submit the form)
await page.EvaluateFunctionAsync(@"
    (formData) => {
        const form = document.querySelector('#contact-form');
        if (!form) return false;

        // Fill form fields
        Object.keys(formData).forEach(key => {
            const field = form.querySelector(`[name='${key}']`);
            if (field) {
                if (field.type === 'checkbox' || field.type === 'radio') {
                    field.checked = formData[key];
                } else {
                    field.value = formData[key];
                }

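                // Notify listeners; many frameworks listen for 'input' as well as 'change'
                field.dispatchEvent(new Event('input', { bubbles: true }));
                field.dispatchEvent(new Event('change', { bubbles: true }));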
            }
        });

        return true;
    }
", new {
    name = "John Doe",
    email = "john@example.com",
    message = "Hello from Puppeteer-Sharp!"
});

// Validate form before submission
var isValid = await page.EvaluateFunctionAsync<bool>(@"
    () => {
        const form = document.querySelector('#contact-form');
        if (!form) return false;

        // Check HTML5 validation
        if (!form.checkValidity()) {
            form.reportValidity();
            return false;
        }

        // Custom validation
        const email = form.querySelector('[name=email]').value;
        const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

        return emailRegex.test(email);
    }
");

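Setting `value` from a script skips the keyboard events a real user would produce. When a page reacts to typing, `TypeAsync()` sends key presses instead. A minimal sketch, assuming a recent Puppeteer-Sharp release and a stripped-down, hypothetical version of `#contact-form` that counts `input` events:

```csharp
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Hypothetical form mirroring the #contact-form used above.
await page.SetContentAsync(@"
    <form id='contact-form'>
      <input name='email' type='email' required>
    </form>
    <script>
      window.inputEvents = 0;
      document.querySelector('[name=email]')
        .addEventListener('input', () => window.inputEvents++);
    </script>");

// TypeAsync focuses the field and sends key presses, so 'input' listeners fire as they would for a user.
await page.TypeAsync("[name=email]", "john@example.com");

var value = await page.EvaluateExpressionAsync<string>(
    "document.querySelector('[name=email]').value");
var inputEvents = await page.EvaluateExpressionAsync<int>("window.inputEvents");
var isValid = await page.EvaluateExpressionAsync<bool>(
    "document.querySelector('#contact-form').checkValidity()");
```
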
Error Handling and Debugging

Errors thrown inside the page surface in C# as an EvaluationFailedException. Catch it in C#, or catch errors in JavaScript and return a status object instead:

try
{
    var result = await page.EvaluateFunctionAsync<dynamic>(@"
        () => {
            // Potentially problematic code
            const element = document.querySelector('#non-existent');
            return element.textContent; // This could throw an error
        }
    ");
}
catch (EvaluationFailedException ex)
{
    Console.WriteLine($"JavaScript execution failed: {ex.Message}");

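    // ex.Message already includes the JavaScript error; re-running the logic inside a
    // JavaScript try/catch returns the error name and message as a plain string instead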
    var errorDetails = await page.EvaluateFunctionAsync<string>(@"
        () => {
            try {
                const element = document.querySelector('#non-existent');
                return element.textContent;
            } catch (error) {
                return `Error: ${error.name} - ${error.message}`;
            }
        }
    ");

    Console.WriteLine($"Error details: {errorDetails}");
}

// Safe execution with error handling in JavaScript
var safeResult = await page.EvaluateFunctionAsync<dynamic>(@"
    (selector) => {
        try {
            const elements = document.querySelectorAll(selector);
            return {
                success: true,
                count: elements.length,
                data: Array.from(elements).map(el => el.textContent.trim())
            };
        } catch (error) {
            return {
                success: false,
                error: error.message,
                count: 0,
                data: []
            };
        }
    }
", ".item");

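Here is a self-contained version of the C# side, assuming a recent Puppeteer-Sharp release: the first call lets the TypeError propagate as an `EvaluationFailedException`, and the second uses optional chaining so a missing element yields `null` instead of an exception:

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

string? errorMessage = null;
try
{
    // querySelector returns null here, so reading .textContent throws a TypeError in the page.
    await page.EvaluateFunctionAsync<string>(
        "() => document.querySelector('#non-existent').textContent");
}
catch (EvaluationFailedException ex)
{
    // The exception message carries the JavaScript error description.
    errorMessage = ex.Message;
}

// Optional chaining avoids the exception and yields null instead.
var safe = await page.EvaluateFunctionAsync<string>(
    "() => document.querySelector('#non-existent')?.textContent ?? null");

Console.WriteLine(errorMessage);
Console.WriteLine(safe is null);
```
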
Performance Optimization Tips

When executing JavaScript code frequently, consider these optimization strategies:

// Define frequently used functions once and reuse the string
var extractDataFunction = @"
    () => {
        return Array.from(document.querySelectorAll('.product')).map(product => ({
            name: product.querySelector('.name')?.textContent?.trim(),
            price: product.querySelector('.price')?.textContent?.trim(),
            image: product.querySelector('img')?.src
        }));
    }
";

// Reuse the same definition on several pages (the browser still parses it on every call)
var products1 = await page.EvaluateFunctionAsync<Dictionary<string, string>[]>(extractDataFunction);
await page.GoToAsync("https://example.com/page2");
var products2 = await page.EvaluateFunctionAsync<Dictionary<string, string>[]>(extractDataFunction);

// Batch operations to reduce round trips
var allData = await page.EvaluateFunctionAsync<dynamic>(@"
    () => {
        return {
            products: Array.from(document.querySelectorAll('.product')).map(p => ({
                name: p.querySelector('.name')?.textContent?.trim(),
                price: p.querySelector('.price')?.textContent?.trim()
            })),
            categories: Array.from(document.querySelectorAll('.category')).map(c => 
                c.textContent.trim()
            ),
            pagination: {
                current: document.querySelector('.current-page')?.textContent,
                total: document.querySelector('.total-pages')?.textContent
            }
        };
    }
");

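To make the round-trip cost concrete, the sketch below collects the same text twice: once with one call per element handle and once with a single page-level call. It assumes a recent Puppeteer-Sharp release and an invented product list; both approaches return the same data, but the first makes one browser round trip per match:

```csharp
using System;
using System.Collections.Generic;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Illustrative product list standing in for a real catalogue page.
await page.SetContentAsync(@"
    <div class='product'><span class='name'>A</span></div>
    <div class='product'><span class='name'>B</span></div>
    <div class='product'><span class='name'>C</span></div>");

// One round trip per element: simple, but the cost grows with the number of matches.
var handles = await page.QuerySelectorAllAsync(".product .name");
var oneByOne = new List<string>();
foreach (var handle in handles)
{
    oneByOne.Add(await handle.EvaluateFunctionAsync<string>("el => el.textContent"));
}

// One round trip in total: the loop runs inside the page and only the result crosses over.
var batched = await page.EvaluateFunctionAsync<string[]>(
    "sel => Array.from(document.querySelectorAll(sel), el => el.textContent)", ".product .name");

Console.WriteLine(string.Join(", ", batched));
```
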
Integration with Web Scraping Workflows

When building complete web scraping solutions, JavaScript execution becomes even more useful in combination with other Puppeteer-Sharp features. For example, you might wait for AJAX requests to finish after running custom JavaScript, or register a script with EvaluateFunctionOnNewDocumentAsync() so that it runs again in every document the page loads, keeping your helpers available across navigations.
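
As a sketch of the second idea, assuming a recent Puppeteer-Sharp release: a helper registered with `EvaluateFunctionOnNewDocumentAsync()` is still defined after navigating to a new page. The `window.__helper` name is hypothetical, and `data:` URLs stand in for real pages so the example runs offline:

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Registered once; the browser evaluates it in every new document before the page's own scripts.
await page.EvaluateFunctionOnNewDocumentAsync(@"() => {
    window.__helper = (selector) => document.querySelectorAll(selector).length;
}");

// First navigation: a page with one paragraph.
await page.GoToAsync("data:text/html," + Uri.EscapeDataString("<p>one</p>"));
var first = await page.EvaluateExpressionAsync<int>("window.__helper('p')");

// Second navigation: the helper is injected again into the new document.
await page.GoToAsync("data:text/html," + Uri.EscapeDataString("<p>one</p><p>two</p>"));
var second = await page.EvaluateExpressionAsync<int>("window.__helper('p')");

Console.WriteLine($"{first} then {second}");
```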

Best Practices and Considerations

  1. Type Safety: Specify a concrete return type T for EvaluateFunctionAsync<T>() (string, int, string[], or your own class) that matches the shape of the JavaScript value, so the JSON result deserializes correctly.

  2. Error Handling: Wrap JavaScript execution in try-catch blocks both in C# and JavaScript code.

  3. Performance: Minimize the number of evaluation calls by batching operations when possible.

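  4. Security: Never concatenate untrusted input into script strings; pass it as an argument to EvaluateFunctionAsync() instead. Only run JavaScript you trust, since it executes with full access to the page.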

  5. Debugging: Use console.log() inside your JavaScript functions and subscribe to the page's Console event in C# to see that output.
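
For the debugging tip, here is a minimal sketch of forwarding browser console output to .NET, assuming a recent Puppeteer-Sharp release and .NET 6 or later (for `Task.WaitAsync`). The event may be raised shortly after the evaluation returns, so the code waits for it explicitly:

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Completes with the text of the first console message the page emits.
var firstLog = new TaskCompletionSource<string>();

// Forward every console call in the page to the .NET console.
page.Console += (sender, e) =>
{
    Console.WriteLine($"[browser {e.Message.Type}] {e.Message.Text}");
    firstLog.TrySetResult(e.Message.Text);
};

// Use console.log (not Console.WriteLine) inside the evaluated JavaScript.
await page.EvaluateFunctionAsync("(count) => console.log(`processing ${count} items`)", 2);

// Wait up to five seconds for the event rather than assuming it has already fired.
var logged = await firstLog.Task.WaitAsync(TimeSpan.FromSeconds(5));
```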

Executing custom JavaScript with Puppeteer-Sharp gives you direct access to a page's DOM and runtime for web automation, data extraction, and browser manipulation. By combining the techniques above, you can build robust, efficient scrapers that handle even complex, dynamic websites.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl -G "https://api.webscraping.ai/ai/question" --data-urlencode "url=https://example.com" --data-urlencode "question=What is the main topic?" --data-urlencode "api_key=YOUR_API_KEY"

Extract structured data:

curl -G "https://api.webscraping.ai/ai/fields" --data-urlencode "url=https://example.com" --data-urlencode "fields[title]=Page title" --data-urlencode "fields[price]=Product price" --data-urlencode "api_key=YOUR_API_KEY"
