How do I Execute Custom JavaScript Code on a Page with Puppeteer-Sharp?
Executing custom JavaScript code on web pages is one of the most powerful features of Puppeteer-Sharp. This capability allows you to manipulate DOM elements, extract data, simulate user interactions, and perform complex operations directly within the browser context. In this comprehensive guide, we'll explore various methods to execute JavaScript code using Puppeteer-Sharp.
Overview of JavaScript Execution Methods
Puppeteer-Sharp provides several methods for executing JavaScript code on a page:
- EvaluateExpressionAsync() - Execute simple JavaScript expressions
- EvaluateFunctionAsync() - Execute JavaScript functions with parameters
- QuerySelectorAsync() and QuerySelectorAllAsync() - Select DOM elements and get element handles back
- EvaluateFunctionAsync() on an element handle - Execute JavaScript against a specific element
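To run JavaScript against one specific element, you can select it with QuerySelectorAsync() and call EvaluateFunctionAsync() on the returned handle. Puppeteer-Sharp passes the element itself as the function's first argument. The sketch below assumes a recent version of the PuppeteerSharp NuGet package. The inline HTML is a hypothetical stand-in, so the example does not depend on a live site.

```csharp
using System;
using PuppeteerSharp;

// Make sure a compatible browser is available locally (downloaded on first run)
await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// Hypothetical inline page so the example does not need network access
await page.SetContentAsync("<h1 class='title'>Hello</h1><ul><li>a</li><li>b</li><li>c</li></ul>");

// QuerySelectorAsync returns an element handle (null if nothing matches)
var heading = await page.QuerySelectorAsync("h1.title");
// The handle is passed to the JavaScript function as its first argument
var headingText = await heading.EvaluateFunctionAsync<string>("el => el.textContent");

// QuerySelectorAllAsync returns one handle per matching element
var items = await page.QuerySelectorAllAsync("li");
var itemTexts = new string[items.Length];
for (var i = 0; i < items.Length; i++)
{
    itemTexts[i] = await items[i].EvaluateFunctionAsync<string>("el => el.textContent");
}

Console.WriteLine($"{headingText}: {string.Join(", ", itemTexts)}");
await browser.CloseAsync();
```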
Basic JavaScript Execution with EvaluateExpressionAsync
The EvaluateExpressionAsync method is well suited to evaluating simple JavaScript expressions and retrieving their results:
```csharp
using System;
using PuppeteerSharp;

// Make sure a compatible browser is available locally (downloaded on first run)
await new BrowserFetcher().DownloadAsync();

// Launch browser and create page
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});
var page = await browser.NewPageAsync();

// Navigate to a webpage
await page.GoToAsync("https://example.com");

// Execute simple JavaScript expressions
var title = await page.EvaluateExpressionAsync<string>("document.title");
var url = await page.EvaluateExpressionAsync<string>("window.location.href");
var userAgent = await page.EvaluateExpressionAsync<string>("navigator.userAgent");

Console.WriteLine($"Title: {title}");
Console.WriteLine($"URL: {url}");
Console.WriteLine($"User Agent: {userAgent}");

await browser.CloseAsync();
```
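EvaluateExpressionAsync<T>() is not limited to strings. The expression's value is serialized to JSON and converted to the requested C# type. The sketch below shows numeric and boolean results. It assumes the PuppeteerSharp NuGet package and uses hypothetical inline markup instead of a live URL.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// Hypothetical inline markup instead of a live URL
await page.SetContentAsync("<p>one</p><p>two</p><a href='#top'>top</a>");

// Any JavaScript expression works; its value is converted to the requested C# type
var paragraphCount = await page.EvaluateExpressionAsync<int>("document.querySelectorAll('p').length");
var hasLink = await page.EvaluateExpressionAsync<bool>("document.querySelector('a') !== null");
var sum = await page.EvaluateExpressionAsync<int>("[1, 2, 3].reduce((a, b) => a + b, 0)");

Console.WriteLine($"Paragraphs: {paragraphCount}, has link: {hasLink}, sum: {sum}");
await browser.CloseAsync();
```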
Advanced JavaScript Execution with EvaluateFunctionAsync
For more complex operations, use EvaluateFunctionAsync to execute JavaScript functions with parameters:
```csharp
// Execute a JavaScript function with parameters
var result = await page.EvaluateFunctionAsync<string>(@"
    (selector, attribute) => {
        const element = document.querySelector(selector);
        return element ? element.getAttribute(attribute) : null;
    }
", "meta[name='description']", "content");
Console.WriteLine($"Meta description: {result}");

// Execute a function that returns complex data.
// Member access on dynamic depends on the JSON library behind your Puppeteer-Sharp
// version returning a dynamic-friendly object; a concrete class is the safer choice.
var pageInfo = await page.EvaluateFunctionAsync<dynamic>(@"
    () => {
        return {
            title: document.title,
            url: window.location.href,
            linkCount: document.querySelectorAll('a').length,
            imageCount: document.querySelectorAll('img').length,
            viewport: {
                width: window.innerWidth,
                height: window.innerHeight
            }
        };
    }
");
Console.WriteLine($"Page has {pageInfo.linkCount} links and {pageInfo.imageCount} images");
```
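Arguments after the script string are serialized to JSON and passed to the function in order. Anonymous C# objects and arrays therefore arrive as plain JavaScript objects and arrays. The sketch below assumes the PuppeteerSharp NuGet package. The factor and values are arbitrary.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// An anonymous object becomes a JS object; an int[] becomes a JS array
var total = await page.EvaluateFunctionAsync<int>(
    "(options, values) => values.reduce((acc, v) => acc + v * options.factor, 0)",
    new { factor = 10 },
    new[] { 1, 2, 3 });

Console.WriteLine($"Total: {total}");
await browser.CloseAsync();
```

Passing parameters this way keeps data out of the script text, so the same function string can be reused with different inputs.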
DOM Manipulation and Data Extraction
One of the most common use cases is manipulating the DOM and extracting data. Here are practical examples:
```csharp
// Extract all links from the page.
// The script returns objects, not strings, so deserialize into a matching class.
// LinkInfo is a small class you declare yourself, for example:
//   public class LinkInfo { public string Text { get; set; } public string Href { get; set; } public string Target { get; set; } }
var links = await page.EvaluateFunctionAsync<LinkInfo[]>(@"
    () => {
        return Array.from(document.querySelectorAll('a[href]'))
            .map(link => ({
                text: link.textContent.trim(),
                href: link.href,
                target: link.target || '_self'
            }));
    }
");

// Extract table data: one array of cell strings per row
var tableData = await page.EvaluateFunctionAsync<string[][]>(@"
    (tableSelector) => {
        const table = document.querySelector(tableSelector);
        if (!table) return [];
        const rows = Array.from(table.querySelectorAll('tr'));
        return rows.map(row => {
            const cells = Array.from(row.querySelectorAll('td, th'));
            return cells.map(cell => cell.textContent.trim());
        });
    }
", "table.data-table");

// Modify page content
await page.EvaluateFunctionAsync(@"
    (message) => {
        const banner = document.createElement('div');
        banner.style.cssText = `
            position: fixed;
            top: 0;
            left: 0;
            right: 0;
            background: #007bff;
            color: white;
            padding: 10px;
            text-align: center;
            z-index: 9999;
        `;
        banner.textContent = message;
        document.body.prepend(banner);
    }
", "This page is being automated!");
```
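You can check the table-extraction pattern without a real site. The sketch below loads a small hypothetical table with SetContentAsync() and deserializes the rows into string[][], which matches the array-of-arrays shape the script returns. It assumes the PuppeteerSharp NuGet package.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// Hypothetical table markup standing in for a real page
await page.SetContentAsync(@"
<table class='data-table'>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.99</td></tr>
</table>");

// Each row becomes an array of trimmed cell strings
var rows = await page.EvaluateFunctionAsync<string[][]>(@"
(tableSelector) => {
    const table = document.querySelector(tableSelector);
    if (!table) return [];
    return Array.from(table.querySelectorAll('tr')).map(row =>
        Array.from(row.querySelectorAll('td, th')).map(cell => cell.textContent.trim()));
}", "table.data-table");

foreach (var row in rows)
{
    Console.WriteLine(string.Join(" | ", row));
}
await browser.CloseAsync();
```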
Handling Asynchronous Operations
If the function is async or returns a Promise, Puppeteer-Sharp waits for the Promise to settle and returns its resolved value. This lets you wait for page-side conditions inside the script:
```csharp
// Execute an async JavaScript function
var data = await page.EvaluateFunctionAsync<dynamic>(@"
    async () => {
        // Wait for an element to appear
        const waitForElement = (selector, timeout = 5000) => {
            return new Promise((resolve, reject) => {
                const element = document.querySelector(selector);
                if (element) {
                    resolve(element);
                    return;
                }
                const observer = new MutationObserver(() => {
                    const element = document.querySelector(selector);
                    if (element) {
                        observer.disconnect();
                        resolve(element);
                    }
                });
                observer.observe(document.body, {
                    childList: true,
                    subtree: true
                });
                setTimeout(() => {
                    observer.disconnect();
                    reject(new Error('Timeout waiting for element'));
                }, timeout);
            });
        };
        try {
            await waitForElement('.dynamic-content');
            return {
                success: true,
                content: document.querySelector('.dynamic-content').textContent
            };
        } catch (error) {
            return {
                success: false,
                error: error.message
            };
        }
    }
");
```
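Puppeteer-Sharp also provides page.WaitForSelectorAsync(), which waits for an element from the C# side. This is often simpler than a hand-written MutationObserver. In the sketch below, the delayed element is simulated with a setTimeout in hypothetical inline HTML. The sketch also assumes that a timeout surfaces as an exception derived from PuppeteerException.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// Hypothetical page that inserts .dynamic-content after 500 ms
await page.SetContentAsync(@"
<script>
  setTimeout(() => {
    const div = document.createElement('div');
    div.className = 'dynamic-content';
    div.textContent = 'Loaded later';
    document.body.appendChild(div);
  }, 500);
</script>");

string content = null;
try
{
    // Wait up to 5 seconds for the element, then read its text through the handle
    var element = await page.WaitForSelectorAsync(".dynamic-content",
        new WaitForSelectorOptions { Timeout = 5000 });
    content = await element.EvaluateFunctionAsync<string>("el => el.textContent");
}
catch (PuppeteerException ex)
{
    // Assumption: a wait timeout is reported as a PuppeteerException subclass
    Console.WriteLine($"Element did not appear: {ex.Message}");
}

Console.WriteLine($"Content: {content}");
await browser.CloseAsync();
```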
Working with Forms and User Input
Execute JavaScript to interact with forms and simulate user input:
```csharp
// Fill out a form (the script fills the fields but does not submit the form)
var filled = await page.EvaluateFunctionAsync<bool>(@"
    (formData) => {
        const form = document.querySelector('#contact-form');
        if (!form) return false;
        // Fill form fields
        Object.keys(formData).forEach(key => {
            const field = form.querySelector(`[name='${key}']`);
            if (field) {
                if (field.type === 'checkbox' || field.type === 'radio') {
                    field.checked = formData[key];
                } else {
                    field.value = formData[key];
                }
                // Trigger input and change events so page listeners see the new value
                field.dispatchEvent(new Event('input', { bubbles: true }));
                field.dispatchEvent(new Event('change', { bubbles: true }));
            }
        });
        return true;
    }
", new {
    name = "John Doe",
    email = "john@example.com",
    message = "Hello from Puppeteer-Sharp!"
});

// Validate form before submission
var isValid = await page.EvaluateFunctionAsync<bool>(@"
    () => {
        const form = document.querySelector('#contact-form');
        if (!form) return false;
        // Check HTML5 validation
        if (!form.checkValidity()) {
            form.reportValidity();
            return false;
        }
        // Custom validation
        const email = form.querySelector('[name=email]')?.value ?? '';
        const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
        return emailRegex.test(email);
    }
");
```
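Setting field.value directly does not produce keyboard events, so some page scripts may not react to the change. An alternative is Puppeteer-Sharp's page.TypeAsync(), which types text into the element with simulated key presses. The sketch below uses a hypothetical inline form whose field names mirror the example above. It assumes the PuppeteerSharp NuGet package.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

// Hypothetical contact form standing in for a real page
await page.SetContentAsync(@"
<form id='contact-form'>
  <input name='name' required>
  <input name='email' type='email' required>
</form>");

// TypeAsync focuses the element and types the text key by key
await page.TypeAsync("#contact-form [name='name']", "John Doe");
await page.TypeAsync("#contact-form [name='email']", "john@example.com");

// Run the browser's built-in constraint validation and read a value back
var isValid = await page.EvaluateFunctionAsync<bool>(
    "() => document.querySelector('#contact-form').checkValidity()");
var typedName = await page.EvaluateExpressionAsync<string>(
    "document.querySelector(\"[name='name']\").value");

Console.WriteLine($"Name: {typedName}, valid: {isValid}");
await browser.CloseAsync();
```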
Error Handling and Debugging
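When a script throws inside the page, Puppeteer-Sharp raises an EvaluationFailedException in C#. You can handle errors on the C# side, the JavaScript side, or both: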
```csharp
try
{
    var result = await page.EvaluateFunctionAsync<dynamic>(@"
        () => {
            // Potentially problematic code
            const element = document.querySelector('#non-existent');
            return element.textContent; // Throws a TypeError if the element is missing
        }
    ");
}
catch (EvaluationFailedException ex)
{
    // ex.Message includes the JavaScript error raised in the page
    Console.WriteLine($"JavaScript execution failed: {ex.Message}");

    // Optionally re-run the logic with a JavaScript try/catch to get the error as data
    var errorDetails = await page.EvaluateFunctionAsync<string>(@"
        () => {
            try {
                const element = document.querySelector('#non-existent');
                return element.textContent;
            } catch (error) {
                return `Error: ${error.name} - ${error.message}`;
            }
        }
    ");
    Console.WriteLine($"Error details: {errorDetails}");
}

// Safe execution with error handling in JavaScript
var safeResult = await page.EvaluateFunctionAsync<dynamic>(@"
    (selector) => {
        try {
            const elements = document.querySelectorAll(selector);
            return {
                success: true,
                count: elements.length,
                data: Array.from(elements).map(el => el.textContent.trim())
            };
        } catch (error) {
            return {
                success: false,
                error: error.message,
                count: 0,
                data: []
            };
        }
    }
", ".item");
```
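The self-contained sketch below covers both sides. A failing script is caught as EvaluationFailedException, and optional chaining in the script avoids the exception entirely by returning null. It assumes the PuppeteerSharp NuGet package and that a JavaScript null result arrives as a C# null string.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
await page.SetContentAsync("<p>No matching element here</p>");

var failed = false;
try
{
    // querySelector returns null, so reading textContent throws a TypeError in the page
    await page.EvaluateFunctionAsync<string>(
        "() => document.querySelector('#non-existent').textContent");
}
catch (EvaluationFailedException ex)
{
    failed = true;
    Console.WriteLine($"JavaScript execution failed: {ex.Message}");
}

// Optional chaining avoids the exception; a missing element yields null
var safeText = await page.EvaluateFunctionAsync<string>(
    "() => document.querySelector('#non-existent')?.textContent ?? null");

Console.WriteLine($"Failed: {failed}, safe result is null: {safeText == null}");
await browser.CloseAsync();
```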
Performance Optimization Tips
When executing JavaScript code frequently, consider these optimization strategies:
```csharp
// Keep frequently used functions in a reusable string
var extractDataFunction = @"
    () => {
        return Array.from(document.querySelectorAll('.product')).map(product => ({
            name: product.querySelector('.name')?.textContent?.trim(),
            price: product.querySelector('.price')?.textContent?.trim(),
            image: product.querySelector('img')?.src
        }));
    }
";

// Reuse the same script text across pages. The browser still parses it on every call;
// the benefit is a single maintained definition, not skipped compilation.
var products1 = await page.EvaluateFunctionAsync<dynamic[]>(extractDataFunction);
await page.GoToAsync("https://example.com/page2");
var products2 = await page.EvaluateFunctionAsync<dynamic[]>(extractDataFunction);

// Batch operations into one call to reduce round trips to the browser
var allData = await page.EvaluateFunctionAsync<dynamic>(@"
    () => {
        return {
            products: Array.from(document.querySelectorAll('.product')).map(p => ({
                name: p.querySelector('.name')?.textContent?.trim(),
                price: p.querySelector('.price')?.textContent?.trim()
            })),
            categories: Array.from(document.querySelectorAll('.category')).map(c =>
                c.textContent.trim()
            ),
            pagination: {
                current: document.querySelector('.current-page')?.textContent,
                total: document.querySelector('.total-pages')?.textContent
            }
        };
    }
");
```
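Another way to batch is to pass every selector you need in a single call and let the page compute all results at once, rather than issuing one evaluation per selector. The selectors and markup below are hypothetical. The array is wrapped in new object[] { ... } because EvaluateFunctionAsync takes params object[] args. A lone string[] passed on its own would otherwise be treated as the whole argument list.

```csharp
using System;
using System.Collections.Generic;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
await page.SetContentAsync(
    "<div class='product'>A</div><div class='product'>B</div><span class='category'>X</span>");

var selectors = new[] { ".product", ".category", ".missing" };

// One round trip: the page counts matches for every selector and returns an object
var counts = await page.EvaluateFunctionAsync<Dictionary<string, int>>(@"
(selectors) => Object.fromEntries(
    selectors.map(s => [s, document.querySelectorAll(s).length]))",
    new object[] { selectors }); // wrap so the array is passed as a single argument

foreach (var (selector, count) in counts)
{
    Console.WriteLine($"{selector}: {count}");
}
await browser.CloseAsync();
```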
Integration with Web Scraping Workflows
When building comprehensive web scraping solutions, JavaScript execution becomes even more powerful when combined with other Puppeteer-Sharp features. For instance, you might want to handle AJAX requests using Puppeteer after executing custom JavaScript, or inject JavaScript into a page using Puppeteer for persistent functionality across page navigations.
Best Practices and Considerations
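- Type Safety: Always specify the expected return type when using EvaluateFunctionAsync<T>() so the result deserializes correctly. When the shape of the data is known, prefer a concrete class or array type over dynamic.
- Error Handling: Wrap JavaScript execution in try-catch blocks in both the C# and the JavaScript code.
- Performance: Minimize the number of evaluation calls by batching operations where possible, since each call is a round trip to the browser.
- Security: Never build scripts by concatenating untrusted input into the code string; pass values as function arguments instead. Be cautious about executing user-provided JavaScript at all, to prevent XSS-style script injection.
- Debugging: Inside your JavaScript functions, use console.log() rather than Console.WriteLine(), which only exists on the C# side. To see the output in C#, subscribe to the page's Console event.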
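Page scripts run in the browser, so their console.log() output has to be forwarded to C#. The sketch below subscribes to the page's Console event and records each message. It assumes the PuppeteerSharp NuGet package. The short delay is a precaution, because console events are delivered asynchronously.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

var messages = new List<string>();

// Forward every console message from the page to the C# side
page.Console += (sender, e) =>
{
    messages.Add(e.Message.Text);
    Console.WriteLine($"[browser {e.Message.Type}] {e.Message.Text}");
};

// console.log() runs in the browser; the handler above receives the text
await page.EvaluateFunctionAsync(@"(value) => {
    console.log('debug value: ' + value);
}", 42);

// Console events arrive asynchronously, so allow a moment for delivery
await Task.Delay(500);
await browser.CloseAsync();
```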
Executing custom JavaScript code with Puppeteer-Sharp opens up endless possibilities for web automation, data extraction, and browser manipulation. By mastering these techniques, you can build robust and efficient web scraping solutions that handle even the most complex dynamic websites.