Can I use Puppeteer-Sharp to generate PDFs from web pages?
Yes. Puppeteer-Sharp is the .NET port of the Puppeteer library, and it can save any page that headless Chromium can load as a PDF. You control the output (paper size, margins, headers and footers, page ranges) programmatically through PdfOptions.
Getting Started with PDF Generation
First, ensure you have Puppeteer-Sharp installed in your .NET project:
dotnet add package PuppeteerSharp
Here's a basic example of generating a PDF from a web page:
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Download Chromium if not already present
        await new BrowserFetcher().DownloadAsync();
        // Launch browser
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true
        });
        // Create new page
        var page = await browser.NewPageAsync();
        // Navigate to the target URL
        await page.GoToAsync("https://example.com");
        // Generate PDF
        await page.PdfAsync("example.pdf");
        // Clean up
        await browser.CloseAsync();
        Console.WriteLine("PDF generated successfully!");
    }
}
Advanced PDF Configuration Options
Puppeteer-Sharp offers extensive customization options through the PdfOptions class:
var pdfOptions = new PdfOptions
{
    // Page format (A4, Letter, Legal, etc.); takes priority over Width/Height
    Format = PaperFormat.A4,
    // Custom page dimensions (only used when Format is not set)
    Width = "8.5in",
    Height = "11in",
    // Margins
    MarginOptions = new MarginOptions
    {
        Top = "1in",
        Bottom = "1in",
        Left = "0.5in",
        Right = "0.5in"
    },
    // Print background graphics
    PrintBackground = true,
    // Landscape orientation
    Landscape = false,
    // Scale factor (0.1 to 2.0)
    Scale = 1.0m,
    // Page ranges to print
    PageRanges = "1-3,5",
    // Display header and footer (they are drawn inside the top and bottom margins)
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Header Content</div>",
    FooterTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
    // Let a CSS @page size declared by the page take priority over the options above
    PreferCSSPageSize = true
};
await page.PdfAsync("custom-pdf.pdf", pdfOptions);
Handling Dynamic Content
When generating PDFs from dynamic web pages, wait until the content has finished loading before calling PdfAsync. As with handling AJAX requests in Puppeteer, you can choose between several waiting strategies:
// Option 1: navigate and wait until the network is idle
await page.GoToAsync("https://dynamic-site.com",
    new NavigationOptions
    {
        WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
    });
// Option 2: after navigating, wait for an element that signals the content is ready
await page.WaitForSelectorAsync("#content-loaded");
// Option 3 (last resort): wait for a fixed time in milliseconds
await page.WaitForTimeoutAsync(2000);
// Generate PDF after content is ready
await page.PdfAsync("dynamic-content.pdf", pdfOptions);
Creating PDFs from HTML Strings
You can also generate PDFs from HTML content directly:
string htmlContent = @"
<!DOCTYPE html>
<html>
<head>
    <title>Generated Document</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        h1 { color: #333; }
        .highlight { background-color: #ffff99; }
    </style>
</head>
<body>
    <h1>Report Title</h1>
    <p>This is a <span class='highlight'>dynamically generated</span> PDF document.</p>
    <table border='1' style='width:100%; border-collapse: collapse;'>
        <tr><th>Column 1</th><th>Column 2</th></tr>
        <tr><td>Data 1</td><td>Data 2</td></tr>
    </table>
</body>
</html>";
await page.SetContentAsync(htmlContent);
await page.PdfAsync("from-html.pdf", pdfOptions);
Handling Authentication and Cookies
For protected pages, you can set authentication headers or cookies before generating the PDF:
// Requires: using System.Collections.Generic;
// Set authentication header (sent with every request the page makes)
await page.SetExtraHttpHeadersAsync(new Dictionary<string, string>
{
    {"Authorization", "Bearer your-token-here"}
});
// Set cookies; the domain must match the site you navigate to
await page.SetCookieAsync(new CookieParam
{
    Name = "session_id",
    Value = "your-session-value",
    Domain = "protected-site.com"
});
// Navigate and generate PDF
await page.GoToAsync("https://protected-site.com/report");
await page.PdfAsync("protected-content.pdf");
Viewport and Responsive Design Considerations
Similar to how you can set viewport in Puppeteer, controlling the viewport is crucial for consistent PDF generation:
// Requires: using PuppeteerSharp.Mobile; (for DeviceDescriptorName)
// Option 1: set an explicit viewport before navigation
await page.SetViewportAsync(new ViewPortOptions
{
    Width = 1200,
    Height = 800,
    DeviceScaleFactor = 1
});
// Option 2: emulate a specific device (this replaces the viewport set above)
await page.EmulateAsync(Puppeteer.Devices[DeviceDescriptorName.IPadPro]);
await page.GoToAsync("https://responsive-site.com");
await page.PdfAsync("responsive-design.pdf");
Error Handling and Best Practices
Implement proper error handling for robust PDF generation:
public async Task<bool> GeneratePdfAsync(string url, string outputPath)
{
    // Recent PuppeteerSharp versions return IBrowser; older ones used the Browser class
    IBrowser browser = null;
    try
    {
        await new BrowserFetcher().DownloadAsync();
        browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[] { "--no-sandbox", "--disable-setuid-sandbox" } // For server environments
        });
        var page = await browser.NewPageAsync();
        // Set longer timeout for complex pages
        page.DefaultTimeout = 30000;
        var response = await page.GoToAsync(url, new NavigationOptions
        {
            WaitUntil = new[] { WaitUntilNavigation.Networkidle2 },
            Timeout = 30000
        });
        // Treat a missing response or a non-2xx status as a failure
        if (response == null || !response.Ok)
        {
            throw new Exception($"Failed to load page: {response?.Status}");
        }
        await page.PdfAsync(outputPath, new PdfOptions
        {
            Format = PaperFormat.A4,
            PrintBackground = true,
            MarginOptions = new MarginOptions
            {
                Top = "1cm",
                Bottom = "1cm",
                Left = "1cm",
                Right = "1cm"
            }
        });
        return true;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"PDF generation failed: {ex.Message}");
        return false;
    }
    finally
    {
        // Always close the browser, even when navigation or printing fails
        if (browser != null)
        {
            await browser.CloseAsync();
        }
    }
}
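Page loads can fail for transient reasons (a slow network, a busy server), so it can help to retry a failed attempt a few times. The following is a minimal sketch, not part of Puppeteer-Sharp: the Retry class, the WithBackoffAsync name, and the default attempt count and delay are all assumptions chosen for illustration. It wraps any operation that reports success as a bool, which matches the shape of GeneratePdfAsync above.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not a Puppeteer-Sharp API): retries an operation
// that signals success by returning true.
public static class Retry
{
    // Runs `operation` until it returns true or `maxAttempts` is reached.
    // The delay between attempts doubles each time (exponential backoff).
    public static async Task<bool> WithBackoffAsync(
        Func<Task<bool>> operation,
        int maxAttempts = 3,
        int initialDelayMs = 500)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delayMs = initialDelayMs;
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await operation())
            {
                return true;
            }
            // Do not sleep after the final attempt
            if (attempt < maxAttempts)
            {
                await Task.Delay(delayMs);
                delayMs *= 2;
            }
        }
        return false;
    }
}
```

It could be called as: var ok = await Retry.WithBackoffAsync(() => GeneratePdfAsync(url, "report.pdf")); Because GeneratePdfAsync launches a new browser on every call, each retry is relatively expensive, so a small attempt count is probably sensible.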
Performance Optimization
For high-volume PDF generation, avoid launching a browser per request. Reuse a single browser instance, open one page per PDF, and cap the number of concurrent pages:
using PuppeteerSharp;
using System;
using System.Threading;
using System.Threading.Tasks;

public class PdfService : IAsyncDisposable
{
    // Shared across requests; recent PuppeteerSharp versions return IBrowser
    private IBrowser _browser;
    // Limits how many pages are open at the same time
    private readonly SemaphoreSlim _semaphore;

    public PdfService(int maxConcurrency = 5)
    {
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    // Call once before the first GeneratePdfBytesAsync call
    public async Task InitializeAsync()
    {
        await new BrowserFetcher().DownloadAsync();
        _browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[]
            {
                "--no-sandbox",
                "--disable-setuid-sandbox",
                "--disable-dev-shm-usage" // Prevent memory issues
            }
        });
    }

    public async Task<byte[]> GeneratePdfBytesAsync(string url)
    {
        await _semaphore.WaitAsync();
        try
        {
            var page = await _browser.NewPageAsync();
            try
            {
                await page.GoToAsync(url);
                var pdfBytes = await page.PdfDataAsync(new PdfOptions
                {
                    Format = PaperFormat.A4,
                    PrintBackground = true
                });
                return pdfBytes;
            }
            finally
            {
                // Close the page even if navigation or printing throws
                await page.CloseAsync();
            }
        }
        finally
        {
            _semaphore.Release();
        }
    }

    // Asynchronous disposal avoids blocking a thread on CloseAsync
    public async ValueTask DisposeAsync()
    {
        if (_browser != null)
        {
            await _browser.CloseAsync();
        }
        _semaphore.Dispose();
    }
}
Server Environment Considerations
When deploying PDF generation in server environments, especially Docker containers, you may need additional configuration:
var launchOptions = new LaunchOptions
{
    Headless = true,
    Args = new[]
    {
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-accelerated-2d-canvas",
        "--disable-gpu",
        "--window-size=1920,1080" // Chromium expects width,height separated by a comma
    }
};
var browser = await Puppeteer.LaunchAsync(launchOptions);
For Docker deployments, ensure your Dockerfile includes necessary dependencies:
# Install Chrome dependencies (Debian/Ubuntu package names; they can vary between releases)
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    xdg-utils \
    libxss1 \
    && rm -rf /var/lib/apt/lists/*
Common Use Cases
Invoice Generation
// InvoiceData and GenerateInvoiceHtml are application-specific placeholders.
// The page is passed in by the caller (IPage in recent PuppeteerSharp versions).
public async Task GenerateInvoicePdf(IPage page, InvoiceData invoice, string outputPath)
{
    var html = GenerateInvoiceHtml(invoice);
    await page.SetContentAsync(html);
    await page.PdfAsync(outputPath, new PdfOptions
    {
        Format = PaperFormat.A4,
        PrintBackground = true,
        DisplayHeaderFooter = true,
        HeaderTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Invoice #" + invoice.Number + "</div>",
        FooterTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Page <span class='pageNumber'></span></div>"
    });
}
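The snippet above calls GenerateInvoiceHtml without showing it. One possible shape for it is sketched below. Everything here is an assumption made for illustration: the InvoiceData and InvoiceLine records, their properties, and the markup are hypothetical, and your own invoice model will differ. The sketch does show one point worth keeping: user-supplied text is HTML-encoded before it is placed in the document, so a customer name containing & or < cannot break the markup.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical data shapes; the article does not define InvoiceData.
public record InvoiceLine(string Description, int Quantity, decimal UnitPrice);
public record InvoiceData(string Number, string Customer, IReadOnlyList<InvoiceLine> Lines);

public static class InvoiceHtml
{
    // Builds a small self-contained HTML document for one invoice.
    public static string GenerateInvoiceHtml(InvoiceData invoice)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset='utf-8'>");
        sb.Append("<style>body{font-family:Arial,sans-serif}");
        sb.Append("table{width:100%;border-collapse:collapse}");
        sb.Append("td,th{border:1px solid #999;padding:4px}</style></head><body>");

        // Encode all free-text values before inserting them into the markup
        sb.Append("<h1>Invoice #").Append(WebUtility.HtmlEncode(invoice.Number)).Append("</h1>");
        sb.Append("<p>Customer: ").Append(WebUtility.HtmlEncode(invoice.Customer)).Append("</p>");

        sb.Append("<table><tr><th>Description</th><th>Qty</th><th>Amount</th></tr>");
        decimal total = 0m;
        foreach (var line in invoice.Lines)
        {
            var amount = line.Quantity * line.UnitPrice;
            total += amount;
            sb.Append("<tr><td>").Append(WebUtility.HtmlEncode(line.Description)).Append("</td>");
            sb.Append("<td>").Append(line.Quantity.ToString(CultureInfo.InvariantCulture)).Append("</td>");
            sb.Append("<td>").Append(Money(amount)).Append("</td></tr>");
        }
        sb.Append("<tr><th colspan='2'>Total</th><th>").Append(Money(total)).Append("</th></tr>");
        sb.Append("</table></body></html>");
        return sb.ToString();
    }

    // Formats with two decimals, independent of the server's culture settings
    private static string Money(decimal value) =>
        value.ToString("0.00", CultureInfo.InvariantCulture);
}
```

Records require C# 9 or later. For larger templates, a templating engine is likely easier to maintain than string building, but the encoding concern stays the same.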
Report Generation from Dashboard
// Navigate to the dashboard (set authentication headers or cookies first if required)
await page.GoToAsync("https://dashboard.com/report?id=123");
// Wait for charts to render
await page.WaitForSelectorAsync(".chart-container");
await page.WaitForTimeoutAsync(2000); // Additional wait for animations
await page.PdfAsync("dashboard-report.pdf", new PdfOptions
{
    Format = PaperFormat.A3, // Larger format for dashboards
    Landscape = true,
    PrintBackground = true
});
Troubleshooting Common Issues
Memory Management
// Close pages as soon as you are done with them to prevent memory leaks
await page.CloseAsync();
// Raise the V8 heap limit; it is a V8 option, so it must be passed through --js-flags
var launchOptions = new LaunchOptions
{
    Args = new[] { "--js-flags=--max-old-space-size=4096", "--disable-dev-shm-usage" }
};
Handling Large Documents
// For large documents, consider splitting the output into chunks
var pdfOptions = new PdfOptions
{
    Format = PaperFormat.A4,
    PageRanges = "1-10", // Process in batches
    PrintBackground = true
};
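To process a document in batches, you need one PageRanges string per batch. The helper below is a small sketch of that arithmetic; the PageBatches class and its Ranges method are hypothetical names, and it assumes you already know the total page count, which this example does not obtain from Puppeteer-Sharp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Puppeteer-Sharp API): splits a page count
// into PageRanges strings such as "1-10", "11-20", "21-25".
public static class PageBatches
{
    public static List<string> Ranges(int totalPages, int batchSize)
    {
        if (totalPages < 1) throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var ranges = new List<string>();
        for (var start = 1; start <= totalPages; start += batchSize)
        {
            // The last batch may be shorter than batchSize
            var end = Math.Min(start + batchSize - 1, totalPages);
            // A single page is written as "11" rather than "11-11"
            ranges.Add(start == end ? start.ToString() : $"{start}-{end}");
        }
        return ranges;
    }
}
```

Each returned string can then be assigned to PdfOptions.PageRanges for a separate PdfAsync or PdfDataAsync call, for example inside a foreach loop over PageBatches.Ranges(25, 10).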
JavaScript Execution and Custom Fonts
You can execute JavaScript before PDF generation to ensure proper rendering:
// Run an async function in the page; EvaluateFunctionAsync awaits the promise it returns
await page.EvaluateFunctionAsync(@"async () => {
    // Wait for web fonts to load
    await document.fonts.ready;
    // Trigger any lazy-loaded content
    window.scrollTo(0, document.body.scrollHeight);
    // Give animations time to complete
    await new Promise(resolve => setTimeout(resolve, 1000));
}");
await page.PdfAsync("styled-document.pdf", pdfOptions);
Conclusion
Puppeteer-Sharp is a flexible way to generate PDFs from web pages in .NET applications. Whether you are creating reports, invoices, or documentation, PdfOptions covers the common layout needs: paper size, margins, headers and footers, and page ranges.
Because it drives a real browser, the same code that handles scraping and automation can also render dynamic content to PDF. For production use, apply the practices covered above: handle errors and always close the browser, reuse one browser instance with limited concurrency, and install the required system dependencies on your servers or containers.