Can I use Puppeteer-Sharp to generate PDFs from web pages?

Yes. Puppeteer-Sharp is the .NET port of the Puppeteer library, and it exposes Chromium's print-to-PDF feature so you can convert a web page, or an HTML string, into a PDF document programmatically. PdfAsync writes the PDF to a file, and PdfDataAsync returns it as a byte array.

Getting Started with PDF Generation

First, ensure you have Puppeteer-Sharp installed in your .NET project:

dotnet add package PuppeteerSharp

Here's a basic example of generating a PDF from a web page:

using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Download Chromium if not already present
        await new BrowserFetcher().DownloadAsync();

        // Launch browser
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true
        });

        // Create new page
        var page = await browser.NewPageAsync();

        // Navigate to the target URL
        await page.GoToAsync("https://example.com");

        // Generate PDF
        await page.PdfAsync("example.pdf");

        // Clean up
        await browser.CloseAsync();

        Console.WriteLine("PDF generated successfully!");
    }
}

Advanced PDF Configuration Options

Puppeteer-Sharp offers extensive customization options through the PdfOptions class:

var pdfOptions = new PdfOptions
{
    // Named paper format (A4, Letter, Legal, etc.); takes priority over Width/Height when set
    Format = PaperFormat.A4,

    // Custom page dimensions; only used when Format is not set
    Width = "8.5in",
    Height = "11in",

    // Margins
    MarginOptions = new MarginOptions
    {
        Top = "1in",
        Bottom = "1in",
        Left = "0.5in",
        Right = "0.5in"
    },

    // Print background graphics
    PrintBackground = true,

    // Landscape orientation
    Landscape = false,

    // Scale factor (0.1 to 2.0)
    Scale = 1.0m,

    // Page ranges to print
    PageRanges = "1-3,5",

    // Display header and footer
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Header Content</div>",
    FooterTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",

    // Let a CSS @page size declared by the document take priority over Format, Width, and Height
    PreferCSSPageSize = true
};

await page.PdfAsync("custom-pdf.pdf", pdfOptions);

Handling Dynamic Content

When generating PDFs from dynamic web pages, you may need to wait for content to load fully before printing. The waiting strategies below are the same ones used for AJAX-heavy pages; apply them after navigation and before calling PdfAsync:

// Navigate and wait until the network is idle (no open connections for at least 500 ms)
await page.GoToAsync("https://dynamic-site.com",
    new NavigationOptions
    {
        WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
    });

// Wait for a specific element to appear
await page.WaitForSelectorAsync("#content-loaded");

// Last resort: wait for a fixed delay in milliseconds
await page.WaitForTimeoutAsync(2000);

// Generate the PDF once the content is ready
await page.PdfAsync("dynamic-content.pdf", pdfOptions);

Creating PDFs from HTML Strings

You can also generate PDFs from HTML content directly:

string htmlContent = @"
<!DOCTYPE html>
<html>
<head>
    <title>Generated Document</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        h1 { color: #333; }
        .highlight { background-color: #ffff99; }
    </style>
</head>
<body>
    <h1>Report Title</h1>
    <p>This is a <span class='highlight'>dynamically generated</span> PDF document.</p>
    <table border='1' style='width:100%; border-collapse: collapse;'>
        <tr><th>Column 1</th><th>Column 2</th></tr>
        <tr><td>Data 1</td><td>Data 2</td></tr>
    </table>
</body>
</html>";

await page.SetContentAsync(htmlContent);
await page.PdfAsync("from-html.pdf", pdfOptions);

Handling Authentication and Cookies

For protected pages, you can set authentication headers or cookies before generating the PDF:

// Requires: using System.Collections.Generic;
// Set authentication header
await page.SetExtraHttpHeadersAsync(new Dictionary<string, string>
{
    {"Authorization", "Bearer your-token-here"}
});

// Set cookies (the domain must match the site you navigate to)
await page.SetCookieAsync(new CookieParam
{
    Name = "session_id",
    Value = "your-session-value",
    Domain = "protected-site.com"
});

// Navigate and generate PDF
await page.GoToAsync("https://protected-site.com/report");
await page.PdfAsync("protected-content.pdf");

Viewport and Responsive Design Considerations

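PdfAsync renders the page with print CSS and lays it out for the chosen paper size, so the viewport mainly matters for pages whose content or scripts depend on the window dimensions. Setting it explicitly, or emulating a device, keeps that behavior consistent between runs:

// Option 1: set an explicit viewport before navigation
await page.SetViewportAsync(new ViewPortOptions
{
    Width = 1200,
    Height = 800,
    DeviceScaleFactor = 1
});

// Option 2: emulate a specific device instead (this replaces the viewport set above)
await page.EmulateAsync(Puppeteer.Devices[DeviceDescriptorName.IPadPro]);

await page.GoToAsync("https://responsive-site.com");
await page.PdfAsync("responsive-design.pdf");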

Error Handling and Best Practices

Implement proper error handling for robust PDF generation:

public async Task<bool> GeneratePdfAsync(string url, string outputPath)
{
    IBrowser browser = null; // LaunchAsync returns IBrowser in current Puppeteer-Sharp releases
    try
    {
        await new BrowserFetcher().DownloadAsync();

        browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[] { "--no-sandbox", "--disable-setuid-sandbox" } // For server environments
        });

        var page = await browser.NewPageAsync();

        // Raise the default timeout (30 seconds) for complex pages
        page.DefaultTimeout = 60000;

        var response = await page.GoToAsync(url, new NavigationOptions
        {
            WaitUntil = new[] { WaitUntilNavigation.Networkidle2 },
            Timeout = 60000
        });

        // GoToAsync can return null (for example, for about:blank), so check before reading the status
        if (response == null || !response.Ok)
        {
            throw new Exception($"Failed to load page: {response?.Status}");
        }

        await page.PdfAsync(outputPath, new PdfOptions
        {
            Format = PaperFormat.A4,
            PrintBackground = true,
            MarginOptions = new MarginOptions
            {
                Top = "1cm",
                Bottom = "1cm",
                Left = "1cm",
                Right = "1cm"
            }
        });

        return true;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"PDF generation failed: {ex.Message}");
        return false;
    }
    finally
    {
        if (browser != null)
        {
            await browser.CloseAsync();
        }
    }
}

Performance Optimization

For high-volume PDF generation, launch the browser once, open a short-lived page per request, and cap concurrency with a semaphore:

public class PdfService : IDisposable
{
    private IBrowser _browser; // LaunchAsync returns IBrowser in current Puppeteer-Sharp releases
    private readonly SemaphoreSlim _semaphore;

    public PdfService(int maxConcurrency = 5)
    {
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    public async Task InitializeAsync()
    {
        await new BrowserFetcher().DownloadAsync();
        _browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[] 
            { 
                "--no-sandbox", 
                "--disable-setuid-sandbox",
                "--disable-dev-shm-usage" // Prevent memory issues
            }
        });
    }

    public async Task<byte[]> GeneratePdfBytesAsync(string url)
    {
        await _semaphore.WaitAsync();
        try
        {
            var page = await _browser.NewPageAsync();
            try
            {
                await page.GoToAsync(url);
                var pdfBytes = await page.PdfDataAsync(new PdfOptions
                {
                    Format = PaperFormat.A4,
                    PrintBackground = true
                });
                return pdfBytes;
            }
            finally
            {
                await page.CloseAsync();
            }
        }
        finally
        {
            _semaphore.Release();
        }
    }

    public void Dispose()
    {
        _browser?.CloseAsync().Wait();
        _semaphore?.Dispose();
    }
}

Server Environment Considerations

When deploying PDF generation in server environments, especially Docker containers, you may need additional configuration:

var launchOptions = new LaunchOptions
{
    Headless = true,
    Args = new[]
    {
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-accelerated-2d-canvas",
        "--disable-gpu",
        "--window-size=1920,1080" // Chromium expects width,height separated by a comma
    }
};

var browser = await Puppeteer.LaunchAsync(launchOptions);

For Docker deployments, ensure your Dockerfile includes necessary dependencies:

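# Install Chrome dependencies (package names vary between base images)
# libgconf-2-4 is omitted: newer Debian releases no longer ship it; libgbm1 is added for recent Chromium builds
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    xdg-utils \
    libxss1 \
    && rm -rf /var/lib/apt/lists/*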

Common Use Cases

Invoice Generation

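// The page is passed in so the method does not rely on an undeclared variable
public async Task GenerateInvoicePdf(IPage page, InvoiceData invoice, string outputPath)
{
    var html = GenerateInvoiceHtml(invoice); // application-specific HTML builder
    await page.SetContentAsync(html);
    await page.PdfAsync(outputPath, new PdfOptions
    {
        Format = PaperFormat.A4,
        PrintBackground = true,
        DisplayHeaderFooter = true,
        // Leave room for the header and footer, which are drawn inside the margins
        MarginOptions = new MarginOptions { Top = "2cm", Bottom = "2cm" },
        HeaderTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Invoice #" + invoice.Number + "</div>",
        FooterTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Page <span class='pageNumber'></span></div>"
    });
}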
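
The GenerateInvoiceHtml helper called above is not shown in the article. A minimal sketch of what it might look like follows, assuming a hypothetical InvoiceData shape with Number, Customer, and Total fields and a hypothetical InvoiceTemplates class; neither is part of Puppeteer-Sharp. The point it illustrates is HTML-encoding user-supplied values before they are handed to SetContentAsync.

```csharp
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical data shape; the real InvoiceData type is application-specific.
public record InvoiceData(string Number, string Customer, decimal Total);

public static class InvoiceTemplates
{
    // Builds a small HTML document, encoding text values so that
    // characters such as & and < cannot break the markup.
    public static string GenerateInvoiceHtml(InvoiceData invoice)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset='utf-8'><title>Invoice</title></head><body>");
        sb.Append("<h1>Invoice #").Append(WebUtility.HtmlEncode(invoice.Number)).Append("</h1>");
        sb.Append("<p>Customer: ").Append(WebUtility.HtmlEncode(invoice.Customer)).Append("</p>");
        // Format the amount with two decimals, independent of the server culture.
        sb.Append("<p>Total: ").Append(invoice.Total.ToString("F2", CultureInfo.InvariantCulture)).Append("</p>");
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

In a real application a templating engine would likely replace the StringBuilder, but the encoding step applies either way.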

Report Generation from Dashboard

// Navigate to the dashboard (set authentication headers or cookies first if it requires a login)
await page.GoToAsync("https://dashboard.com/report?id=123");

// Wait for the charts to render
await page.WaitForSelectorAsync(".chart-container");
await page.WaitForTimeoutAsync(2000); // Additional wait for animations

await page.PdfAsync("dashboard-report.pdf", new PdfOptions
{
    Format = PaperFormat.A3, // Larger format for dashboards
    Landscape = true,
    PrintBackground = true
});

Troubleshooting Common Issues

Memory Management

// Dispose pages properly to prevent memory leaks
await page.CloseAsync();

// Set resource limits (V8 options are passed to Chromium through --js-flags)
var launchOptions = new LaunchOptions
{
    Args = new[] { "--js-flags=--max-old-space-size=4096", "--disable-dev-shm-usage" }
};

Handling Large Documents

// For large documents, consider splitting into chunks
var pdfOptions = new PdfOptions
{
    Format = PaperFormat.A4,
    PageRanges = "1-10", // Process in batches
    PrintBackground = true
};
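
The batching idea above can be sketched as a small helper that turns a page count into PageRanges strings. PageRangeBatches and its parameters are hypothetical names, not part of Puppeteer-Sharp, and the sketch assumes the total page count is already known, since PdfOptions does not report it.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeBatches
{
    // Hypothetical helper: returns strings such as "1-10", "11-20", "21-25"
    // that can each be assigned to PdfOptions.PageRanges.
    public static IReadOnlyList<string> Build(int totalPages, int batchSize)
    {
        if (totalPages < 1) throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var ranges = new List<string>();
        for (int first = 1; first <= totalPages; first += batchSize)
        {
            // Clamp the last batch to the final page.
            int last = Math.Min(first + batchSize - 1, totalPages);
            ranges.Add(first == last ? first.ToString() : $"{first}-{last}");
        }
        return ranges;
    }
}
```

Each returned string would go into a separate PdfAsync call, producing one file per batch; merging those files afterwards needs a separate PDF library.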

JavaScript Execution and Custom Fonts

You can execute JavaScript before PDF generation to ensure proper rendering:

// Run an async function in the page; top-level await is not valid in a plain expression
await page.EvaluateFunctionAsync(@"async () => {
    // Wait for web fonts to load
    await document.fonts.ready;

    // Scroll to the bottom to trigger lazy-loaded content
    window.scrollTo(0, document.body.scrollHeight);

    // Give animations time to complete
    await new Promise(resolve => setTimeout(resolve, 1000));
}");

await page.PdfAsync("styled-document.pdf", pdfOptions);

Conclusion

Puppeteer-Sharp provides a flexible way to generate PDFs from web pages in .NET applications. Whether you are creating reports, invoices, or documentation, PdfOptions covers paper size, margins, headers and footers, and page ranges. Following the practices above for error handling, browser reuse, and server deployment helps keep a PDF generation service reliable as load grows.

Because the same library also handles navigation, authentication, and JavaScript execution, it suits PDFs built from dynamic web content. Throughput depends on page complexity and available memory, so measure with your own workload before settling on concurrency limits.
