Yes, you can access the HTML source of a webpage using Playwright's page.content() method. It returns the full HTML of the page as serialized from the current DOM, so it reflects any changes JavaScript has made up to the moment of the call, which makes it well suited to scraping dynamic content.
JavaScript
Use the page.content() method to retrieve the full HTML source:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const htmlContent = await page.content();
  console.log(htmlContent);
  await browser.close();
})();
With Error Handling
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    await page.goto('https://example.com', { waitUntil: 'networkidle' });
    const htmlContent = await page.content();
    console.log(`HTML length: ${htmlContent.length} characters`);
    console.log(htmlContent);
  } catch (error) {
    console.error('Error fetching HTML:', error);
  } finally {
    // Always release the browser, even if navigation fails
    await browser.close();
  }
})();
Python
Synchronous API
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com')
    html_content = page.content()
    print(html_content)
    browser.close()
Asynchronous API
import asyncio
from playwright.async_api import async_playwright

async def get_html():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto('https://example.com')
        html_content = await page.content()
        print(html_content)
        await browser.close()

asyncio.run(get_html())
Advanced Usage
Getting HTML After Specific Interactions
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Click a button that loads content dynamically
  await page.click('#load-more-button');

  // Wait for new content to load
  await page.waitForSelector('.dynamic-content');

  // Get HTML after interaction
  const htmlContent = await page.content();
  console.log(htmlContent);

  await browser.close();
})();
Getting HTML of Specific Elements
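// Assumes `page` is an already-open Playwright page

// Get innerHTML of a specific element (the markup inside the element)
const elementHTML = await page.innerHTML('#content-div');

// Get outerHTML of a specific element (includes the element's own tag),
// read from the DOM property via evaluate()
const outerHTML = await page.locator('#content-div').evaluate((el) => el.outerHTML);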
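For readers on the Python API, the same two reads can be sketched as follows. This is a minimal sketch, not an official recipe: the helper name element_html, its return shape, and the '#content-div' selector are illustrative, and the function assumes it is handed an already-open synchronous-API page. The Playwright calls relied on are page.inner_html() and locator.evaluate().

```python
def element_html(page, selector):
    """Return the inner and outer HTML of the first element matching `selector`.

    Assumption: `page` is an open Playwright sync-API Page.
    The helper name and the dict it returns are illustrative only.
    """
    # inner_html() returns the markup inside the element, without its own tag
    inner = page.inner_html(selector)
    # outerHTML is read from the DOM property by evaluating in the page
    outer = page.locator(selector).evaluate("el => el.outerHTML")
    return {"inner": inner, "outer": outer}
```

With the page object from the synchronous example, element_html(page, '#content-div')['outer'] would include the wrapping tag itself, while the 'inner' value would hold only its children.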
Important Notes
- JavaScript Execution: page.content() returns HTML after JavaScript has been executed, including dynamically loaded content
- Timing: The method captures the DOM state at the moment it's called, so content that arrives later is not included
- Complete Source: Returns the full document HTML, including <html>, <head>, and <body> tags
- Network Activity: Consider using waitUntil: 'networkidle' for pages that keep fetching data after the initial load; on pages that never go quiet (polling, streaming), wait for a specific selector instead
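Because the snapshot is taken at call time, a common pattern is to wait for a known element and only then read the HTML. A minimal Python sketch of that ordering, assuming an already-open synchronous-API page; the helper name html_when_ready and the 10-second default timeout are illustrative choices, not Playwright defaults being documented here:

```python
def html_when_ready(page, selector, timeout_ms=10_000):
    """Wait for `selector`, then return the page HTML.

    Assumption: `page` is an open Playwright sync-API Page.
    wait_for_selector() raises Playwright's TimeoutError if the element
    does not show up within `timeout_ms` milliseconds.
    """
    # Block until the element is present and visible
    page.wait_for_selector(selector, timeout=timeout_ms)
    # Only now take the snapshot, so the awaited content is included
    return page.content()
```

Called as html_when_ready(page, '.dynamic-content'), this mirrors the wait-then-capture order used in the interaction example.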
Common Use Cases
- Web Scraping: Extract data from JavaScript-heavy websites
- Testing: Verify HTML structure after user interactions
- Content Analysis: Analyze fully rendered page content
- SEO Auditing: Check final HTML output for search engines
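For the web scraping case, the string returned by page.content() can be handed to any HTML parser. Below is a minimal sketch using only Python's standard-library html.parser module. LinkCollector, extract_links, and the sample markup are illustrative names and data; in practice the input would be the html_content value from the Python examples.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every non-empty href found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Stand-in for the string page.content() would return (hypothetical markup)
sample = (
    '<html><body><a href="/a">A</a>'
    '<a href="https://example.com/b">B</a></body></html>'
)
print(extract_links(sample))  # prints ['/a', 'https://example.com/b']
```

The standard-library parser keeps the sketch dependency-free; a dedicated parsing library may be more convenient for complex selectors.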