Yes, you can access the HTML source of a webpage using Playwright. Playwright is a browser automation library that drives Chromium, Firefox, and WebKit through a single API, with official bindings for Node.js, Python, Java, and .NET. It can run browsers headless (no GUI) or headed, and supports web scraping, simulated user interaction, and much more.
Here is how you can do it in both JavaScript and Python:
JavaScript
In JavaScript, you can use the page.content() method to get the HTML content of a page. Here's an example:
const playwright = require('playwright');

(async () => {
  // Launch a headless Chromium instance and open a fresh page.
  const browser = await playwright.chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  // Navigate, then serialize the current DOM to an HTML string.
  await page.goto('https://example.com');
  const htmlContent = await page.content();
  console.log(htmlContent);

  await browser.close();
})();
In this script, we first launch a Chromium browser, create a new browser context, open a new page, navigate to 'https://example.com', and then get the HTML content of the page using page.content().
Python
Similarly, in Python, you can use the page.content() method to get the HTML content of a page. Here's an example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch a headless Chromium instance and open a fresh page.
    browser = p.chromium.launch()
    context = browser.new_context()
    page = context.new_page()

    # Navigate, then serialize the current DOM to an HTML string.
    page.goto('https://example.com')
    html_content = page.content()
    print(html_content)

    browser.close()
In this script, we're doing the same thing as in the JavaScript version: launch a Chromium browser, create a new browser context, open a new page, navigate to 'https://example.com', and then get the HTML content of the page using page.content().
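The Python example above uses the synchronous API, while the JavaScript one is asynchronous. If your Python code already runs on asyncio, a rough equivalent using playwright.async_api might look like the sketch below. The helper name page_html and the main() wrapper are made up for illustration; only the Playwright calls themselves (async_playwright, launch, new_page, goto, content, close) come from the library.

```python
import asyncio


async def page_html(page, url: str) -> str:
    """Navigate `page` to `url` and return the serialized DOM as a string.

    `page` is expected to behave like a Playwright async Page
    (awaitable goto() and content()). The helper name is hypothetical.
    """
    await page.goto(url)
    return await page.content()


async def main() -> None:
    # Imported here so page_html() can be exercised with any page-like
    # object, even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            print(await page_html(page, "https://example.com"))
        finally:
            # Close the browser even if navigation fails.
            await browser.close()
```

To run it as a script, call asyncio.run(main()).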
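Remember that page.content() returns the serialized DOM as it exists at the moment of the call, not the original markup sent by the server. Anything JavaScript has already added to the page is included in the returned HTML. By default, goto() waits for the page's load event, so content that scripts fetch after that point may be missing unless you wait for it first.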
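To make the difference between server-sent and rendered HTML concrete, here is a minimal sketch that returns both. It assumes the page signals readiness by rendering some element; the helper name raw_and_rendered_html, the ready_selector parameter, and the "#app" selector are hypothetical and should be adapted to the site you are scraping.

```python
def raw_and_rendered_html(page, url, ready_selector=None, timeout_ms=10_000):
    """Return (raw, rendered) HTML for `url`.

    raw      -- body of the main navigation response, before scripts run
    rendered -- DOM serialized by page.content() after optional waiting
    """
    response = page.goto(url)
    # goto() can return None (for example for about:blank), so guard for it.
    raw = response.text() if response is not None else None

    if ready_selector is not None:
        # Block until the element exists, or raise on timeout.
        page.wait_for_selector(ready_selector, timeout=timeout_ms)

    return raw, page.content()


def main() -> None:
    # Imported here so the helper above works with any page-like object.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # "#app" is a placeholder selector, not something example.com defines.
        raw, rendered = raw_and_rendered_html(page, "https://example.com")
        print(len(raw or ""), len(rendered))
        browser.close()
```

On a static page the two strings will be nearly identical; on a JavaScript-heavy page the rendered version is usually the one you want to scrape.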