How do you save the manipulated DOM back to HTML in Cheerio?

Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server in Node.js. When you use Cheerio to load and manipulate the DOM of an HTML document, you may reach a point where you want to serialize the manipulated DOM back into HTML.

To output the manipulated DOM as HTML in Cheerio, use the .html() method. Called without arguments on a Cheerio selection, it returns the inner HTML of the first matched element as a string. Called on the root function as $.html(), it returns the entire HTML document.

Here's an example in Node.js:

const cheerio = require('cheerio');

// Load your HTML into Cheerio
const html = `<html>
<head><title>My Page</title></head>
<body>
  <h1>Welcome</h1>
  <p>This is a paragraph.</p>
</body>
</html>`;

const $ = cheerio.load(html);

// Manipulate the DOM
$('h1').text('Hello World');
$('p').addClass('intro');

// Save the manipulated DOM back to an HTML string
const updatedHtml = $.html();

// Output the updated HTML
console.log(updatedHtml);

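The output of the above code would look similar to the following (the HTML parser may normalize whitespace between tags, so line breaks can differ slightly):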

<html>
<head><title>My Page</title></head>
<body>
  <h1>Hello World</h1>
  <p class="intro">This is a paragraph.</p>
</body>
</html>

If you want the HTML of a specific element only, you can call .html() on a Cheerio selection containing that element. This returns the element's inner HTML, that is, its contents without its own opening and closing tags:

// Assuming you have already loaded the HTML and manipulated it with Cheerio

// Get the inner HTML of the first paragraph
const paragraphHtml = $('p').first().html();

// Output the HTML of the paragraph
console.log(paragraphHtml); // This would output: 'This is a paragraph.'

If you want to serialize the entire document including the doctype, note that $.html() keeps a doctype declaration that was present in the HTML you loaded. The example above prints no doctype only because the input had none. If the source lacks a doctype and you need one in the output, prepend the doctype string to the result of $.html() yourself.

Remember to install the cheerio package via npm before using it in your project:

npm install cheerio

Cheerio is a server-side library, and the code examples above are intended to run in a Node.js environment. Browsers do not need an equivalent, because they already provide native APIs for manipulating the DOM (for example, document.querySelector() and Element.innerHTML). If you are manipulating the DOM in a browser, you can similarly serialize an element to an HTML string using the outerHTML property:

// Select an element
const element = document.querySelector('h1');

// Manipulate it
element.textContent = 'Hello World';

// Serialize the element back to HTML
const elementHtml = element.outerHTML;

console.log(elementHtml); // Outputs: <h1>Hello World</h1>
