To save a modified HTML or XML tree back to a file using `lxml`, first make your changes to the parsed tree, then serialize the tree back to a file with its `write()` method. The `lxml` library provides methods for both parsing and serializing XML and HTML documents.
Here's how you can do it:
For XML:
```python
from lxml import etree

# Parse the XML file; remove_blank_text=True drops ignorable whitespace
# so that pretty_print can re-indent the output cleanly
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('example.xml', parser)

# Get the root element
root = tree.getroot()

# Make modifications to the tree
# For example, let's add a new element
new_element = etree.SubElement(root, "new_element")
new_element.text = "This is a new element"

# Save the modified tree back to a file
tree.write('modified_example.xml', pretty_print=True, xml_declaration=True, encoding="UTF-8")
```
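The XML steps above can be checked end to end with a self-contained sketch. To make it runnable anywhere, it first writes a small illustrative `example.xml` into a temporary directory; that file's contents and the helper name `add_element_and_save` are assumptions for the example, not part of the original answer. It then re-parses the saved file to confirm the change was persisted.

```python
import os
import tempfile

from lxml import etree


def add_element_and_save(src_path, dst_path):
    """Hypothetical helper: parse src_path, append <new_element>, write dst_path."""
    # remove_blank_text lets pretty_print re-indent the output
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(src_path, parser)
    root = tree.getroot()

    new_element = etree.SubElement(root, "new_element")
    new_element.text = "This is a new element"

    tree.write(dst_path, pretty_print=True, xml_declaration=True, encoding="UTF-8")


# Illustrative input file (an assumption, not from the original example)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.xml")
dst = os.path.join(workdir, "modified_example.xml")
with open(src, "w", encoding="utf-8") as f:
    f.write("<root>\n  <item>first</item>\n</root>\n")

add_element_and_save(src, dst)

# Re-parse the saved file to confirm the change was written to disk
saved = etree.parse(dst).getroot()
print([child.tag for child in saved])  # ['item', 'new_element']
```

Writing to a separate output path, as the original example does, leaves the source file untouched if something goes wrong midway.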
For HTML:
```python
from lxml import etree, html

# Parse the HTML file
tree = html.parse('example.html')

# Get the root element (<html>)
root = tree.getroot()

# Make modifications to the tree
# For example, let's add a new div at the end of <body>.
# etree.SubElement also works with elements parsed by lxml.html
new_div = etree.SubElement(root.body, "div")
new_div.text = "This is a new div"

# Save the modified tree back to a file, serialized as HTML
tree.write('modified_example.html', pretty_print=True, method="html", encoding="UTF-8")
```
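The HTML version can be exercised the same way. This sketch writes an illustrative `example.html` (an assumed input, not from the original answer) to a temporary directory, runs a hypothetical helper `add_div_and_save` that mirrors the steps above, and reads the result back. The `<br>` in the sample page shows that `method="html"` writes void elements without the XML-style closing slash.

```python
import os
import tempfile

from lxml import etree, html


def add_div_and_save(src_path, dst_path):
    """Hypothetical helper: parse an HTML file, append a <div> to <body>, save it."""
    tree = html.parse(src_path)
    root = tree.getroot()

    # etree.SubElement works on elements parsed by lxml.html as well
    new_div = etree.SubElement(root.body, "div")
    new_div.text = "This is a new div"

    tree.write(dst_path, pretty_print=True, method="html", encoding="UTF-8")


# Illustrative input page (an assumption, not from the original example)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.html")
dst = os.path.join(workdir, "modified_example.html")
with open(src, "w", encoding="utf-8") as f:
    f.write("<html><body><p>Hello<br>world</p></body></html>")

add_div_and_save(src, dst)

# The new <div> should now be the last child of <body>
saved = html.parse(dst).getroot()
print(saved.body[-1].tag, saved.body[-1].text)  # div This is a new div
```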
When you use the `write` method, you can specify several options:
- `pretty_print=True`: Formats the output with indentation to make it more human-readable. It re-indents cleanly only if ignorable whitespace was removed when parsing (for example with `etree.XMLParser(remove_blank_text=True)`).
- `xml_declaration=True`: Adds an XML declaration at the top of the file (only applicable to XML output).
- `encoding="UTF-8"`: Specifies the encoding of the output file.
- `method="html"`: Serializes the tree as HTML rather than XML, so that, for example, void elements such as `<br>` are written without a closing slash.
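To see what each option changes without touching the filesystem, the sketch below serializes one element with `etree.tostring`, which accepts the same `pretty_print`, `xml_declaration`, `encoding`, and `method` keywords as `write`. The sample markup is an illustrative assumption.

```python
from lxml import etree

# Sample markup with no blank text, so pretty_print can indent it
root = etree.fromstring("<doc><p>text</p><br/></doc>")

compact = etree.tostring(root)
pretty = etree.tostring(root, pretty_print=True)
declared = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
as_html = etree.tostring(root, method="html")

print(compact)   # b'<doc><p>text</p><br/></doc>'
print(pretty)    # b'<doc>\n  <p>text</p>\n  <br/>\n</doc>\n'
print(declared)  # starts with the <?xml ...?> declaration
print(as_html)   # <br> is written without the XML-style slash
```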
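Please note that when modifying an HTML or XML tree, you should be aware of the document's structure and schema so that your modifications remain valid. `lxml` serializes whatever tree you build, so it will not stop you from, say, adding an element where the document's schema or DTD does not allow one. The result may be rejected by a validator or interpreted differently than intended by browsers and other XML/HTML processors.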
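One way to catch such edits before saving is to validate the modified tree. Here is a minimal sketch using `lxml`'s `etree.DTD` and its `validate()` method against a small, hypothetical DTD; a real project would load its own DTD or XML Schema (`etree.XMLSchema`) instead.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD: <root> may contain only <item> elements holding text
dtd = etree.DTD(StringIO("""
<!ELEMENT root (item*)>
<!ELEMENT item (#PCDATA)>
"""))

root = etree.fromstring("<root><item>first</item></root>")
valid_before = dtd.validate(root)
print(valid_before)  # True

# A modification the DTD does not allow
etree.SubElement(root, "new_element").text = "not allowed here"
valid_after = dtd.validate(root)
print(valid_after)  # False

# The validator records why the tree was rejected
print(dtd.error_log)
```

Running the check just before `write()` means an invalid tree can be caught without overwriting a good file.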