Yes, `lxml` provides a straightforward way to pretty-print HTML or XML. When serializing the document, pass the `pretty_print=True` option to functions such as `tostring()` to format the output with indentation.
Here's how you can pretty-print XML with `lxml`:
```python
from lxml import etree

# Parse the XML
xml_data = '''<root><child1>data1</child1><child2>data2</child2></root>'''
root = etree.fromstring(xml_data)

# Pretty-print the XML
pretty_xml_string = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty_xml_string)
```
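One common surprise: `pretty_print` does not re-indent input that already contains whitespace between elements, because the serializer leaves the existing whitespace text nodes alone. The sketch below assumes such "messy" input (the string is made up for illustration) and uses `etree.XMLParser(remove_blank_text=True)` to discard ignorable whitespace at parse time so that `pretty_print` can lay the tree out from scratch.

```python
from lxml import etree

# Illustrative input that already has irregular whitespace between elements
messy_xml = "<root>\n<child1>data1</child1>\n        <child2>data2</child2></root>"

# Default parser: whitespace-only text nodes are kept, so pretty_print
# preserves the original (irregular) layout.
kept = etree.fromstring(messy_xml)
kept_str = etree.tostring(kept, pretty_print=True, encoding="unicode")
print(kept_str)

# remove_blank_text=True drops the ignorable whitespace while parsing,
# so pretty_print can indent every level consistently.
parser = etree.XMLParser(remove_blank_text=True)
cleaned = etree.fromstring(messy_xml, parser)
pretty = etree.tostring(cleaned, pretty_print=True, encoding="unicode")
print(pretty)
```

The same parser can be passed to `etree.parse()` when reading from a file.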
And for HTML, you can use `lxml.html`:
```python
from lxml import html

# Parse the HTML
html_data = '''<html><head><title>Test</title></head><body><p>Hello World</p></body></html>'''
root = html.fromstring(html_data)

# Pretty-print the HTML (method='html' is already the default for lxml.html.tostring)
pretty_html_string = html.tostring(root, pretty_print=True, encoding='unicode', method='html')
print(pretty_html_string)
```
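If you want the pretty-printed result in a file rather than a string, the element's tree also has a `write()` method that accepts the same `pretty_print` flag. In this sketch an in-memory `io.BytesIO` stands in for a real file so the example is self-contained; a filename such as `page.html` (hypothetical) would work in its place.

```python
import io
from lxml import html

html_data = "<html><head><title>Test</title></head><body><p>Hello World</p></body></html>"
root = html.fromstring(html_data)

# ElementTree.write() serializes the whole document; BytesIO replaces a real file here.
buffer = io.BytesIO()
root.getroottree().write(buffer, method="html", pretty_print=True, encoding="utf-8")

written = buffer.getvalue().decode("utf-8")
print(written)
```

Note that writing the whole tree may also emit the default DOCTYPE that the HTML parser attaches to the document.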
Keep in mind that `lxml.html` parses input with libxml2's lenient HTML parser, which repairs malformed markup (for example, unclosed tags) while parsing, before any pretty-printing happens. If your input HTML is not well-formed, the output may therefore have a slightly different structure from the original.
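To see this repair in action, the sketch below feeds in a made-up fragment with two unclosed `<p>` tags. The parser closes the first paragraph when the second one starts, so the serialized markup contains explicit `</p>` end tags that were never in the input.

```python
from lxml import html

# Illustrative malformed fragment: neither <p> is closed
broken = "<div><p>first<p>second</div>"

# For a single-element fragment, html.fromstring() returns that element (the <div>)
fragment = html.fromstring(broken)
repaired = html.tostring(fragment, encoding="unicode")
print(repaired)
```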
`lxml` is a powerful library that offers much finer control over parsing and serializing XML and HTML documents, but the examples above cover the basic pretty-printing use case.
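As one example of that extra control, `etree.indent()` (available in lxml 4.5 and later) re-indents a tree in place with an indentation string of your choice, and `tostring()` can prepend an XML declaration. This is a sketch assuming a four-space indent is wanted; note that `xml_declaration=True` requires a byte encoding such as `"UTF-8"` rather than `'unicode'`.

```python
from lxml import etree

root = etree.fromstring("<root><child1>data1</child1><child2>data2</child2></root>")

# Insert indentation whitespace into the tree itself, four spaces per level
etree.indent(root, space="    ")

# The whitespace is now part of the tree, so pretty_print is not needed here.
# With a byte encoding, tostring() returns bytes.
xml_bytes = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
text = xml_bytes.decode("utf-8")
print(text)
```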