Yes, lxml provides built-in support for pretty-printing both HTML and XML documents. The pretty_print parameter of the tostring() function formats the output with indentation and line breaks.
Pretty-Printing XML
For XML documents, use etree.tostring() with pretty_print=True:
from lxml import etree
# Parse XML from string
xml_data = '''<root><child1>data1</child1><child2><subchild>nested</subchild></child2></root>'''
root = etree.fromstring(xml_data)
# Pretty-print the XML
pretty_xml = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty_xml)
Output:
<root>
  <child1>data1</child1>
  <child2>
    <subchild>nested</subchild>
  </child2>
</root>
Pretty-Printing HTML
For HTML documents, use lxml.html in the same way:
from lxml import html
# Parse HTML
html_data = '''<html><head><title>Test</title></head><body><div><p>Hello World</p><span>Content</span></div></body></html>'''
root = html.fromstring(html_data)
# Pretty-print HTML
pretty_html = html.tostring(root, pretty_print=True, encoding='unicode', method='html')
print(pretty_html)
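The method argument controls how the tree is serialized, independently of pretty_print. A small sketch contrasting method='html' with method='xml' on a void element such as <br> (the fragment is made up for illustration):

```python
from lxml import html

# Illustrative fragment containing a void element (<br>).
fragment = html.fromstring("<div><p>line one<br>line two</p></div>")

# method='html' writes void elements HTML-style, without a closing slash.
as_html = html.tostring(fragment, pretty_print=True, encoding="unicode", method="html")

# method='xml' writes the same tree as XML, so <br> becomes self-closing.
as_xml = html.tostring(fragment, pretty_print=True, encoding="unicode", method="xml")

print(as_html)
print(as_xml)
```

Use the XML form when the result has to be read by an XML parser; keep the HTML form for browsers.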
Working with Files
You can also pretty-print documents loaded from files:
from lxml import etree
# Parse from file
tree = etree.parse('document.xml')
# Pretty-print to file
with open('formatted_document.xml', 'wb') as f:
tree.write(f, pretty_print=True, encoding='utf-8', xml_declaration=True)
# Or print to console
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))
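One caveat when pretty-printing a parsed document: if the input already contains whitespace-only text between tags (common in hand-edited or previously formatted files), libxml2 keeps that layout instead of re-indenting it. Parsing with etree.XMLParser(remove_blank_text=True) discards the ignorable whitespace so pretty_print can lay the tree out from scratch. A minimal sketch, using an in-memory string as a stand-in for document.xml:

```python
from lxml import etree

# Stand-in for a file with inconsistent layout: whitespace-only text
# sits between some tags, while <b><c>2</c></b> is compact.
messy = "<root>\n<a>1</a>   <b><c>2</c></b>\n</root>"

# Default parser: the whitespace-only text nodes are kept, so
# pretty_print reproduces the original layout instead of re-indenting.
kept = etree.fromstring(messy)
kept_out = etree.tostring(kept, pretty_print=True, encoding="unicode")

# remove_blank_text=True drops the ignorable whitespace while parsing,
# which lets pretty_print indent every level consistently.
parser = etree.XMLParser(remove_blank_text=True)
cleaned = etree.fromstring(messy, parser)
cleaned_out = etree.tostring(cleaned, pretty_print=True, encoding="unicode")

print(kept_out)
print(cleaned_out)
```

For a real file, pass the same parser to etree.parse('document.xml', parser).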
Customizing Output Format
You can control various aspects of the pretty-printed output:
from lxml import etree
root = etree.fromstring('<root><item>test</item></root>')
# With XML declaration
formatted = etree.tostring(
root,
pretty_print=True,
encoding='utf-8',
xml_declaration=True
)
# HTML serialization with a doctype prepended to the output
from lxml import html
html_root = html.fromstring('<div><p>content</p></div>')
formatted_html = html.tostring(
html_root,
pretty_print=True,
encoding='unicode',
method='html',
doctype='<!DOCTYPE html>'
)
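pretty_print always indents with two spaces. If you need a different indent, lxml 4.5 and later provide etree.indent(), which rewrites the whitespace in the tree itself; serializing afterwards without pretty_print keeps that layout. A short sketch using four spaces:

```python
from lxml import etree

root = etree.fromstring("<root><item>test</item><item>more</item></root>")

# Rewrite the tree's whitespace in place, indenting each level by four
# spaces (etree.indent requires lxml 4.5 or later).
etree.indent(root, space="    ")

# pretty_print is not needed here: the indentation now lives in the
# elements' text and tail values.
formatted = etree.tostring(root, encoding="unicode")
print(formatted)
```

Because etree.indent modifies the tree, the same layout is used by later tostring() or tree.write() calls as well.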
Important Considerations
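- HTML auto-correction: lxml.html repairs malformed markup (for example, unclosed or misnested tags) while parsing, so the pretty-printed output reflects the repaired tree rather than the original text. The result is well-formed, but not guaranteed to be valid HTML.
- Encoding: Use encoding='unicode' to get a str, or an encoding name such as 'utf-8' to get bytes.
- Performance: Pretty-printing adds serialization work and makes the output larger, so consider leaving it off for large documents or output consumed only by machines.
- Whitespace: Pretty-printing inserts whitespace-only text between elements, which can matter wherever whitespace is significant, such as some HTML rendering contexts or whitespace-sensitive XML formats.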
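The Encoding point above can be checked directly. This sketch shows both return types, plus one combination lxml rejects: requesting an XML declaration for str output (a declaration names a byte encoding, so it has no meaning for a str).

```python
from lxml import etree

root = etree.fromstring("<root><item>test</item></root>")

# encoding='unicode' returns a str.
as_text = etree.tostring(root, pretty_print=True, encoding="unicode")

# An encoding name returns bytes; xml_declaration=True prepends <?xml ...?>.
as_bytes = etree.tostring(root, pretty_print=True, encoding="utf-8", xml_declaration=True)

# lxml raises ValueError for a declaration combined with str output.
rejected = False
try:
    etree.tostring(root, encoding="unicode", xml_declaration=True)
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

print(type(as_text).__name__, type(as_bytes).__name__)
```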
The pretty-print functionality in lxml is particularly useful for debugging, logging, or whenever you need human-readable output from your XML/HTML processing tasks.