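To update or replace content in a Beautiful Soup parse tree, you modify the existing Tag objects in place. You can assign a new string to a tag's .string, set or delete attributes as if the tag were a dictionary, or call methods such as .replace_with(), .extract(), and .decompose(). Each change takes effect in the tree immediately.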
Here are some common ways to update or replace content:
Replacing Text
To replace text within an element, assign a new string to the .string attribute of the tag. This clears everything inside the tag, including any child tags, and leaves the new string as its only content.
from bs4 import BeautifulSoup
html_doc = '<p id="my_paragraph">Old text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the paragraph tag
p_tag = soup.find('p', id='my_paragraph')
# Replace the text
p_tag.string = 'New text'
# Verify the change
print(soup.prettify())
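Because assigning to .string clears the whole tag, it also discards nested markup. The following minimal sketch uses a hypothetical paragraph containing a <b> child. Note that reading .string returns None when a tag has more than one child, but assigning to it still works:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the paragraph mixes text with a nested <b> tag
html_doc = '<p id="my_paragraph">Old <b>bold</b> text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', id='my_paragraph')

# Three children ("Old ", <b>, " text"), so reading .string gives None
print(p_tag.string)  # None

# Assigning clears every child, including <b>, then adds the new string
p_tag.string = 'New text'
print(p_tag)  # <p id="my_paragraph">New text</p>
```

If you need to keep the child tags, edit the individual text nodes instead of assigning to .string on the parent.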
Adding or Modifying Attributes
You can add or modify an attribute by treating the tag as a dictionary and assigning a new value to the desired attribute key.
from bs4 import BeautifulSoup
html_doc = '<p id="my_paragraph">Some text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the paragraph tag
p_tag = soup.find('p', id='my_paragraph')
# Modify the id attribute
p_tag['id'] = 'new_id'
# Add a new attribute, e.g., class
p_tag['class'] = 'new_class'
# Verify the change
print(soup.prettify())
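The class attribute behaves differently from most attributes. With the HTML parsers, Beautiful Soup treats it as multi-valued and returns it as a list. The following sketch uses hypothetical class names and a hypothetical data-note attribute. It shows that you can assign a list to set several classes at once, and that .get() returns None for a missing attribute instead of raising KeyError:

```python
from bs4 import BeautifulSoup

html_doc = '<p id="my_paragraph" class="a b">Some text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', id='my_paragraph')

# 'class' is multi-valued, so it is parsed into a list
print(p_tag['class'])  # ['a', 'b']

# Assign a list to set several classes; it is serialized space-separated
p_tag['class'] = ['new_class', 'highlight']

# Hypothetical custom attribute, added like any other dictionary key
p_tag['data-note'] = 'edited'

# .get() avoids a KeyError when an attribute may be absent
print(p_tag.get('title'))  # None
print(p_tag)
```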
Replacing Tags
You can replace an entire tag with a new one by using the .replace_with() method.
from bs4 import BeautifulSoup
html_doc = '<p id="my_paragraph">Some text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the paragraph tag
p_tag = soup.find('p', id='my_paragraph')
# Create a new tag
new_tag = soup.new_tag('div', id='new_div')
new_tag.string = 'This is a div'
# Replace the old tag with the new tag
p_tag.replace_with(new_tag)
# Verify the change
print(soup.prettify())
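.replace_with() is not limited to tags. It also accepts a plain string, and it returns the element that was removed, so the old tag can still be inspected or reused. The following minimal sketch wraps the paragraph in a hypothetical <div> so the result is easy to see:

```python
from bs4 import BeautifulSoup

html_doc = '<div><p id="my_paragraph">Some text</p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', id='my_paragraph')

# Replace the tag with plain text; the removed <p> is returned
old_tag = p_tag.replace_with('Plain replacement text')

print(old_tag)  # <p id="my_paragraph">Some text</p>
print(soup)     # <div>Plain replacement text</div>
```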
Removing Attributes
To remove an attribute from a tag, use the del keyword on the tag's attribute dictionary.
from bs4 import BeautifulSoup
html_doc = '<p id="my_paragraph" class="my_class">Some text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the paragraph tag
p_tag = soup.find('p', id='my_paragraph')
# Remove the class attribute
del p_tag['class']
# Verify the change
print(soup.prettify())
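The same del syntax scales to stripping several attributes at once. The following sketch keeps only a hypothetical allow-list of attribute names (here just id) and deletes the rest. It loops over a copy of the keys because the dictionary changes during the loop:

```python
from bs4 import BeautifulSoup

html_doc = ('<p id="my_paragraph" class="my_class" '
            'style="color: red" data-x="1">Some text</p>')
soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', id='my_paragraph')

# Hypothetical allow-list: attributes to keep
keep = {'id'}

# list() copies the keys so deleting inside the loop is safe
for name in list(p_tag.attrs):
    if name not in keep:
        del p_tag[name]

print(p_tag)  # <p id="my_paragraph">Some text</p>
```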
Removing Tags or Strings
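To remove a tag or string from the parse tree, use .decompose() or .extract(). The .decompose() method works on tags: it removes the tag from the tree and destroys it together with its contents. The .extract() method works on both tags and strings: it removes the element and returns it, so you can keep using it or insert it somewhere else.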
from bs4 import BeautifulSoup
html_doc = '<div>Remove this <p id="my_paragraph">paragraph</p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the paragraph tag
p_tag = soup.find('p', id='my_paragraph')
# Remove the tag
p_tag.decompose()
# Verify the change
print(soup.prettify())
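The example above uses .decompose(), which discards the tag. As a sketch of the alternative, the following uses .extract() on the same markup. It pulls out both the <p> tag and the leading text node, then shows that the extracted tag can be appended back into the tree:

```python
from bs4 import BeautifulSoup

html_doc = '<div>Remove this <p id="my_paragraph">paragraph</p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')
div_tag = soup.div

# extract() removes the tag and returns it instead of destroying it
p_tag = soup.find('p', id='my_paragraph').extract()

# extract() also works on strings: the text node is now the only child
text_node = div_tag.contents[0].extract()

print(str(text_node))  # Remove this
print(div_tag)         # <div></div>

# The extracted tag is still usable and can be reinserted
div_tag.append(p_tag)
print(div_tag)         # <div><p id="my_paragraph">paragraph</p></div>
```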
Beautiful Soup makes it straightforward to navigate, search, and modify the parse tree. Modifications only change the in-memory tree, so when you need to save or display the updated HTML/XML, convert the object back to text with str(soup) (or soup.prettify() for indented output) or to bytes with soup.encode().
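A minimal sketch of that final step follows. It serializes a modified tree to a string and to UTF-8 bytes, then writes the string to a file. The output file name updated.html is hypothetical, and it is placed in a temporary directory only to keep the example self-contained:

```python
import os
import tempfile

from bs4 import BeautifulSoup

html_doc = '<p id="my_paragraph">Old text</p>'
soup = BeautifulSoup(html_doc, 'html.parser')
soup.find('p', id='my_paragraph').string = 'New text'

# str() returns the markup unchanged in layout; encode() returns bytes
html_text = str(soup)
html_bytes = soup.encode('utf-8')

# Hypothetical output location, used only for illustration
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'updated.html')
with open(out_path, 'w', encoding='utf-8') as f:
    f.write(html_text)

print(html_text)   # <p id="my_paragraph">New text</p>
print(html_bytes)  # b'<p id="my_paragraph">New text</p>'
```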