How do I create new HTML elements with Nokogiri?
Nokogiri is a powerful Ruby gem for parsing and manipulating HTML and XML documents. One of its most useful features is the ability to create new HTML elements from scratch or add elements to existing documents. This guide covers various methods for creating HTML elements with Nokogiri, from simple element creation to building complex document structures.
Understanding Nokogiri's Element Creation Methods
Nokogiri provides several approaches to create new HTML elements:
- Using Nokogiri::HTML::Builder: the most intuitive way to build HTML structures
- Creating elements with Nokogiri::XML::Node.new: direct element instantiation
- Using create_element: creating elements within existing documents
- Parsing HTML strings: converting HTML strings into Nokogiri nodes
Method 1: Using Nokogiri::HTML::Builder
The Builder class is the most user-friendly way to create HTML structures. It provides a DSL (Domain Specific Language) that closely resembles HTML syntax.
Basic Builder Example
require 'nokogiri'
# Create a simple HTML structure
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "My Web Page"
      doc.meta charset: "UTF-8"
    }
    doc.body {
      doc.h1 "Welcome to My Site", class: "header"
      doc.p "This is a paragraph created with Nokogiri.", id: "intro"
      doc.div(class: "container") {
        doc.ul {
          doc.li "First item"
          doc.li "Second item"
          doc.li "Third item"
        }
      }
    }
  }
end
# Convert to HTML string
html_output = builder.to_html
puts html_output
Advanced Builder Usage
# Building complex forms with attributes
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.form(action: "/submit", method: "POST", class: "user-form") {
    doc.div(class: "form-group") {
      doc.label "Email:", for: "email"
      doc.input type: "email", id: "email", name: "email", required: true
    }
    doc.div(class: "form-group") {
      doc.label "Password:", for: "password"
      doc.input type: "password", id: "password", name: "password", required: true
    }
    doc.button "Submit", type: "submit", class: "btn btn-primary"
  }
end
puts builder.to_html
Method 2: Creating Elements with Node.new
You can create individual elements by calling Nokogiri::XML::Node.new with a tag name and the document the node will belong to:
require 'nokogiri'
# Create a new HTML document
doc = Nokogiri::HTML::Document.new
# Create individual elements
div_element = Nokogiri::XML::Node.new("div", doc)
div_element['class'] = "container"
div_element['id'] = "main-content"
# Create and add child elements
h1_element = Nokogiri::XML::Node.new("h1", doc)
h1_element.content = "Dynamic Content"
div_element.add_child(h1_element)
# Create a paragraph with text
p_element = Nokogiri::XML::Node.new("p", doc)
p_element.content = "This paragraph was created dynamically."
div_element.add_child(p_element)
# Give the document an <html> root and a <body>, then attach the div
doc.root = Nokogiri::XML::Node.new("html", doc)
body = Nokogiri::XML::Node.new("body", doc)
body.add_child(div_element)
doc.root.add_child(body)
puts doc.to_html
Method 3: Using create_element
Document#create_element takes a tag name, optional text content, and an optional hash of attributes, which makes it convenient when working with existing documents:
require 'nokogiri'
# Start with an existing HTML document
html = '<html><body><div id="content"></div></body></html>'
doc = Nokogiri::HTML(html)
# Find the target container
container = doc.at_css('#content')
# Create new elements
article = doc.create_element("article", class: "blog-post")
header = doc.create_element("header")
title = doc.create_element("h2", "My Blog Post Title")
content = doc.create_element("p", "This is the blog post content.")
# Build the structure
header.add_child(title)
article.add_child(header)
article.add_child(content)
# Add to the existing document
container.add_child(article)
puts doc.to_html
Method 4: Parsing HTML Strings
For simple cases, you can create elements by parsing HTML strings:
require 'nokogiri'
# Create elements from HTML strings
html_fragment = '<div class="widget"><h3>Widget Title</h3><p>Widget content goes here.</p></div>'
fragment = Nokogiri::HTML::DocumentFragment.parse(html_fragment)
# Add to existing document
existing_html = '<html><body><div id="sidebar"></div></body></html>'
doc = Nokogiri::HTML(existing_html)
sidebar = doc.at_css('#sidebar')
sidebar.add_child(fragment)
puts doc.to_html
Working with Attributes
Setting Single Attributes
require 'nokogiri'
doc = Nokogiri::HTML::Document.new
element = Nokogiri::XML::Node.new("div", doc)
# Multiple ways to set attributes
element['class'] = "my-class"
element.set_attribute('id', 'my-id')
element['data-value'] = "123"
puts element.to_html
# Output: <div class="my-class" id="my-id" data-value="123"></div>
Setting Multiple Attributes
require 'nokogiri'
doc = Nokogiri::HTML::Document.new
# Using a hash for multiple attributes
attributes = {
  'class' => 'card border-primary',
  'id' => 'user-card',
  'data-user-id' => '12345',
  'role' => 'article'
}
element = Nokogiri::XML::Node.new("div", doc)
attributes.each { |key, value| element[key] = value }
Creating Complex Document Structures
Building a Complete HTML Page
require 'nokogiri'
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html(lang: "en") {
    doc.head {
      doc.meta charset: "UTF-8"
      doc.meta name: "viewport", content: "width=device-width, initial-scale=1.0"
      doc.title "Product Catalog"
      doc.link rel: "stylesheet", href: "/styles.css"
    }
    doc.body {
      doc.header(class: "site-header") {
        doc.nav(class: "navbar") {
          doc.a "Home", href: "/", class: "nav-link"
          doc.a "Products", href: "/products", class: "nav-link"
          doc.a "Contact", href: "/contact", class: "nav-link"
        }
      }
      doc.main(class: "content") {
        doc.section(class: "products") {
          doc.h1 "Our Products"
          # Create product cards dynamically
          products = [
            { name: "Laptop", price: "$999", image: "/laptop.jpg" },
            { name: "Phone", price: "$599", image: "/phone.jpg" },
            { name: "Tablet", price: "$399", image: "/tablet.jpg" }
          ]
          products.each do |product|
            doc.div(class: "product-card") {
              doc.img src: product[:image], alt: product[:name]
              doc.h3 product[:name]
              doc.p product[:price], class: "price"
              doc.button "Add to Cart", class: "btn btn-primary"
            }
          end
        }
      }
      doc.footer(class: "site-footer") {
        doc.p "© 2024 My Company. All rights reserved."
      }
    }
  }
end
puts builder.to_html
Integration with Web Scraping
When building web scrapers, you often need to combine existing content with new elements. Here's how Nokogiri element creation works with scraped data:
require 'nokogiri'
# Stand-in for scraped HTML (in practice this would come from an HTTP response)
existing_html = <<~HTML
  <html>
    <body>
      <article class="post">
        <h1>Original Article Title</h1>
        <p>This content was scraped from a website...</p>
      </article>
    </body>
  </html>
HTML
doc = Nokogiri::HTML(existing_html)
# Find the target element
article = doc.at_css('.post')
# Create and add metadata elements
metadata_div = doc.create_element("div", class: "post-metadata")
# Add scraped metadata
date_span = doc.create_element("span", "Published: January 1, 2024")
date_span['class'] = "publish-date"
metadata_div.add_child(date_span)
# Add author information
author_span = doc.create_element("span", "By: John Doe")
author_span['class'] = "author"
metadata_div.add_child(author_span)
# Add social sharing buttons
social_div = doc.create_element("div", class: "social-share")
['Twitter', 'Facebook', 'LinkedIn'].each do |platform|
  button = doc.create_element("button", "Share on #{platform}")
  button['class'] = "social-btn #{platform.downcase}"
  button['data-platform'] = platform.downcase
  social_div.add_child(button)
end
# Insert elements into the document
title = article.at_css('h1')
title.add_next_sibling(metadata_div)
article.add_child(social_div)
puts doc.to_html
JavaScript Code Generation
You can also use Nokogiri to generate HTML that includes JavaScript for dynamic behavior:
require 'nokogiri'
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "Interactive Form"
      doc.script <<~JS
        function validateForm() {
          const email = document.getElementById('email').value;
          const password = document.getElementById('password').value;
          if (!email || !password) {
            alert('Please fill in all fields');
            return false;
          }
          return true;
        }
      JS
    }
    doc.body {
      doc.form(onsubmit: "return validateForm()", action: "/submit") {
        doc.input type: "email", id: "email", placeholder: "Enter email", required: true
        doc.input type: "password", id: "password", placeholder: "Enter password", required: true
        doc.button "Submit", type: "submit"
      }
    }
  }
end
puts builder.to_html
Best Practices and Tips
1. Use Appropriate Methods for Your Use Case
- Builder: Best for creating complete HTML structures from scratch
- create_element: Ideal when adding elements to existing documents
- HTML parsing: Good for simple elements or when you have HTML strings
2. Handle Special Characters Properly
# Nokogiri automatically escapes special characters in text content
require 'nokogiri'
doc = Nokogiri::HTML::Document.new
element = doc.create_element("p", "Text with <special> & characters")
puts element.to_html
# Output: <p>Text with &lt;special&gt; &amp; characters</p>
3. Use CSS Classes for Styling
# Good practice: use meaningful CSS classes
card = doc.create_element("div", class: "product-card featured")
card['data-product-id'] = "123"
4. Validate Your HTML Structure
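# doc.errors lists the problems the parser reported while reading the markup.
# The HTML parser recovers from most mistakes, so an empty list does not prove the HTML is valid.
doc = Nokogiri::HTML(your_html)  # your_html holds the markup to check
errors = doc.errors
if errors.any?
  puts "HTML parse errors:"
  errors.each { |error| puts "- #{error}" }
end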
Performance Considerations
When creating many elements, one straightforward pattern is to build the nodes first and then attach them with a single call:
# Create many similar elements, then attach them together
require 'nokogiri'
doc = Nokogiri::HTML::Document.new
container = doc.create_element("div", class: "container")
# Build the item nodes first
items = (1..1000).map do |i|
  item = doc.create_element("div", "Item #{i}")
  item['class'] = "item"
  item['data-index'] = i.to_s
  item
end
# Attach them in one add_child call (add_child also accepts a NodeSet)
container.add_child(Nokogiri::XML::NodeSet.new(doc, items))
Common Use Cases
Data Export and Reporting
# Generate HTML reports from data
data = [
  { name: "John Doe", sales: 15000, region: "North" },
  { name: "Jane Smith", sales: 22000, region: "South" },
  { name: "Bob Johnson", sales: 18000, region: "East" }
]
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "Sales Report"
      doc.style <<~CSS
        table { border-collapse: collapse; width: 100%; }
        th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
      CSS
    }
    doc.body {
      doc.h1 "Quarterly Sales Report"
      doc.table {
        doc.thead {
          doc.tr {
            doc.th "Name"
            doc.th "Sales"
            doc.th "Region"
          }
        }
        doc.tbody {
          data.each do |row|
            doc.tr {
              doc.td row[:name]
              doc.td "$#{row[:sales].to_s.reverse.gsub(/(\d{3})(?=\d)/, '\\1,').reverse}"
              doc.td row[:region]
            }
          end
        }
      }
    }
  }
end
puts builder.to_html
Command Line Usage
For testing and debugging your Nokogiri element creation, you can use these command line techniques:
# Create a simple Ruby script for testing
cat > test_nokogiri.rb << 'EOF'
require 'nokogiri'
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      doc.h1 "Test Page"
      doc.p "Generated with Nokogiri"
    }
  }
end
puts builder.to_html
EOF
# Run the script
ruby test_nokogiri.rb
# Save output to file
ruby test_nokogiri.rb > output.html
Conclusion
Creating HTML elements with Nokogiri provides developers with powerful tools for dynamic content generation and document manipulation. Whether you're building complete HTML documents from scratch, adding elements to scraped content, or creating complex nested structures, Nokogiri's various methods offer flexibility and control.
The Builder class is particularly useful for creating clean, readable code when building HTML structures, while methods like create_element excel when working with existing documents. Choose the approach that best fits your specific use case, and measure performance when you create large numbers of elements.
For web scraping applications that need to generate or modify HTML content, these techniques pair well with browser automation tools, which handle JavaScript-driven pages before Nokogiri processes the result. Combined with careful error handling, Nokogiri's element creation capabilities provide a robust foundation for data processing and HTML generation workflows.