How do I create new HTML elements with Nokogiri?

Nokogiri is a powerful Ruby gem for parsing and manipulating HTML and XML documents. One of its most useful features is the ability to create new HTML elements from scratch or add elements to existing documents. This guide covers various methods for creating HTML elements with Nokogiri, from simple element creation to building complex document structures.

Understanding Nokogiri's Element Creation Methods

Nokogiri provides several approaches to create new HTML elements:

  1. Using Nokogiri::HTML::Builder - The most intuitive way for building HTML structures
  2. Creating elements with Nokogiri::XML::Node.new - Direct element instantiation
  3. Using create_element - Creating elements within existing documents
  4. Parsing HTML strings - Converting HTML strings into Nokogiri nodes

Method 1: Using Nokogiri::HTML::Builder

The Builder class is the most user-friendly way to create HTML structures. It provides a DSL (Domain Specific Language) in which nested Ruby blocks mirror the nesting of the resulting HTML.

Basic Builder Example

require 'nokogiri'

# Create a simple HTML structure
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "My Web Page"
      doc.meta charset: "UTF-8"
    }
    doc.body {
      doc.h1 "Welcome to My Site", class: "header"
      doc.p "This is a paragraph created with Nokogiri.", id: "intro"
      doc.div(class: "container") {
        doc.ul {
          doc.li "First item"
          doc.li "Second item"
          doc.li "Third item"
        }
      }
    }
  }
end

# Convert to HTML string
html_output = builder.to_html
puts html_output

Advanced Builder Usage

# Building complex forms with attributes
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.form(action: "/submit", method: "POST", class: "user-form") {
    doc.div(class: "form-group") {
      doc.label "Email:", for: "email"
      doc.input type: "email", id: "email", name: "email", required: true
    }
    doc.div(class: "form-group") {
      doc.label "Password:", for: "password"
      doc.input type: "password", id: "password", name: "password", required: true
    }
    doc.button "Submit", type: "submit", class: "btn btn-primary"
  }
end

puts builder.to_html

Method 2: Creating Elements with Node.new

You can create individual elements by calling Nokogiri::XML::Node.new with a tag name and the document that will own the node:

require 'nokogiri'

# Create a new HTML document
doc = Nokogiri::HTML::Document.new

# Create individual elements
div_element = Nokogiri::XML::Node.new("div", doc)
div_element['class'] = "container"
div_element['id'] = "main-content"

# Create and add child elements
h1_element = Nokogiri::XML::Node.new("h1", doc)
h1_element.content = "Dynamic Content"
div_element.add_child(h1_element)

# Create a paragraph with text
p_element = Nokogiri::XML::Node.new("p", doc)
p_element.content = "This paragraph was created dynamically."
div_element.add_child(p_element)

# Add to document body
doc.root = Nokogiri::XML::Node.new("html", doc)
body = Nokogiri::XML::Node.new("body", doc)
body.add_child(div_element)
doc.root.add_child(body)

puts doc.to_html

Method 3: Using create_element

The create_element method is useful when working with existing documents:

require 'nokogiri'

# Start with an existing HTML document
html = '<html><body><div id="content"></div></body></html>'
doc = Nokogiri::HTML(html)

# Find the target container
container = doc.at_css('#content')

# Create new elements
article = doc.create_element("article", class: "blog-post")
header = doc.create_element("header")
title = doc.create_element("h2", "My Blog Post Title")
content = doc.create_element("p", "This is the blog post content.")

# Build the structure
header.add_child(title)
article.add_child(header)
article.add_child(content)

# Add to the existing document
container.add_child(article)

puts doc.to_html

Method 4: Parsing HTML Strings

For simple cases, you can create elements by parsing HTML strings:

require 'nokogiri'

# Create elements from HTML strings
html_fragment = '<div class="widget"><h3>Widget Title</h3><p>Widget content goes here.</p></div>'
fragment = Nokogiri::HTML::DocumentFragment.parse(html_fragment)

# Add to existing document
existing_html = '<html><body><div id="sidebar"></div></body></html>'
doc = Nokogiri::HTML(existing_html)
sidebar = doc.at_css('#sidebar')
sidebar.add_child(fragment)

puts doc.to_html

Working with Attributes

Setting Single Attributes

require 'nokogiri'

doc = Nokogiri::HTML::Document.new
element = Nokogiri::XML::Node.new("div", doc)

# Multiple ways to set attributes
element['class'] = "my-class"
element.set_attribute('id', 'my-id')
element['data-value'] = "123"

puts element.to_html
# Output: <div class="my-class" id="my-id" data-value="123"></div>

Setting Multiple Attributes

# Using a hash for multiple attributes (reuses doc from the previous example)
attributes = {
  'class' => 'card border-primary',
  'id' => 'user-card',
  'data-user-id' => '12345',
  'role' => 'article'
}

element = Nokogiri::XML::Node.new("div", doc)
attributes.each { |key, value| element[key] = value }

Creating Complex Document Structures

Building a Complete HTML Page

require 'nokogiri'

builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html(lang: "en") {
    doc.head {
      doc.meta charset: "UTF-8"
      doc.meta name: "viewport", content: "width=device-width, initial-scale=1.0"
      doc.title "Product Catalog"
      doc.link rel: "stylesheet", href: "/styles.css"
    }
    doc.body {
      doc.header(class: "site-header") {
        doc.nav(class: "navbar") {
          doc.a "Home", href: "/", class: "nav-link"
          doc.a "Products", href: "/products", class: "nav-link"
          doc.a "Contact", href: "/contact", class: "nav-link"
        }
      }

      doc.main(class: "content") {
        doc.section(class: "products") {
          doc.h1 "Our Products"

          # Create product cards dynamically
          products = [
            { name: "Laptop", price: "$999", image: "/laptop.jpg" },
            { name: "Phone", price: "$599", image: "/phone.jpg" },
            { name: "Tablet", price: "$399", image: "/tablet.jpg" }
          ]

          products.each do |product|
            doc.div(class: "product-card") {
              doc.img src: product[:image], alt: product[:name]
              doc.h3 product[:name]
              doc.p product[:price], class: "price"
              doc.button "Add to Cart", class: "btn btn-primary"
            }
          end
        }
      }

      doc.footer(class: "site-footer") {
        doc.p "© 2024 My Company. All rights reserved."
      }
    }
  }
end

puts builder.to_html

Integration with Web Scraping

When building web scrapers, you often need to combine existing content with new elements. Here's how Nokogiri element creation works with scraped data:

require 'nokogiri'

# Stand-in for scraped content (in practice, this HTML would come from an HTTP response)
existing_html = <<~HTML
  <html>
    <body>
      <article class="post">
        <h1>Original Article Title</h1>
        <p>This content was scraped from a website...</p>
      </article>
    </body>
  </html>
HTML

doc = Nokogiri::HTML(existing_html)

# Find the target element
article = doc.at_css('.post')

# Create and add metadata elements
metadata_div = doc.create_element("div", class: "post-metadata")

# Add scraped metadata
date_span = doc.create_element("span", "Published: January 1, 2024")
date_span['class'] = "publish-date"
metadata_div.add_child(date_span)

# Add author information
author_span = doc.create_element("span", "By: John Doe")
author_span['class'] = "author"
metadata_div.add_child(author_span)

# Add social sharing buttons
social_div = doc.create_element("div", class: "social-share")
['Twitter', 'Facebook', 'LinkedIn'].each do |platform|
  button = doc.create_element("button", "Share on #{platform}")
  button['class'] = "social-btn #{platform.downcase}"
  button['data-platform'] = platform.downcase
  social_div.add_child(button)
end

# Insert elements into the document
title = article.at_css('h1')
title.add_next_sibling(metadata_div)
article.add_child(social_div)

puts doc.to_html

JavaScript Code Generation

You can also use Nokogiri to generate HTML that includes JavaScript for dynamic behavior:

require 'nokogiri'

builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "Interactive Form"
      doc.script <<~JS
        function validateForm() {
          const email = document.getElementById('email').value;
          const password = document.getElementById('password').value;

          if (!email || !password) {
            alert('Please fill in all fields');
            return false;
          }
          return true;
        }
      JS
    }
    doc.body {
      doc.form(onsubmit: "return validateForm()", action: "/submit") {
        doc.input type: "email", id: "email", placeholder: "Enter email", required: true
        doc.input type: "password", id: "password", placeholder: "Enter password", required: true
        doc.button "Submit", type: "submit"
      }
    }
  }
end

puts builder.to_html

Best Practices and Tips

1. Use Appropriate Methods for Your Use Case

  • Builder: Best for creating complete HTML structures from scratch
  • create_element: Ideal when adding elements to existing documents
  • HTML parsing: Good for simple elements or when you have HTML strings

2. Handle Special Characters Properly

# Nokogiri automatically escapes HTML entities
element = doc.create_element("p", "Text with <special> & characters")
puts element.to_html
# Output: <p>Text with &lt;special&gt; &amp; characters</p>

3. Use CSS Classes for Styling

# Good practice: use meaningful CSS classes
card = doc.create_element("div", class: "product-card featured")
card['data-product-id'] = "123"

4. Validate Your HTML Structure

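# doc.errors lists the problems the parser reported while reading the markup;
# it is not a full HTML validator (your_html is a placeholder for your own string)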
doc = Nokogiri::HTML(your_html)
errors = doc.errors
if errors.any?
  puts "HTML validation errors:"
  errors.each { |error| puts "- #{error}" }
end

Performance Considerations

When creating many elements, consider these optimization strategies:

# Efficient way to create multiple similar elements
require 'nokogiri'

doc = Nokogiri::HTML::Document.new
container = doc.create_element("div", class: "container")

# Build the nodes first, detached from the tree
items = (1..1000).map do |i|
  item = doc.create_element("div", "Item #{i}")
  item['class'] = "item"
  item['data-index'] = i.to_s
  item
end

# Attach the items to the container (add_child still inserts them one at a time)
items.each { |item| container.add_child(item) }

Common Use Cases

Data Export and Reporting

# Generate HTML reports from data
data = [
  { name: "John Doe", sales: 15000, region: "North" },
  { name: "Jane Smith", sales: 22000, region: "South" },
  { name: "Bob Johnson", sales: 18000, region: "East" }
]

builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.title "Sales Report"
      doc.style <<~CSS
        table { border-collapse: collapse; width: 100%; }
        th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
      CSS
    }
    doc.body {
      doc.h1 "Quarterly Sales Report"
      doc.table {
        doc.thead {
          doc.tr {
            doc.th "Name"
            doc.th "Sales"
            doc.th "Region"
          }
        }
        doc.tbody {
          data.each do |row|
            doc.tr {
              doc.td row[:name]
              doc.td "$#{row[:sales].to_s.reverse.gsub(/(\d{3})(?=\d)/, '\\1,').reverse}"
              doc.td row[:region]
            }
          end
        }
      }
    }
  }
end

puts builder.to_html

Command Line Usage

For testing and debugging your Nokogiri element creation, you can use these command line techniques:

# Create a simple Ruby script for testing
cat > test_nokogiri.rb << 'EOF'
require 'nokogiri'

builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      doc.h1 "Test Page"
      doc.p "Generated with Nokogiri"
    }
  }
end

puts builder.to_html
EOF

# Run the script
ruby test_nokogiri.rb

# Save output to file
ruby test_nokogiri.rb > output.html

Conclusion

Creating HTML elements with Nokogiri provides developers with powerful tools for dynamic content generation and document manipulation. Whether you're building complete HTML documents from scratch, adding elements to scraped content, or creating complex nested structures, Nokogiri's various methods offer flexibility and control.

The Builder class is particularly useful for creating clean, readable code when building HTML structures, while methods like create_element excel when working with existing documents. Choose the approach that best fits your specific use case and always consider performance implications when creating large numbers of elements.

For web scraping applications that need to generate or modify HTML content, these techniques complement browser automation tools for handling dynamic content and provide a robust foundation for content manipulation and generation. When combined with proper error handling strategies, Nokogiri's element creation capabilities enable sophisticated data processing and HTML generation workflows.
