How do I extract attributes from HTML elements using Kanna?

Kanna is a Swift library for parsing XML and HTML. Once a document is parsed, you select elements with XPath queries and read each element's attributes by name using a subscript. The example below parses an HTML string and extracts attributes from its elements.

First, you need to ensure that Kanna is installed and imported into your Swift project. If you're using CocoaPods, you can add it to your Podfile:

pod 'Kanna', '~> 5.2.7'

Then run pod install to integrate it into your project.

Here's an example of how to use Kanna to extract attributes from HTML elements:

import Kanna

// Your HTML content
let html = """
<html>
    <head>
        <title>Sample Page</title>
    </head>
    <body>
        <a href="https://www.example.com" class="link">Example Link</a>
        <img src="image.png" alt="Sample Image" />
    </body>
</html>
"""

do {
    // Parse the HTML document
    let doc = try HTML(html: html, encoding: .utf8)

    // Extract all 'a' tags
    for link in doc.xpath("//a") {
        // Get the 'href' attribute
        if let href = link["href"] {
            print("Link: \(href)")
        }

        // Get the 'class' attribute
        if let classAttr = link["class"] {
            print("Class: \(classAttr)")
        }
    }

    // Extract all 'img' tags
    for image in doc.xpath("//img") {
        // Get the 'src' attribute
        if let src = image["src"] {
            print("Image Source: \(src)")
        }

        // Get the 'alt' attribute
        if let alt = image["alt"] {
            print("Alt Text: \(alt)")
        }
    }

} catch {
    print("Error: \(error)")
}

In this example, we parse an HTML string and then use XPath queries to find specific elements. Kanna supports XPath 1.0 expressions, which let you navigate through the elements and attributes of the HTML document.

We then loop through all the a tags to extract the href and class attributes, and all the img tags to extract their src and alt attributes.
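When you only need one attribute from many elements, the loop above can be condensed. The following is a minimal sketch, not part of Kanna itself: attributeValues(in:xpath:attribute:) is a hypothetical helper name, and it assumes the same HTML(html:encoding:) entry point and attribute subscript used in the example above. Elements that lack the attribute are skipped, because the subscript returns nil for them.

```swift
import Kanna

// Hypothetical helper (not part of Kanna): returns the value of `attribute`
// for every element matched by `xpath`, skipping elements that lack it.
func attributeValues(in html: String, xpath: String, attribute: String) throws -> [String] {
    let doc = try HTML(html: html, encoding: .utf8)
    // The attribute subscript returns nil when the attribute is absent,
    // so compactMap drops those elements.
    return doc.xpath(xpath).compactMap { $0[attribute] }
}

// Sample markup invented for this sketch: the second link has no href.
let page = """
<a href="/first">First</a>
<a>No href here</a>
<a href="/second">Second</a>
"""

let hrefs = try attributeValues(in: page, xpath: "//a", attribute: "href")
print(hrefs)
```

Because the helper takes the XPath expression and the attribute name as parameters, the same function covers the img example as well, for instance with "//img" and "src".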

Running this code prints the following:

Link: https://www.example.com
Class: link
Image Source: image.png
Alt Text: Sample Image

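There are two failure cases to handle. Parsing can throw, so the call to HTML(html:encoding:) sits inside a do block, and the catch clause prints any error raised there. Reading an attribute does not throw: the subscript returns an optional that is nil when the element has no such attribute, which is why the example unwraps each value with if let.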
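As an illustration of both cases, here is a small sketch that substitutes a placeholder for a missing attribute and returns an empty result if parsing fails. The function name imageSummaries(in:) and the placeholder strings are assumptions made for this example; only the parsing call, the XPath query, and the attribute subscript come from the code above.

```swift
import Kanna

// Hypothetical helper: describes every <img> in the given HTML, using a
// placeholder when an attribute is missing. Returns [] if parsing fails.
func imageSummaries(in html: String) -> [String] {
    do {
        let doc = try HTML(html: html, encoding: .utf8)
        return doc.xpath("//img").map { image in
            // The subscript yields nil for absent attributes, so fall back
            // to a readable placeholder instead of skipping the element.
            let src = image["src"] ?? "(no src)"
            let alt = image["alt"] ?? "(no alt)"
            return "src: \(src), alt: \(alt)"
        }
    } catch {
        // Parsing errors land here; attribute lookups never throw.
        print("Could not parse HTML: \(error)")
        return []
    }
}

// Sample markup invented for this sketch: the image has no alt attribute.
for summary in imageSummaries(in: "<img src=\"logo.png\">") {
    print(summary)
}
```

Whether to substitute a placeholder or skip the element depends on what you do with the data afterwards; for a scraper that feeds a database, storing nil may be preferable to a placeholder string.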

Please note that web scraping can be subject to legal and ethical considerations. Always make sure to comply with the terms of service of the website you're scraping, and be respectful of the website's resources.
