How can I retrieve an element's attribute value using jsoup?

Jsoup is a Java library designed for working with real-world HTML. It provides a convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.

To retrieve an element's attribute value using Jsoup, you need to follow these steps:

  1. Parse the HTML to create a Document object.
  2. Use selectors to find the element you're interested in.
  3. Call the attr method on the element to get the value of the desired attribute.

Here is a simple example in Java:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) {
        String html = "<html><head><title>First parse</title></head>"
                    + "<body><p><a href='https://example.com'>example</a></p></body></html>";
        Document doc = Jsoup.parse(html);

        // Select the link element
        Element link = doc.select("a").first();

        // Get the value of the "href" attribute
        String linkHref = link.attr("href");

        System.out.println("Link Href: " + linkHref);
    }
}

In this example, we have a simple HTML string with an a (anchor) element. We parse the HTML into a Document, select the first a tag, and then retrieve the value of the href attribute.
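The example above assumes that the link exists and has an href attribute. On less predictable HTML, selectFirst returns null when nothing matches, and attr returns an empty string when the attribute is absent, so a missing attribute can look the same as an empty one. The following is a minimal sketch of a more defensive lookup; the class name SafeAttrExample and the helpers attrOrNull and allAttrs are hypothetical names chosen for illustration, not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

class SafeAttrExample {

    // Hypothetical helper: returns the attribute value, or null when either
    // the element or the attribute is missing.
    static String attrOrNull(String html, String cssQuery, String attrName) {
        Document doc = Jsoup.parse(html);
        Element element = doc.selectFirst(cssQuery); // null if nothing matches
        if (element == null || !element.hasAttr(attrName)) {
            return null;
        }
        return element.attr(attrName);
    }

    // Hypothetical helper: copies every attribute of the first matching
    // element into a map, preserving the order in which they appear.
    static Map<String, String> allAttrs(String html, String cssQuery) {
        Map<String, String> result = new LinkedHashMap<>();
        Element element = Jsoup.parse(html).selectFirst(cssQuery);
        if (element != null) {
            for (Attribute attribute : element.attributes()) {
                result.put(attribute.getKey(), attribute.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><a href='https://example.com' title='Home'>example</a></p>";
        System.out.println(attrOrNull(html, "a", "href")); // https://example.com
        System.out.println(attrOrNull(html, "a", "rel"));  // null
        System.out.println(allAttrs(html, "a"));
    }
}
```

Returning null for a missing attribute is only one possible design; depending on the caller, an Optional or a default value may be a better fit.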

If you are working with a remote HTML page, you can fetch it directly using Jsoup's connect method:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupFetchExample {
    public static void main(String[] args) {
        try {
            // Fetch the HTML content from a URL
            Document doc = Jsoup.connect("https://example.com").get();

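            // Select the first link element (null if the page has no links)
            Element link = doc.select("a").first();

            if (link != null) {
                // Get the value of the "href" attribute
                String linkHref = link.attr("href");
                System.out.println("Link Href: " + linkHref);
            } else {
                System.out.println("No link found");
            }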
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In this example, we connect to "https://example.com" and fetch its content. We then proceed as before to extract the href attribute from the first link on the page.
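On real pages, href values are often relative (for example "/docs/intro"), and attr returns them exactly as written in the markup. When the Document has a base URI, which Jsoup sets to the fetched URL when using connect, the absUrl method resolves the value to an absolute URL. The sketch below demonstrates this offline by passing a base URI to Jsoup.parse; the class name AbsUrlExample, the helper rawAndAbsolute, and the sample HTML are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsUrlExample {

    // Hypothetical helper: returns { raw href, resolved href } for the first
    // link that has an href attribute, or an empty array if there is none.
    static String[] rawAndAbsolute(String html, String baseUri) {
        // The second argument tells Jsoup how to resolve relative URLs.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("a[href]");
        if (link == null) {
            return new String[0];
        }
        return new String[] { link.attr("href"), link.absUrl("href") };
    }

    public static void main(String[] args) {
        String html = "<p><a href='/docs/intro'>Docs</a></p>";
        String[] values = rawAndAbsolute(html, "https://example.com/");
        System.out.println("Raw:      " + values[0]); // /docs/intro
        System.out.println("Absolute: " + values[1]); // https://example.com/docs/intro
    }
}
```

The same result is available through the attribute key prefix "abs:", as in link.attr("abs:href").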

Remember to handle exceptions such as IOException when fetching content from a URL, as network operations are subject to failures.
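As a rough sketch of what that handling might look like, the code below sets an explicit timeout and reports HTTP errors, timeouts, and other I/O failures separately. The class name FetchWithHandling, the helpers configure and firstHref, and the 10-second timeout are illustrative assumptions; choose values that suit your own application.

```java
import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.net.SocketTimeoutException;

class FetchWithHandling {

    // Assumed value for illustration: give up after 10 seconds.
    static final int TIMEOUT_MILLIS = 10_000;

    // Hypothetical helper: builds a connection with an explicit timeout.
    // No network request is made until get() is called.
    static Connection configure(String url) {
        return Jsoup.connect(url).timeout(TIMEOUT_MILLIS);
    }

    // Hypothetical helper: returns the absolute href of the first link on
    // the page, or null if the fetch fails or the page has no links.
    static String firstHref(String url) {
        try {
            Document doc = configure(url).get();
            Element link = doc.selectFirst("a[href]");
            return link == null ? null : link.absUrl("href");
        } catch (HttpStatusException e) {
            // The server answered with a non-OK status such as 404 or 500.
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (SocketTimeoutException e) {
            System.err.println("Timed out: " + e.getMessage());
        } catch (IOException e) {
            // Any other network or I/O failure.
            System.err.println("Fetch failed: " + e.getMessage());
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstHref("https://example.com"));
    }
}
```

Both HttpStatusException and SocketTimeoutException are subclasses of IOException, so they must be caught before the general IOException handler.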

When using Jsoup in a real project, you will need to include the library as a dependency. If you are using Maven, add the following dependency to your pom.xml file:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.3</version>
</dependency>

The version shown above is only an example. Check the official Jsoup website or Maven Central for the latest release and use that version number instead.
