Jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, combining DOM traversal, CSS selectors, and jQuery-like methods.
To retrieve an element's attribute value using Jsoup, you need to follow these steps:
- Parse the HTML to create a Document object.
- Use selectors to find the element you're interested in.
- Call the attr method on the element to get the value of the desired attribute.
Here is a simple example in Java:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) {
        String html = "<html><head><title>First parse</title></head>"
                + "<body><p><a href='https://example.com'>example</a></p></body></html>";
        Document doc = Jsoup.parse(html);
        // Select the link element
        Element link = doc.select("a").first();
        // Get the value of the "href" attribute
        String linkHref = link.attr("href");
        System.out.println("Link Href: " + linkHref);
    }
}
In this example, we have a simple HTML string with an a (anchor) element. We parse the HTML into a Document, select the first a tag, and then retrieve the value of the href attribute.
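The example above assumes that the element and the attribute both exist. As a minimal sketch of a more defensive lookup, the helper below (attrOrDefault is a hypothetical name, not part of Jsoup) relies on two documented behaviors: selectFirst returns null when nothing matches, and attr returns an empty string when the attribute is absent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttrSafetyExample {
    // Hypothetical helper: returns the attribute value of the first matching
    // element, or the fallback when the element or the attribute is missing.
    public static String attrOrDefault(Document doc, String cssQuery,
                                       String attrName, String fallback) {
        Element el = doc.selectFirst(cssQuery); // null when nothing matches
        if (el == null || !el.hasAttr(attrName)) {
            return fallback;
        }
        return el.attr(attrName);
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<p><a href='https://example.com'>example</a> <a>no href</a></p>");

        // First <a> has an href
        System.out.println(attrOrDefault(doc, "a", "href", "(none)"));
        // No <img> in the document, so the fallback is used
        System.out.println(attrOrDefault(doc, "img", "src", "(none)"));
        // attr on an element without the attribute yields an empty string
        System.out.println(doc.select("a").last().attr("href").isEmpty());
    }
}
```

Checking hasAttr before calling attr makes it possible to tell a missing attribute apart from one that is present but empty, which attr alone cannot do.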
If you are working with a remote HTML page, you can fetch it directly using Jsoup's connect method:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupFetchExample {
    public static void main(String[] args) {
        try {
            // Fetch and parse the HTML content from a URL
            Document doc = Jsoup.connect("https://example.com").get();
            // Select the first link element (null if the page has no links)
            Element link = doc.select("a").first();
            if (link != null) {
                // Get the value of the "href" attribute
                String linkHref = link.attr("href");
                System.out.println("Link Href: " + linkHref);
            }
        } catch (IOException e) {
            // Thrown on network failures and HTTP error responses
            e.printStackTrace();
        }
    }
}
In this example, we connect to "https://example.com" and fetch its content. We then proceed as before to extract the href attribute from the first link.
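Links on real pages are often relative (for example /about), and attr returns the value exactly as written in the markup. The sketch below shows one way to resolve such a value with absUrl. It parses a string with an explicit base URI so that it runs without network access; the class name, the method absoluteHref, and the sample URLs are illustrative assumptions. A Document obtained through connect(...).get() already carries the page URL as its base URI.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteUrlExample {
    // Hypothetical helper: resolves the first link's href against baseUri.
    // Returns an empty string when the document has no <a href> element.
    public static String absoluteHref(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("a[href]"); // only anchors that have an href
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href='/about'>About</a>";
        Document doc = Jsoup.parse(html, "https://example.com/");
        Element link = doc.selectFirst("a[href]");

        System.out.println(link.attr("href"));   // raw value: /about
        System.out.println(link.absUrl("href")); // resolved: https://example.com/about
    }
}
```

The same result is available through the abs: prefix, as in link.attr("abs:href").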
Remember to handle exceptions such as IOException when fetching content from a URL, as network operations are subject to failures.
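As a rough sketch of what finer-grained handling might look like, the example below distinguishes two common IOException subclasses: Jsoup's HttpStatusException, which get() throws for HTTP error responses unless ignoreHttpErrors is enabled, and java.net.SocketTimeoutException. The describe helper, the class name, and the 10-second timeout are illustrative assumptions, not recommendations from the Jsoup documentation.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchErrorHandlingExample {
    // Hypothetical helper: turns a fetch failure into a short message.
    public static String describe(IOException e) {
        if (e instanceof HttpStatusException) {
            HttpStatusException http = (HttpStatusException) e;
            return "HTTP " + http.getStatusCode() + " for " + http.getUrl();
        }
        if (e instanceof SocketTimeoutException) {
            return "timed out";
        }
        return "I/O failure: " + e.getMessage();
    }

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("https://example.com")
                    .timeout(10_000) // milliseconds; an assumed value
                    .get();
            System.out.println("Title: " + doc.title());
        } catch (IOException e) {
            System.err.println("Fetch failed: " + describe(e));
        }
    }
}
```

Catching the general IOException once and branching on its type keeps the calling code short while still reporting the status code for HTTP errors.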
When using Jsoup in a real project, you will need to include the library as a dependency. If you are using Maven, add the following dependency to your pom.xml file:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.3</version>
</dependency>
Make sure to use the latest version of Jsoup by checking the official website or Maven Central.