Yes, jsoup can parse local HTML files just as easily as it can parse HTML content from a URL. jsoup is a Java library designed for working with real-world HTML, and it provides a convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.
Here is an example of how you can use jsoup to parse a local HTML file in Java:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class JsoupLocalFileExample {
    public static void main(String[] args) {
        // Specify the path to the local HTML file
        File input = new File("/path/to/local/file.html");
        try {
            // Parse the HTML file using jsoup
            Document doc = Jsoup.parse(input, "UTF-8");

            // Use DOM methods to navigate the content
            Element content = doc.getElementById("content");
            if (content == null) {
                // getElementById() returns null when no element has that id
                System.err.println("No element with id \"content\" found");
                return;
            }
            Elements links = content.getElementsByTag("a");

            // Iterate over the extracted links and print their text
            for (Element link : links) {
                String linkText = link.text();
                System.out.println(linkText);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In this code example:
- We import the necessary jsoup classes.
- We create a File object pointing to the local HTML file.
- We use the Jsoup.parse() method, which takes the File object and the character set encoding as arguments, to parse the file into a Document object.
- We navigate the Document object using DOM methods, such as getElementById() and getElementsByTag().
- We iterate over the elements and print their text content.
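The same navigation can also be written as a single CSS selector. Passing a base URI to Jsoup.parse() lets jsoup resolve relative links, which a local file cannot do on its own. The following is a minimal sketch, not part of the original example: the class name JsoupSelectorSketch, the method contentLinks, the sample markup, and the https://example.com/ base URI are all illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class JsoupSelectorSketch {

    // Hypothetical helper: returns the absolute URLs of all links inside #content.
    // baseUri is an assumed site root used to resolve relative href values.
    static List<String> contentLinks(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        List<String> urls = new ArrayList<>();
        // select() returns an empty list when nothing matches, so no null check is needed
        for (Element link : doc.select("#content a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real local HTML file
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        String html = "<div id=\"content\"><a href=\"/docs\">Docs</a></div>";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));

        System.out.println(contentLinks(tmp, "https://example.com/"));
    }
}
```

Because select() never returns null, this variant avoids the null check that getElementById() requires when the id is missing from the page.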
Make sure to replace "/path/to/local/file.html" with the actual path to your local HTML file, and handle the IOException that might be thrown if there are issues reading the file.
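One way to make file problems easier to diagnose is to check the path before parsing and report a readable message instead of a stack trace. This is only a sketch under assumptions: the class name JsoupSafeParseSketch and the helper parseLocal are hypothetical, and the wording of the error messages is arbitrary.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JsoupSafeParseSketch {

    // Hypothetical helper: fails fast with a descriptive message
    // when the path is not a readable regular file.
    static Document parseLocal(String path) throws IOException {
        Path p = Paths.get(path);
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            throw new FileNotFoundException("Not a readable file: " + p.toAbsolutePath());
        }
        return Jsoup.parse(p.toFile(), "UTF-8");
    }

    public static void main(String[] args) {
        // Placeholder path from the example above; pass a real path as the first argument
        String path = args.length > 0 ? args[0] : "/path/to/local/file.html";
        try {
            Document doc = parseLocal(path);
            System.out.println(doc.title());
        } catch (IOException e) {
            System.err.println("Could not parse " + path + ": " + e.getMessage());
        }
    }
}
```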
Also, make sure to include jsoup in your project dependencies. If you are using Maven, you can add the following to your pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version> <!-- Use the latest version available -->
    </dependency>
</dependencies>
For Gradle, add this to your build.gradle file:
dependencies {
    implementation 'org.jsoup:jsoup:1.15.3' // Use the latest version available
}
Remember to replace 1.15.3 with the latest jsoup version available at the time you add the dependency.