Can jsoup parse a local HTML file?

Yes, jsoup can parse local HTML files just as easily as HTML fetched from a URL. jsoup is a Java library designed for working with real-world HTML, and it provides a convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.

Here is an example of how you can use jsoup to parse a local HTML file in Java:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class JsoupLocalFileExample {
    public static void main(String[] args) {
        // Specify the path to the local HTML file
        File input = new File("/path/to/local/file.html");

        try {
            // Parse the HTML file using jsoup
            Document doc = Jsoup.parse(input, "UTF-8");

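            // Use DOM methods to navigate the content
            Element content = doc.getElementById("content");
            if (content == null) {
                // getElementById() returns null when no element has that id
                System.err.println("No element with id \"content\" found");
                return;
            }
            Elements links = content.getElementsByTag("a");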

            // Iterate over the extracted links and print their text
            for (Element link : links) {
                String linkText = link.text();
                System.out.println(linkText);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In this code example:

  1. We import the necessary jsoup classes.
  2. We create a File object pointing to the local HTML file.
  3. We call the Jsoup.parse() method, which takes the File object and the character set name as arguments, to parse the file into a Document object.
  4. We navigate the Document object using DOM methods, such as getElementById() and getElementsByTag().
  5. We iterate over the elements and print their text content.
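The same extraction can also be written with a CSS selector instead of chained DOM calls. The following is a minimal sketch, not the only way to do it: the class name JsoupSelectExample, the helper linkTexts(), and the sample HTML written to a temporary file are all illustrative assumptions, chosen so the example runs without an existing file.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JsoupSelectExample {

    // Hypothetical helper: collects the text of every link inside
    // the element whose id is "content".
    static List<String> linkTexts(File input) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8");

        List<String> texts = new ArrayList<>();
        // "#content a[href]" matches <a> elements with an href attribute
        // that are descendants of the element with id "content"
        for (Element link : doc.select("#content a[href]")) {
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample document, written to a temporary file
        String html = "<html><body>"
                + "<div id=\"content\"><a href=\"/a\">First</a> <a href=\"/b\">Second</a></div>"
                + "<a href=\"/c\">Outside</a>"
                + "</body></html>";
        Path tmp = Files.createTempFile("sample", ".html");
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));

        System.out.println(linkTexts(tmp.toFile())); // prints [First, Second]
    }
}
```

Unlike getElementById(), select() returns an empty Elements collection when nothing matches, so this version needs no null check.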

Replace "/path/to/local/file.html" with the actual path to your local HTML file, and handle the IOException that Jsoup.parse() throws if the file cannot be found or read.
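If the local file is a saved copy of a page from a website, its relative links can be resolved against the original site by passing a base URI as a third argument to Jsoup.parse(). The sketch below assumes a hypothetical base URI of https://example.com/docs/; the class name JsoupBaseUriExample, the helper absoluteLinks(), and the sample HTML are likewise illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JsoupBaseUriExample {

    // Hypothetical helper: returns every link in the file as an absolute URL,
    // resolved against the given base URI.
    static List<String> absoluteLinks(File input, String baseUri) throws IOException {
        // The third argument is the base URI used to resolve relative URLs
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);

        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // absUrl() resolves the attribute value against the base URI
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample document with one relative and one root-relative link
        String html = "<a href=\"page.html\">Page</a> <a href=\"/about\">About</a>";
        Path tmp = Files.createTempFile("saved", ".html");
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));

        // Assumed base URI; use the address the page was originally saved from
        for (String url : absoluteLinks(tmp.toFile(), "https://example.com/docs/")) {
            System.out.println(url);
        }
        // prints:
        // https://example.com/docs/page.html
        // https://example.com/about
    }
}
```

With the two-argument Jsoup.parse(File, String) used earlier, jsoup uses the file's own location as the base URI instead.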

Also, make sure to include jsoup in your project dependencies. If you are using Maven, you can add the following to your pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version> <!-- Use the latest version available -->
    </dependency>
</dependencies>

For Gradle, add this to your build.gradle file:

dependencies {
    implementation 'org.jsoup:jsoup:1.15.3' // Use the latest version available
}

The version 1.15.3 shown above is only an example; check for the latest jsoup release when you add the dependency.
