HtmlUnit is a Java library that simulates a web browser without a GUI (a headless browser). The WebClient class is its central entry point: it represents a virtual browser that can make requests and interact with web pages programmatically. Below are some commonly used WebClient methods:
Navigation and Page Retrieval
getPage(String url): Loads a web page from the specified URL and returns a Page object that represents the loaded page.
getPage(URL url): Similar to the above, but takes a URL object instead of a string.
getPage(WebRequest request): Loads a page based on a WebRequest object, which allows for more detailed configuration of the request.
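The three getPage variants can be compared side by side. The following is a minimal sketch, assuming the HtmlUnit 2.x package names (com.gargoylesoftware.htmlunit). The class name GetPageVariants and the URL are hypothetical, and a MockWebConnection supplies a canned response so that nothing is fetched over the network.

```java
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.net.URL;

class GetPageVariants {

    // Hypothetical URL; it is never contacted because the mock connection answers for it.
    static final String PAGE_URL = "http://localhost/hello";

    static String[] loadTitles() throws IOException {
        try (WebClient webClient = new WebClient()) {
            // Serve a canned HTML document instead of making a real HTTP request.
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL),
                    "<html><head><title>Hello</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            // Variant 1: URL given as a string.
            HtmlPage fromString = webClient.getPage(PAGE_URL);

            // Variant 2: URL given as a java.net.URL.
            HtmlPage fromUrl = webClient.getPage(new URL(PAGE_URL));

            // Variant 3: a WebRequest, which can carry a method and extra headers.
            WebRequest request = new WebRequest(new URL(PAGE_URL), HttpMethod.GET);
            request.setAdditionalHeader("Accept-Language", "en");
            HtmlPage fromRequest = webClient.getPage(request);

            return new String[] {
                fromString.getTitleText(),
                fromUrl.getTitleText(),
                fromRequest.getTitleText()
            };
        }
    }

    public static void main(String[] args) throws IOException {
        for (String title : loadTitles()) {
            System.out.println("Page title: " + title);
        }
    }
}
```

Against a real site, the mock connection lines would simply be dropped. The WebRequest variant is the one to reach for when a request needs a non-default method, headers, or parameters.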
Configuration and Settings
getOptions(): Returns the WebClientOptions object that holds this WebClient's settings, allowing modification of options such as JavaScript and CSS support, timeouts, and proxy settings.
getCookieManager(): Returns the CookieManager used by this WebClient, which allows for manipulation of cookies.
getCache(): Returns the cache used by this WebClient.
getJavaScriptEngine(): Returns the JavaScript engine used by this WebClient.
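A typical setup adjusts a few options and seeds a cookie before any page is loaded. The sketch below assumes HtmlUnit 2.x package names. The class name ConfigureClient, the cookie domain, name and value, and the specific option values are illustrative choices, not recommendations.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.util.Cookie;

class ConfigureClient {

    // Builds a WebClient with a handful of non-default settings.
    // The caller is responsible for closing the returned client.
    static WebClient newConfiguredClient() {
        WebClient webClient = new WebClient();

        WebClientOptions options = webClient.getOptions();
        options.setJavaScriptEnabled(false);                  // skip script execution
        options.setCssEnabled(false);                         // skip stylesheet processing
        options.setTimeout(10_000);                           // timeout in milliseconds
        options.setThrowExceptionOnFailingStatusCode(false);  // do not throw on 4xx/5xx

        // Hypothetical cookie, added through the client's CookieManager.
        webClient.getCookieManager()
                .addCookie(new Cookie("example.com", "session", "abc123"));
        return webClient;
    }

    public static void main(String[] args) {
        try (WebClient webClient = newConfiguredClient()) {
            System.out.println("JavaScript enabled: "
                    + webClient.getOptions().isJavaScriptEnabled());
            System.out.println("Cookies: " + webClient.getCookieManager().getCookies());
        }
    }
}
```

Disabling JavaScript and CSS is a common choice when only the static HTML matters, since it avoids the cost of script execution and style computation.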
JavaScript and Ajax
waitForBackgroundJavaScript(long timeoutMillis): Blocks until background JavaScript tasks have finished or the timeout elapses, whichever comes first. This is useful for pages whose AJAX calls complete after the initial page load.
isJavaScriptEnabled(): Checks whether JavaScript execution is enabled.
getOptions().setJavaScriptEnabled(boolean enabled): Enables or disables JavaScript execution. In current HtmlUnit releases this setter is on WebClientOptions rather than on WebClient itself.
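To see what waiting for background JavaScript does, consider a page whose script changes the title only after a short delay. This is a sketch under the same assumptions as before (HtmlUnit 2.x packages, a MockWebConnection in place of a real server). The class name, URL, and page markup are made up for illustration.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.net.URL;

class BackgroundJavaScript {

    // Hypothetical URL, answered by the mock connection below.
    static final String PAGE_URL = "http://localhost/async";

    // The script updates the title 100 ms after the page has loaded.
    static final String HTML =
            "<html><head><title>loading</title><script>"
            + "setTimeout(function() { document.title = 'done'; }, 100);"
            + "</script></head><body></body></html>";

    static String titleAfterWaiting() throws IOException {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), HTML);
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(PAGE_URL);

            // Give pending timers and AJAX callbacks up to two seconds to finish.
            webClient.waitForBackgroundJavaScript(2_000);

            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Title after waiting: " + titleAfterWaiting());
    }
}
```

The method returns as soon as no background jobs remain, so a generous timeout does not slow down pages that finish quickly.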
Event Listeners and Handlers
setAlertHandler(AlertHandler alertHandler): Sets the handler that will handle JavaScript alert() calls.
setConfirmHandler(ConfirmHandler confirmHandler): Sets the handler that will handle JavaScript confirm() calls.
setPromptHandler(PromptHandler promptHandler): Sets the handler that will handle JavaScript prompt() calls.
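Because there is no GUI, dialogs raised by page scripts are routed to these handlers. The sketch below records alert() and confirm() messages in a list and answers every confirm() with true. It assumes HtmlUnit 2.x packages. The class name, URL, and script are hypothetical, and the handlers are written as lambdas on the assumption that AlertHandler and ConfirmHandler each declare a single method.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class DialogHandlers {

    // Hypothetical URL, answered by the mock connection below.
    static final String PAGE_URL = "http://localhost/dialogs";

    static final String HTML =
            "<html><head><title>dialogs</title><script>"
            + "alert('hello');"
            + "if (confirm('Proceed?')) { alert('confirmed'); }"
            + "</script></head><body></body></html>";

    static List<String> collectDialogs() throws IOException {
        List<String> messages = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            // Record every alert() message.
            webClient.setAlertHandler((page, message) -> messages.add("alert: " + message));

            // Record every confirm() message and answer it as if "OK" was clicked.
            webClient.setConfirmHandler((page, message) -> {
                messages.add("confirm: " + message);
                return true;
            });

            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), HTML);
            webClient.setWebConnection(connection);

            // The inline script runs while the page is being loaded.
            webClient.getPage(PAGE_URL);
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        collectDialogs().forEach(System.out::println);
    }
}
```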
Headers and Responses
addRequestHeader(String name, String value): Adds a request header that will be sent with all future requests.
removeRequestHeader(String name): Removes a previously added request header.
getCurrentWindow(): Returns the WebWindow that represents the current window or frame.
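The effect of adding and removing a client-wide header can be observed by inspecting the requests the client actually sends. This sketch assumes HtmlUnit 2.x packages and uses a MockWebConnection, which remembers the last request it received. The header name, class name, and URLs are hypothetical.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;

import java.io.IOException;

class DefaultRequestHeaders {

    // Hypothetical header name used only for this demonstration.
    static final String HEADER = "X-Demo-Header";

    // Returns the header value seen by the connection before and after removal.
    static String[] headerBeforeAndAfterRemoval() throws IOException {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(
                    "<html><head><title>ok</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            // From now on, every request carries the header.
            webClient.addRequestHeader(HEADER, "demo");
            webClient.getPage("http://localhost/first");
            String before = connection.getLastWebRequest().getAdditionalHeaders().get(HEADER);

            // After removal, later requests no longer carry it.
            webClient.removeRequestHeader(HEADER);
            webClient.getPage("http://localhost/second");
            String after = connection.getLastWebRequest().getAdditionalHeaders().get(HEADER);

            return new String[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        String[] values = headerBeforeAndAfterRemoval();
        System.out.println("Before removal: " + values[0]);
        System.out.println("After removal: " + values[1]);
    }
}
```

For a header that should apply to a single request only, setting it on a WebRequest and passing that to getPage is the narrower alternative.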
Miscellaneous
close(): Closes the WebClient and all associated windows, which is important to free resources.
getWebConnection(): Returns the WebConnection object that is used to send requests to the server.
Here's an example of how you might use the WebClient class to navigate to a web page and print its title in Java:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class WebClientExample {
    public static void main(String[] args) {
        // Create a new instance of WebClient
        try (final WebClient webClient = new WebClient()) {
            // Navigate to a web page and get the Page object
            HtmlPage page = webClient.getPage("http://example.com");
            // Print the title of the page
            System.out.println("Page title: " + page.getTitleText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Make sure to handle exceptions and close the WebClient properly to avoid leaking resources. The try-with-resources statement in the example above ensures that the WebClient is closed automatically.