Routing Jsoup requests through a proxy is a common way to work around IP-based restrictions, stay under per-IP rate limits, and keep your own IP address out of the target site's logs. Jsoup supports this either per connection, through Connection.proxy(...), or globally, through the standard Java networking system properties.
Quick Start: Direct Connection Proxy
The simplest approach is to configure the proxy directly on the Jsoup connection:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
String url = "https://example.com";
Document doc = Jsoup.connect(url)
    .proxy("proxy.example.com", 8080)
    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
    .timeout(10000)
    .get();
System.out.println(doc.title());
Method 1: System Properties (Global Configuration)
Set proxy properties globally for all HTTP connections in your application:
// HTTP proxy configuration
System.setProperty("http.proxyHost", "proxy.example.com");
System.setProperty("http.proxyPort", "8080");
System.setProperty("https.proxyHost", "proxy.example.com");
System.setProperty("https.proxyPort", "8080");
// Optional: bypass the proxy for certain hosts (this setting also covers HTTPS requests)
System.setProperty("http.nonProxyHosts", "localhost|127.*|[::1]");
// Use Jsoup normally - proxy will be used automatically
Document doc = Jsoup.connect("https://example.com").get();
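Because system properties affect every connection in the JVM, it can help to limit how long they stay set. The sketch below is one possible approach and not part of Jsoup or the JDK: ScopedProxySettings is a hypothetical helper that records the previous values, applies the four proxy properties, and restores them when closed. It is not thread-safe, since other threads see the same properties while the scope is open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup or the JDK): applies the standard
// proxy system properties and restores the previous values on close().
final class ScopedProxySettings implements AutoCloseable {
    private static final String[] KEYS = {
        "http.proxyHost", "http.proxyPort", "https.proxyHost", "https.proxyPort"
    };

    // Previous value per key; null means the property was not set before
    private final Map<String, String> previous = new LinkedHashMap<>();

    ScopedProxySettings(String host, int port) {
        for (String key : KEYS) {
            previous.put(key, System.getProperty(key));
            System.setProperty(key, key.endsWith("Host") ? host : String.valueOf(port));
        }
    }

    @Override
    public void close() {
        for (Map.Entry<String, String> entry : previous.entrySet()) {
            if (entry.getValue() == null) {
                System.clearProperty(entry.getKey());
            } else {
                System.setProperty(entry.getKey(), entry.getValue());
            }
        }
    }
}
```

Wrapping the Jsoup calls in try (ScopedProxySettings scope = new ScopedProxySettings("proxy.example.com", 8080)) { ... } would keep the global settings active only for the requests made inside the block.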
Method 2: Per-Connection Configuration
Configure proxy settings for individual connections:
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class ProxyExample {
    public static void main(String[] args) {
        try {
            Connection connection = Jsoup.connect("https://httpbin.org/ip")
                .proxy("proxy.example.com", 8080)
                .userAgent("Mozilla/5.0 (compatible; JavaBot/1.0)")
                .timeout(15000)
                // httpbin returns JSON, which Jsoup rejects unless told to ignore the content type
                .ignoreContentType(true);
            Document doc = connection.get();
            System.out.println("Response: " + doc.text());
        } catch (IOException e) {
            System.err.println("Failed to connect through proxy: " + e.getMessage());
        }
    }
}
Proxy Authentication
Basic Authentication
For proxies requiring username and password authentication:
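import java.io.IOException;
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class AuthenticatedProxyExample {
    public static void setupProxyAuth(String username, String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Only answer challenges that come from a proxy, not from origin servers
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(username, password.toCharArray());
                }
                return null;
            }
        });
    }
    public static void main(String[] args) throws IOException {
        // Since JDK 8u111, Basic authentication is disabled by default for HTTPS
        // tunneling through a proxy. If the proxy only offers Basic, clear the
        // list before the first request is made:
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        // Setup authentication
        setupProxyAuth("proxy_user", "proxy_password");
        // Configure and use proxy
        Document doc = Jsoup.connect("https://example.com")
            .proxy("authenticated-proxy.example.com", 8080)
            .get();
        System.out.println(doc.title());
    }
}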
Alternative: System Properties Authentication
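The JDK does not read http.proxyUser or http.proxyPassword on its own. They are only a naming convention (often passed with -D on the command line), so an Authenticator still has to hand the values to the HTTP stack:
System.setProperty("http.proxyHost", "proxy.example.com");
System.setProperty("http.proxyPort", "8080");
System.setProperty("https.proxyHost", "proxy.example.com");
System.setProperty("https.proxyPort", "8080");
System.setProperty("http.proxyUser", "your_username");
System.setProperty("http.proxyPassword", "your_password");
// Supply the stored credentials whenever a proxy asks for them
Authenticator.setDefault(new Authenticator() {
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        if (getRequestorType() != RequestorType.PROXY) {
            return null;
        }
        return new PasswordAuthentication(
            System.getProperty("http.proxyUser"),
            System.getProperty("http.proxyPassword").toCharArray());
    }
});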
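A default Authenticator is consulted for every authentication challenge in the JVM, so it may be worth restricting which challenges it answers. The following is a minimal sketch, assuming the JDK reports the proxy's own host and port for proxy challenges: SingleProxyAuthenticator is a hypothetical class name, and it returns credentials only when the challenge comes from the one proxy it was configured for.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper: answers only proxy challenges that come from one
// specific proxy host and port, and ignores every other request.
final class SingleProxyAuthenticator extends Authenticator {
    private final String proxyHost;
    private final int proxyPort;
    private final String username;
    private final char[] password;

    SingleProxyAuthenticator(String proxyHost, int proxyPort, String username, String password) {
        this.proxyHost = proxyHost;
        this.proxyPort = proxyPort;
        this.username = username;
        this.password = password.toCharArray();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        boolean fromExpectedProxy = getRequestorType() == RequestorType.PROXY
                && proxyHost.equalsIgnoreCase(getRequestingHost())
                && proxyPort == getRequestingPort();
        // Returning null tells the JDK that no credentials are available
        return fromExpectedProxy ? new PasswordAuthentication(username, password) : null;
    }
}
```

It would be installed once at startup with Authenticator.setDefault(new SingleProxyAuthenticator("proxy.example.com", 8080, "your_username", "your_password")), after which challenges from other hosts or from origin servers receive no credentials.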
SOCKS Proxy Configuration
For SOCKS4/SOCKS5 proxies:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// Method 1: System properties (applies to every socket the JVM opens)
System.setProperty("socksProxyHost", "socks-proxy.example.com");
System.setProperty("socksProxyPort", "1080");
System.setProperty("socksProxyVersion", "5"); // or "4"
// Method 2: Pass a java.net.Proxy to a single connection
public class SocksProxyExample {
    public static Document connectViaSocks(String url, String proxyHost, int proxyPort)
            throws IOException {
        // Connection.proxy(Proxy) accepts any java.net.Proxy, including SOCKS,
        // so only this request is routed through the SOCKS server
        Proxy socksProxy = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(proxyHost, proxyPort));
        return Jsoup.connect(url).proxy(socksProxy).get();
    }
}
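Proxy lists are often supplied as strings such as socks5://host:1080 or http://host:8080. As an illustration, the sketch below converts such a string into a java.net.Proxy of the matching type. ProxyUris and the accepted scheme names are assumptions made for this example, not a Jsoup or JDK convention, and java.net.Proxy itself carries no SOCKS version, so socks4 and socks5 both map to Proxy.Type.SOCKS.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: turns "scheme://host:port" into a java.net.Proxy.
final class ProxyUris {
    private ProxyUris() {}

    static Proxy parse(String spec) {
        URI uri = URI.create(spec);
        String scheme = uri.getScheme();
        if (scheme == null || uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("Expected scheme://host:port but got: " + spec);
        }
        Proxy.Type type;
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
                type = Proxy.Type.HTTP;
                break;
            case "socks":
            case "socks4":
            case "socks5":
                type = Proxy.Type.SOCKS;
                break;
            default:
                throw new IllegalArgumentException("Unsupported proxy scheme: " + scheme);
        }
        // createUnresolved defers the DNS lookup until a connection is made
        return new Proxy(type, InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort()));
    }
}
```

The returned object can be handed to Jsoup.connect(url).proxy(...), which accepts a java.net.Proxy as well as a host and port pair.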
Error Handling and Retry Logic
Implement robust error handling when using proxies:
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class RobustProxyClient {
    private static final int MAX_RETRIES = 3;
    private static final int TIMEOUT = 10000;
    public static Document fetchWithRetry(String url, String proxyHost, int proxyPort) {
        IOException lastError = null; // kept so the final exception carries a cause
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                return Jsoup.connect(url)
                    .proxy(proxyHost, proxyPort)
                    .timeout(TIMEOUT)
                    .userAgent("Mozilla/5.0 (compatible; JavaBot/1.0)")
                    .get();
            } catch (HttpStatusException e) {
                lastError = e;
                System.err.println("HTTP " + e.getStatusCode() + " on attempt " + attempt);
                if (e.getStatusCode() == 407) {
                    // Retrying cannot fix missing or wrong proxy credentials
                    throw new RuntimeException("Proxy authentication required", e);
                }
            } catch (SocketTimeoutException e) {
                lastError = e;
                System.err.println("Timeout on attempt " + attempt);
            } catch (ConnectException e) {
                lastError = e;
                System.err.println("Connection failed on attempt " + attempt + ": " + e.getMessage());
            } catch (IOException e) {
                lastError = e;
                System.err.println("IO error on attempt " + attempt + ": " + e.getMessage());
            }
            if (attempt < MAX_RETRIES) {
                try {
                    // Exponential backoff: 2s, 4s, 8s, ...
                    Thread.sleep(2000L << (attempt - 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new RuntimeException("Failed to fetch after " + MAX_RETRIES + " attempts", lastError);
    }
}
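When many workers retry through the same proxy, fixed delays can cause them all to retry at the same moment. A common refinement is capped exponential backoff with random jitter. The sketch below shows one way to compute such delays. Backoff, its method names, and the parameter values are illustrative assumptions, not part of Jsoup.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for computing retry delays.
final class Backoff {
    private Backoff() {}

    // Delay after the given 1-based attempt: base, 2*base, 4*base, ...
    // never exceeding maxMillis.
    static long exponentialDelay(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 1 || baseMillis <= 0 || maxMillis <= 0) {
            throw new IllegalArgumentException("attempt, baseMillis and maxMillis must be positive");
        }
        long delay = baseMillis;
        // Stop doubling once the cap is reached, which also avoids overflow
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // "Full jitter": a uniformly random delay between 0 and the exponential delay.
    static long withJitter(int attempt, long baseMillis, long maxMillis) {
        long ceiling = exponentialDelay(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

A retry loop could then call Thread.sleep(Backoff.withJitter(attempt, 2000, 30000)) between attempts. The 30-second cap here is an arbitrary choice for the example.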
Testing Proxy Configuration
Verify your proxy setup with a simple test:
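import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class ProxyTester {
    public static void testProxy(String proxyHost, int proxyPort) {
        try {
            // Test without proxy (httpbin returns JSON, so the content type check must be disabled)
            Document directDoc = Jsoup.connect("https://httpbin.org/ip")
                .ignoreContentType(true)
                .get();
            System.out.println("Direct IP: " + directDoc.text());
            // Test with proxy: the reported IP should now be the proxy's address
            Document proxyDoc = Jsoup.connect("https://httpbin.org/ip")
                .proxy(proxyHost, proxyPort)
                .ignoreContentType(true)
                .get();
            System.out.println("Proxy IP: " + proxyDoc.text());
        } catch (IOException e) {
            System.err.println("Proxy test failed: " + e.getMessage());
        }
    }
}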
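Before sending a full request, a plain TCP connect can show whether the proxy port is reachable at all, which helps separate network problems from HTTP-level ones. The sketch below uses only the JDK. ProxyReachability is a hypothetical helper name, and a successful connect only proves that something is listening on the port, not that it is a working proxy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP connection to host:port succeeds.
final class ProxyReachability {
    private ProxyReachability() {}

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unknown hosts
            return false;
        }
    }
}
```

A call such as ProxyReachability.canConnect("proxy.example.com", 8080, 3000) could run before the httpbin comparison so that an unreachable proxy is reported quickly.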
Multiple Proxy Support
Rotate between multiple proxies for better reliability:
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class ProxyRotator {
    private final List<ProxyInfo> proxies;
    private final Random random = new Random();
    public ProxyRotator(List<ProxyInfo> proxies) {
        this.proxies = proxies;
    }
    public Document fetch(String url) throws IOException {
        // Pick a proxy at random for each request
        ProxyInfo proxy = proxies.get(random.nextInt(proxies.size()));
        return Jsoup.connect(url)
            .proxy(proxy.host, proxy.port)
            .timeout(10000)
            .get();
    }
    static class ProxyInfo {
        final String host;
        final int port;
        ProxyInfo(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }
    // Usage example
    public static void main(String[] args) throws IOException {
        List<ProxyInfo> proxies = Arrays.asList(
            new ProxyInfo("proxy1.example.com", 8080),
            new ProxyInfo("proxy2.example.com", 8080),
            new ProxyInfo("proxy3.example.com", 8080)
        );
        ProxyRotator rotator = new ProxyRotator(proxies);
        Document doc = rotator.fetch("https://example.com");
        System.out.println(doc.title());
    }
}
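Random selection can pick the same proxy several times in a row and keeps choosing proxies that have already failed. As an alternative, the sketch below cycles through the list in order and skips entries that were marked as failed. RoundRobinProxyPool and its methods are hypothetical names for this example, and a real pool would probably also let failed proxies recover after some time.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: round-robin selection that skips failed proxies.
final class RoundRobinProxyPool {
    private final List<Proxy> proxies;
    private final Set<Proxy> failed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinProxyPool(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("At least one proxy is required");
        }
        this.proxies = new ArrayList<>(proxies);
    }

    // Returns the next proxy in order, skipping any that were marked as failed.
    Proxy next() {
        for (int i = 0; i < proxies.size(); i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
            Proxy candidate = proxies.get(index);
            if (!failed.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("All proxies are marked as failed");
    }

    void markFailed(Proxy proxy) {
        failed.add(proxy);
    }

    // Convenience factory for an HTTP proxy entry
    static Proxy http(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```

Each request would then use Jsoup.connect(url).proxy(pool.next()), and the caller would invoke pool.markFailed(proxy) when a request through that proxy throws an IOException.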
Best Practices
- Always set timeouts when using proxies to avoid hanging connections
- Implement retry logic for handling temporary proxy failures
- Rotate User-Agent headers to avoid detection
- Test proxy connectivity before using in production
- Handle authentication errors (HTTP 407) appropriately
- Prefer HTTPS target URLs so that page content stays encrypted while it passes through the proxy
- Monitor proxy performance and switch if response times degrade
Security Considerations
- Avoid logging proxy credentials in application logs
- Use encrypted connections (HTTPS) when transmitting sensitive data
- Keep TLS certificate validation enabled so that an intercepting proxy cannot silently read or modify HTTPS traffic
- Rotate proxy credentials regularly
- Monitor for proxy abuse that could compromise your application
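To illustrate the first point, proxy URLs that embed credentials can be masked before they reach a log statement. This is a minimal sketch under stated assumptions: ProxyLogSanitizer is a hypothetical helper, the URL is assumed to have the form scheme://user:password@host:port, and any '@' or '/' in the password is assumed to be percent-encoded.

```java
// Hypothetical helper: hides the password part of a proxy URL for logging.
final class ProxyLogSanitizer {
    private ProxyLogSanitizer() {}

    static String redact(String proxyUrl) {
        // Keep the user name, replace everything between ':' and '@' with ***
        return proxyUrl.replaceFirst("(?<=://)([^:/@]+):[^@/]*@", "$1:***@");
    }
}
```

URLs without embedded credentials are returned unchanged, so the method can be applied to every proxy URL before logging.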
Remember to replace placeholder values (proxy.example.com, your_username, etc.) with your actual proxy server details.