Hidden APIs are internal endpoints that websites use for their front-end applications but don't publicly document. Because they typically return structured data such as JSON, they can provide cleaner, more efficient data extraction than HTML scraping. This guide walks through the main ways to discover them.
Primary Discovery Methods
1. Browser Developer Tools (Most Effective)
The network tab in browser developer tools is your primary weapon for API discovery.
Step-by-Step Process:
- Open Developer Tools: Press F12 (or Ctrl+Shift+I on Windows/Linux, Cmd+Opt+I on Mac)
- Navigate to Network Tab: Click "Network" and ensure recording is enabled
- Clear existing requests: Click the clear button (🚫) to start fresh
- Filter by request type: Use filters like XHR, Fetch, or JS to focus on API calls
- Interact with the website: Perform actions that load the data you want to scrape
- Analyze requests: Look for requests returning JSON/XML data
Pro Tips for Network Analysis:
# Look for these URL patterns in the network tab:
/api/
/v1/
/v2/
/graphql
/ajax/
/json/
/_next/data/
/__data.json
Example Network Request Analysis:
GET /api/v2/products?page=1&limit=20&category=electronics
Host: example.com
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...
User-Agent: Mozilla/5.0...
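The manual review of the Network tab can be partly automated by exporting the captured traffic as a HAR file and filtering it. The following is a minimal sketch, assuming the standard HAR layout (`log.entries[].request.url`, `response.content.mimeType`). The function name `find_api_entries` and the inline sample data are illustrative, not part of any tool.

```python
import re

# URL fragments that often indicate an API call (taken from the pattern list above).
API_PATH_HINTS = re.compile(
    r"/(api|v\d+|graphql|ajax|json)(/|\?|$)|/_next/data/|/__data\.json"
)
DATA_MIME_TYPES = {"application/json", "application/xml", "text/xml"}


def find_api_entries(har: dict) -> list:
    """Return HAR entries whose response type or URL suggests an API call."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        mime = entry["response"].get("content", {}).get("mimeType", "")
        base_mime = mime.split(";")[0].strip().lower()  # drop "; charset=..."
        is_data = base_mime in DATA_MIME_TYPES or base_mime.endswith("+json")
        if is_data or API_PATH_HINTS.search(url):
            matches.append(
                {"method": entry["request"]["method"], "url": url, "mime": base_mime}
            )
    return matches


def _entry(method: str, url: str, mime: str) -> dict:
    """Build a minimal HAR entry for the demo below."""
    return {
        "request": {"method": method, "url": url},
        "response": {"content": {"mimeType": mime}},
    }


# Hand-written stand-in for a real HAR export (hypothetical data).
sample_har = {
    "log": {
        "entries": [
            _entry("GET", "https://example.com/index.html", "text/html"),
            _entry("GET", "https://example.com/static/app.js", "application/javascript"),
            _entry(
                "GET",
                "https://example.com/api/v2/products?page=1&limit=20&category=electronics",
                "application/json",
            ),
            _entry("POST", "https://example.com/graphql", "application/json; charset=utf-8"),
        ]
    }
}

for hit in find_api_entries(sample_har):
    print(hit["method"], hit["url"])
```

With a real export, load the file with `json.load` and pass the resulting dictionary to the same function.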
2. JavaScript Source Code Analysis
APIs are often hardcoded or dynamically constructed in JavaScript files.
Search Techniques:
// In the browser's developer tools, search all loaded scripts for these patterns:
// 1. Global search across sources
// Ctrl+Shift+F (Windows/Linux); on Mac, Cmd+Opt+F in Chrome or Cmd+Shift+F in Firefox
// 2. Common search terms:
"fetch("
"axios."
"XMLHttpRequest"
"$.ajax"
"endpoint"
"baseURL"
"API_URL"
"/api/"
"graphql"
Example JavaScript API Discovery:
// Found in bundled JavaScript file:
const API_BASE = 'https://api.example.com/v3/';
const endpoints = {
  products: `${API_BASE}products`,
  categories: `${API_BASE}categories`,
  search: `${API_BASE}search/query`
};
// Usage in code:
fetch(`${endpoints.products}?category=${categoryId}`)
  .then(response => response.json())
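When a bundle is too large to search by hand, the same search can be approximated with a regular expression over the downloaded JavaScript. This is a rough sketch, not a parser: it only finds quoted literals that look like absolute URLs or API-style paths, and it misses endpoints assembled at runtime. The name `extract_endpoints` and the sample bundle are assumptions for illustration.

```python
import re

# Quoted literals (single, double, or backtick) holding either an absolute
# URL or a path that starts with /api, /v<digit>, or /graphql.
ENDPOINT_RE = re.compile(
    r"""["'`]((?:https?://[^"'`\s]+)|(?:/(?:api|v\d+|graphql)[^"'`\s]*))["'`]"""
)


def extract_endpoints(js_source: str) -> list:
    """Return unique endpoint-like string literals in order of appearance."""
    seen = []
    for match in ENDPOINT_RE.finditer(js_source):
        candidate = match.group(1)
        if candidate not in seen:
            seen.append(candidate)
    return seen


# Hypothetical excerpt of a bundled script.
bundle = """
const API_BASE = 'https://api.example.com/v3/';
const endpoints = { products: `${API_BASE}products` };
fetch("/api/v2/products?page=1");
"""

print(extract_endpoints(bundle))
```

Template literals such as `${API_BASE}products` are skipped on purpose. Resolving them means finding the base constant first, which this sketch reports separately.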
3. WebSocket Traffic Inspection
For real-time applications, WebSockets often carry valuable data.
WebSocket Analysis Steps:
1. Filter by WS: In the Network tab, filter by "WS" (WebSockets)
2. Monitor frames: Click on a WebSocket connection to see its message frames
3. Document message structure: Note the JSON message format and what triggers each message
Example WebSocket Message:
{
  "type": "product_update",
  "data": {
    "product_id": 12345,
    "price": 99.99,
    "stock": 15
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
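Once the message format is documented, each captured frame can be turned into a typed record. The sketch below assumes the frame shape of the example message above. The `ProductUpdate` class and `parse_frame` function are hypothetical names, and a real feed will likely carry other message types that need their own handling.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProductUpdate:
    product_id: int
    price: float
    stock: int
    timestamp: datetime


def parse_frame(raw: str) -> Optional[ProductUpdate]:
    """Parse one text frame; return None for message types not handled here."""
    message = json.loads(raw)
    if message.get("type") != "product_update":
        return None
    data = message["data"]
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    timestamp = datetime.fromisoformat(message["timestamp"].replace("Z", "+00:00"))
    return ProductUpdate(data["product_id"], data["price"], data["stock"], timestamp)


frame = (
    '{"type": "product_update", '
    '"data": {"product_id": 12345, "price": 99.99, "stock": 15}, '
    '"timestamp": "2024-01-15T10:30:00Z"}'
)
print(parse_frame(frame))
```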
Advanced Discovery Techniques
4. Mobile App Traffic Analysis
Mobile apps often use simpler APIs that are easier to reverse-engineer.
Tools for Mobile Analysis:
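# Using mitmproxy (cross-platform)
pip install mitmproxy
# Regular (explicit) proxy mode is the default: point the device's proxy settings at this machine
mitmproxy --listen-port 8080
# Transparent mode is an alternative, but it needs network-level traffic redirection:
# mitmproxy --mode transparent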
# Using Charles Proxy (GUI tool)
# Configure mobile device to use proxy
# Monitor HTTPS traffic with SSL certificate installation
Python script for mitmproxy:
# Save as addon_script.py and load it with: mitmdump -s addon_script.py
from mitmproxy import http


def response(flow: http.HTTPFlow) -> None:
    # Log every response whose URL mentions "api"
    if "api" in flow.request.pretty_url:
        print(f"API Endpoint: {flow.request.method} {flow.request.pretty_url}")
        print(f"Response: {flow.response.status_code}")
        # Preview the first 200 characters of JSON bodies
        if flow.response.headers.get("content-type", "").startswith("application/json"):
            print(f"JSON Response: {flow.response.text[:200]}...")
5. Subdomain and Path Enumeration
APIs are often hosted on separate subdomains or paths.
Subdomain Discovery:
# Using subfinder
subfinder -d example.com | grep api
# Using amass
amass enum -d example.com | grep -E "(api|v[0-9]+|dev|staging)"
# Common API subdomains to check:
api.example.com
api-v2.example.com
internal-api.example.com
mobile-api.example.com
Path Discovery:
# Using dirb for API path discovery (DirBuster is a GUI alternative)
dirb https://example.com /usr/share/dirb/wordlists/common.txt
# Common API paths to include in a custom wordlist:
/api/
/api/v1/
/api/v2/
/rest/
/graphql/
/json/
/ajax/
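The subdomain and path lists above can be combined into a set of candidate URLs to review. This sketch only builds the list and sends no requests. The function name `candidate_urls` is illustrative, and any probing of the results should stay within what the site's terms allow.

```python
from itertools import product

# Prefixes and paths copied from the lists above.
SUBDOMAIN_PREFIXES = ["api", "api-v2", "internal-api", "mobile-api"]
API_PATHS = ["/api/", "/api/v1/", "/api/v2/", "/rest/", "/graphql/", "/json/", "/ajax/"]


def candidate_urls(domain: str, include_subdomains: bool = True) -> list:
    """Combine hosts and paths into candidate API base URLs (no requests are sent)."""
    hosts = [domain]
    if include_subdomains:
        hosts += [f"{prefix}.{domain}" for prefix in SUBDOMAIN_PREFIXES]
    return [f"https://{host}{path}" for host, path in product(hosts, API_PATHS)]


for url in candidate_urls("example.com")[:5]:
    print(url)
```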
6. Browser Extension Method
Useful Browser Extensions:
- Postman Interceptor: Captures all requests automatically
- HTTP Request/Response Logger: Logs all network activity
- Developer Tools++: Enhanced network monitoring
Practical Implementation Examples
Python Implementation with Session Handling
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class HiddenAPIClient:
    def __init__(self, base_url, headers=None):
        # Strip a trailing slash so joined URLs never contain "//"
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()
        # Common headers that mimic browser behavior
        default_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'en-US,en;q=0.9',
            # 'br' is left out: requests only decodes Brotli if a brotli package is installed
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'keep-alive',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin'
        }
        if headers:
            default_headers.update(headers)
        self.session.headers.update(default_headers)
        # Setup retry strategy
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504]
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)

    def get_data(self, endpoint, params=None):
        """Fetch data from discovered API endpoint"""
        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            # Handle different response types
            content_type = response.headers.get('content-type', '')
            if 'application/json' in content_type:
                return response.json()
            elif 'text/' in content_type:
                return response.text
            else:
                return response.content
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
            return None


# Usage example
client = HiddenAPIClient('https://api.example.com')
# Add authentication if discovered
client.session.headers.update({
    'Authorization': 'Bearer your_discovered_token',
    'X-API-Key': 'your_api_key'
})
# Fetch data from discovered endpoints
products = client.get_data('/api/v2/products', params={'category': 'electronics'})
user_data = client.get_data('/api/user/profile')
JavaScript/Node.js Implementation
const axios = require('axios');

class HiddenAPIClient {
  constructor(baseURL, options = {}) {
    this.client = axios.create({
      baseURL,
      timeout: 30000,
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.9',
        ...options.headers
      }
    });
    // Add request interceptor for debugging
    this.client.interceptors.request.use(
      config => {
        console.log(`Making request to: ${config.method.toUpperCase()} ${config.url}`);
        return config;
      },
      error => Promise.reject(error)
    );
    // Add response interceptor for error handling
    this.client.interceptors.response.use(
      response => response,
      error => {
        console.error(`API Error: ${error.response?.status} ${error.response?.statusText}`);
        return Promise.reject(error);
      }
    );
  }

  async fetchData(endpoint, params = {}) {
    try {
      const response = await this.client.get(endpoint, { params });
      return response.data;
    } catch (error) {
      console.error(`Failed to fetch ${endpoint}:`, error.message);
      throw error;
    }
  }

  async postData(endpoint, data) {
    try {
      const response = await this.client.post(endpoint, data);
      return response.data;
    } catch (error) {
      console.error(`Failed to post to ${endpoint}:`, error.message);
      throw error;
    }
  }
}

// Usage
(async () => {
  const apiClient = new HiddenAPIClient('https://api.example.com', {
    headers: {
      'Authorization': 'Bearer discovered_token',
      'X-Requested-With': 'XMLHttpRequest'
    }
  });
  try {
    const products = await apiClient.fetchData('/api/v2/products', {
      page: 1,
      limit: 50,
      category: 'electronics'
    });
    console.log('Products:', products);
  } catch (error) {
    console.error('Error:', error);
  }
})();
API Authentication and Headers
Common Authentication Methods Found:
# 1. Bearer Token Authentication
headers = {
    'Authorization': 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...'
}
# 2. API Key in Header
headers = {
    'X-API-Key': 'your_api_key_here',
    'X-RapidAPI-Key': 'rapid_api_key'
}
# 3. Custom Headers
headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'X-CSRF-Token': 'csrf_token_value',
    'Referer': 'https://example.com/page'
}
# 4. Cookies (session-based)
cookies = {
    'sessionid': 'session_value',
    'csrftoken': 'csrf_value'
}
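Bearer tokens that start with `eyJ` are usually JWTs, and their header and payload can be read without any key. The sketch below decodes both segments to inspect fields such as an expiry claim. It does not verify the signature. The helper names and the demo token are made up for illustration, and not every bearer token is a JWT.

```python
import base64
import json


def decode_jwt_part(segment: str) -> dict:
    """Decode one base64url-encoded JWT segment (header or payload)."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(token: str) -> tuple:
    """Return (header, payload) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return decode_jwt_part(header_b64), decode_jwt_part(payload_b64)


def _b64url(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (used only for the demo)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Hypothetical token assembled locally; the signature part is a placeholder.
demo_token = ".".join(
    [
        _b64url({"typ": "JWT", "alg": "HS256"}),
        _b64url({"sub": "demo-user", "exp": 1705314600}),
        "signature-not-checked",
    ]
)

header, payload = inspect_jwt(demo_token)
print(header)
print(payload)
```

If the payload carries an `exp` claim, comparing it with the current time indicates when a captured token will stop working.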
Rate Limiting and Best Practices
import time
import random
from functools import wraps


def rate_limit(calls_per_second=1):
    """Decorator to add rate limiting"""
    min_interval = 1.0 / calls_per_second
    last_called = [0.0]  # mutable cell so the wrapper can update it

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            if left_to_wait > 0:
                # Small random jitter on top of the minimum interval
                time.sleep(left_to_wait + random.uniform(0, 0.1))
            ret = func(*args, **kwargs)
            last_called[0] = time.time()
            return ret
        return wrapper
    return decorator


# Usage
@rate_limit(calls_per_second=2)  # Max 2 calls per second
def fetch_api_data(endpoint):
    # Your API call here
    pass
Legal and Ethical Considerations
Critical Guidelines:
- Terms of Service: Always review and comply with the website's terms of service
- Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
- robots.txt: Respect the robots.txt file, even for APIs
- Data Privacy: Be aware of GDPR, CCPA, and other privacy regulations
- Copyright: Respect intellectual property rights
- Attribution: Give credit when required by the data source
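The robots.txt guideline can be checked in code with the standard library's `urllib.robotparser`. This sketch parses an inline sample file instead of fetching a real one. The rules, user agent, and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content for the demo.
sample_robots = """\
User-agent: *
Disallow: /api/private/
"""

parser = build_parser(sample_robots)
for url in (
    "https://example.com/api/v2/products",
    "https://example.com/api/private/users",
):
    print(url, parser.can_fetch("YourBot/1.0", url))
```

For a live site, `set_url()` followed by `read()` downloads the file instead of using `parse()`.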
Recommended Practices:
# Good: Respectful scraping with delays
import time
import random

import requests


def respectful_api_call(url):
    # Random delay between 1-3 seconds
    time.sleep(random.uniform(1, 3))
    headers = {
        'User-Agent': 'YourBot/1.0 (contact@yoursite.com)',  # Identify yourself
        'Accept': 'application/json'
    }
    return requests.get(url, headers=headers, timeout=30)
Legal Compliance Checklist:
- [ ] Read and understand the website's Terms of Service
- [ ] Check for explicit API usage policies
- [ ] Implement rate limiting (1-2 requests per second maximum)
- [ ] Use proper User-Agent identification
- [ ] Respect HTTP status codes (especially 429 Too Many Requests)
- [ ] Don't scrape personal or sensitive data without consent
- [ ] Consider reaching out to the website owner for permission
Remember: Hidden APIs are meant for internal use. While discovering them is generally not illegal, using them may violate the site's terms of service. Always prioritize ethical scraping practices and consider reaching out to website owners for official API access when possible.
Troubleshooting Common Issues
403 Forbidden Errors:
- Check if authentication headers are required
- Verify the Referer header matches the website
- Ensure User-Agent mimics a real browser
429 Too Many Requests:
- Implement exponential backoff
- Reduce request frequency
- Use rotating proxies if necessary
CORS Issues (Browser-based):
- APIs may block cross-origin requests
- Use server-side scraping instead of browser-based
- Consider using a CORS proxy for development only
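The exponential backoff suggested for 429 responses can be sketched as follows. This is one possible shape, not a definitive implementation: `request_fn` is a hypothetical zero-argument callable that returns an HTTP status code, and the base delay, cap, and 10% jitter are arbitrary choices. If the server sends a Retry-After header, honoring it is usually preferable to a computed delay.

```python
import random
import time
from typing import Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    request_fn: Callable[[], int],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call request_fn until it stops returning 429 or the retries run out."""
    status = request_fn()
    for attempt in range(max_retries):
        if status != 429:
            break
        delay = backoff_delay(attempt)
        # Up to 10% random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
        status = request_fn()
    return status


# Demo with canned responses and a recording sleep, so nothing actually waits.
responses = iter([429, 429, 200])
waits = []
final_status = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(final_status, [round(w, 2) for w in waits])
```

Passing `sleep` as a parameter keeps the function easy to test. In real use the default `time.sleep` applies.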