How can I use Curl to download a webpage?

Curl is a powerful command-line tool for transferring data over a wide range of protocols, including HTTP, HTTPS, and FTP. It's an essential tool for developers who need to download webpages programmatically.

Basic Webpage Download

The simplest way to download a webpage with Curl:

curl https://example.com

This outputs the webpage content directly to your terminal. To save it to a file, use output redirection:

curl https://example.com > webpage.html

Output Options

Using -o (specify filename)

curl -o webpage.html https://example.com

Using -O (use remote filename)

# Saves as "index.html", the filename taken from the URL path
curl -O https://example.com/index.html

Append to existing file

curl https://example.com >> combined.html

Advanced Download Options

Follow redirects

Many websites redirect requests, for example from HTTP to HTTPS or to a canonical URL. Use -L to follow redirects:

curl -L -o webpage.html https://example.com

Include response headers

Save both headers and content for debugging:

curl -i -o webpage.html https://example.com

Set custom User-Agent

Some websites block requests without proper User-Agent headers:

curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
     -o webpage.html https://example.com

Download with progress bar

Show download progress for large files:

curl -# -o webpage.html https://example.com

Set timeout

Prevent hanging on slow connections:

curl --connect-timeout 30 --max-time 60 -o webpage.html https://example.com

HTTPS and Security

Standard HTTPS download

curl -o secure.html https://secure-site.com

Skip SSL certificate verification (not recommended for production)

curl -k -o webpage.html https://self-signed-cert-site.com

Specify CA certificate bundle

curl --cacert /path/to/cert.pem -o webpage.html https://example.com

Handling Authentication

Basic authentication

curl -u username:password -o private.html https://protected-site.com

Send cookies

curl -b "session=abc123; user=john" -o webpage.html https://example.com

Save and load cookies

# Save cookies
curl -c cookies.txt -o webpage.html https://example.com

# Use saved cookies
curl -b cookies.txt -o another-page.html https://example.com/protected

Multiple Downloads

Download multiple URLs

curl -o page1.html https://example.com/page1 \
     -o page2.html https://example.com/page2

Use URL patterns

# Downloads page1.html, page2.html, page3.html
curl -o "page#1.html" https://example.com/page[1-3]

Best Practices

  1. Always use HTTPS when available for security
  2. Set appropriate timeouts to avoid hanging connections
  3. Follow redirects with -L for reliable downloads
  4. Use proper User-Agent to avoid being blocked
  5. Handle errors gracefully with --fail, which makes curl exit with a non-zero code on HTTP errors (4xx/5xx)

Complete example with best practices:

curl -L --fail --connect-timeout 30 --max-time 300 \
     -A "Mozilla/5.0 (compatible; MyBot/1.0)" \
     -o downloaded-page.html \
     https://example.com

This command follows redirects, exits on HTTP errors, sets reasonable timeouts, uses a custom User-Agent, and saves the output to a file.
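In scripts, --fail pairs naturally with curl's exit status: curl returns non-zero when either the transfer or the HTTP response fails, so a wrapper can detect and report errors instead of silently saving an error page. A minimal sketch, assuming a POSIX shell (the fetch_page helper name and the bot User-Agent string are illustrative, not curl features):

```shell
#!/bin/sh
# Sketch of an error-checked wrapper around the command above.
# fetch_page is a hypothetical helper name, not part of curl.
fetch_page() {
    url="$1"
    out="$2"
    # --fail turns HTTP errors (4xx/5xx) into a non-zero curl exit code,
    # so the if-branch below catches both network and HTTP failures.
    if curl -L --fail --connect-timeout 30 --max-time 300 \
            -A "Mozilla/5.0 (compatible; MyBot/1.0)" \
            -o "$out" "$url"; then
        echo "Saved $url to $out"
    else
        echo "Download failed for $url" >&2
        return 1
    fi
}

# ".invalid" is a reserved TLD that never resolves, so this exercises
# the error path without depending on any real site.
fetch_page "http://host.invalid/" page.html || echo "error handled"
```

Without --fail, a 404 response would exit 0 and write the error page's HTML to disk; with it, the failure branch runs and the script can retry, log, or abort.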

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
