How do I access the raw response content using Requests?

When using the requests library in Python, you can access the raw response content of a request in several ways depending on what you need:

  1. Text Content: If you want to access the textual representation of the response, typically for text or HTML content, you can use the .text attribute.
import requests

response = requests.get('http://example.com')
text_content = response.text
print(text_content)
  2. Binary Content: If the response data is binary (like an image or PDF file), you can use the .content attribute to access the body as bytes.
import requests

response = requests.get('http://example.com/image.png')
binary_content = response.content

# Saving the binary content to a file
with open('image.png', 'wb') as file:
    file.write(binary_content)
  3. Raw Response Content: If you need to process the response as it streams in, set the stream parameter to True in your request. You can then iterate over response.iter_content(), which yields chunks with any gzip or deflate compression already undone, or read from response.raw, which exposes the underlying urllib3 response and returns the bytes as they were transferred (still compressed if the server compressed them). With .raw, nothing is read until you call response.raw.read(). Streaming is more memory-efficient for large responses because the body is never held in memory all at once.
import requests

response = requests.get('http://example.com', stream=True)

with open('file', 'wb') as fd:
    for chunk in response.iter_content(chunk_size=128):
        fd.write(chunk)

Or, if you really want to read from the underlying raw stream:

import requests

response = requests.get('http://example.com', stream=True)
raw_content = response.raw

# Be careful: raw_content.read() with no argument loads the whole body into
# memory, and the stream can only be read once.
# It's usually better to read it in chunks, especially for large responses
with open('file', 'wb') as fd:
    while True:
        chunk = raw_content.read(1024)
        if not chunk:
            break
        fd.write(chunk)
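
The difference between .content and .raw is easiest to see with a compressed response. The following is a minimal sketch, not a definitive recipe: it assumes a throwaway local server (the GzipHandler class and PAYLOAD are made up for the demo) that always replies with a gzip-compressed body, and it disables proxy environment variables on the session so the local request is not rerouted.

```python
import gzip
import http.server
import threading

import requests

PAYLOAD = b"hello from a compressed response\n" * 4


class GzipHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local endpoint that always answers with a gzip body."""

    def do_GET(self):
        body = gzip.compress(PAYLOAD)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port for the demo server
server = http.server.HTTPServer(("127.0.0.1", 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

# .content: Requests undoes the gzip Content-Encoding for you
decoded = session.get(url).content

# .raw.read(): bytes as transferred, still gzip-compressed
with session.get(url, stream=True) as response:
    wire_bytes = response.raw.read()

# .raw.read(decode_content=True): ask urllib3 to decompress while reading
with session.get(url, stream=True) as response:
    decoded_raw = response.raw.read(decode_content=True)

session.close()
server.shutdown()
server.server_close()

print(decoded == PAYLOAD)                      # .content is decompressed
print(wire_bytes == PAYLOAD)                   # .raw is not
print(gzip.decompress(wire_bytes) == PAYLOAD)  # but it decompresses to the same data
```

If you only want decompressed chunks, iter_content() is usually the simpler choice; .raw is mainly useful when you need the bytes as they came over the wire.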

Remember that when you use stream=True, the connection is not released back to the pool until you have consumed all the data or called response.close(). Leaving partially read responses open ties up connections and can lead to inefficient use of resources.
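
One way to make sure a streamed response is always closed is to use it as a context manager. This is a minimal sketch under a few assumptions: download_to_file is a hypothetical helper (not part of Requests), and the 8192-byte chunk size and 10-second timeout are arbitrary choices for illustration.

```python
import requests


def download_to_file(url, path, chunk_size=8192):
    """Stream url into path and return the number of bytes written.

    Hypothetical helper for illustration; not part of Requests.
    """
    written = 0
    # Leaving the with-block closes the response, even if an error is raised
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()  # stop early on 4xx/5xx responses
        with open(path, "wb") as fd:
            for chunk in response.iter_content(chunk_size=chunk_size):
                written += fd.write(chunk)
    return written
```

A call might look like download_to_file('http://example.com/image.png', 'image.png').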

Note: Requests decodes the response body for you at two levels. When you access response.content, it undoes any gzip or deflate transfer compression and gives you the resulting bytes with no character decoding applied. When you access response.text, it additionally decodes those bytes into a string using response.encoding, which is taken from the charset in the Content-Type header. If no charset is specified, Requests falls back to ISO-8859-1 for text/* content types; for other types response.encoding is typically None, and .text uses an encoding guessed from the body (response.apparent_encoding). You can set response.encoding yourself before reading .text to override the choice.
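
The header-based encoding choice can be checked without making a request. The sketch below uses requests.utils.get_encoding_from_headers purely for illustration, with made-up header dictionaries and a made-up UTF-8 body standing in for response.content.

```python
import requests

# Stand-in for response.content: UTF-8 bytes sent by a server
body = "café".encode("utf-8")

# Encoding Requests derives from the Content-Type header
with_charset = requests.utils.get_encoding_from_headers(
    {"content-type": "text/html; charset=utf-8"}
)
without_charset = requests.utils.get_encoding_from_headers(
    {"content-type": "text/html"}
)
non_text = requests.utils.get_encoding_from_headers(
    {"content-type": "image/png"}
)

print(with_charset)      # charset taken from the header
print(without_charset)   # text/* fallback
print(non_text)          # None: no header-based encoding

# Decoding UTF-8 bytes with the fallback garbles non-ASCII characters
garbled = body.decode(without_charset)
correct = body.decode("utf-8")
print(garbled, correct)
```

This is why a page served as text/html without a charset can show garbled characters in response.text; setting response.encoding = 'utf-8' before reading .text is one way to correct it when you know the real encoding.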
