Twitter is one of the most popular social media platforms, with millions of users tweeting and sharing their thoughts and opinions every day. As a result, Twitter has become a valuable source of data for businesses, researchers, and developers who want to analyze user behavior, sentiment, and trends.
The best way to get data from Twitter is to use their official API. However, Twitter's API is becoming increasingly restrictive: you can only retrieve a limited amount of data through it, and even the lowest access tier will soon require a monthly payment. That's why many people are looking for ways to scrape Twitter without these restrictions.
Here we've collected a few methods of scraping Twitter data without using the official API.
How to Scrape Twitter Tweets
For tweets, you can use Twitter's private search API. This API is not officially documented, but it still works. You can use it to search for tweets by keyword, hashtag, or username. The API returns JSON with the tweets, which you can then parse and save.
The usage of this API consists of two steps:
- Getting the guest token
- Making the search request
Getting the guest token
This search API is intended for non-authenticated users, but it still requires a token to be sent with each request, called the guest token. You can get this token by making a POST request to the following URL: https://api.twitter.com/1.1/guest/activate.json This request itself requires a bearer token that is static and always the same for these endpoints. In the code below, both requests are sent through the WebScraping.AI HTML API (api.webscraping.ai), which forwards the specified headers to the target URL.
import http.client
import urllib.parse
import json
conn = http.client.HTTPSConnection("api.webscraping.ai")
# 1. Getting the guest token
# this header is static and used for both requests
auth_header = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
api_params = {
    'api_key': 'test-api-key',
    'js': 'false',
    'timeout': 25000,
    'url': 'https://api.twitter.com/1.1/guest/activate.json',
    'headers[Authorization]': auth_header
}
conn.request("POST", f"/html?{urllib.parse.urlencode(api_params)}")
res = conn.getresponse()
json_raw = res.read().decode("utf-8")
json_object = json.loads(json_raw)
guest_token = json_object['guest_token']
print("Using guest token: " + guest_token)
Making the search request
Now we can use the guest token to make the search request, which is a GET request to the following URL: https://api.twitter.com/2/search/adaptive.json We still need to send the same static bearer token in the Authorization header, plus the guest token in the X-Guest-Token header.
twitter_params = {
    'q': 'test',
    'count': 100,
    # 'cursor': '',
    'include_want_retweets': 1,
    'include_quote_count': 'true',
    'include_reply_count': 1,
    'tweet_mode': 'extended',
    'include_entities': 'true',
    'include_user_entities': 'true',
    'simple_quoted_tweet': 'true',
    'tweet_search_mode': 'live',
    'query_source': 'typed_query'
}
api_params = {
    'api_key': 'test-api-key',
    'js': 'false',
    'timeout': 25000,
    'url': f'https://api.twitter.com/2/search/adaptive.json?{urllib.parse.urlencode(twitter_params)}',
    'headers[Authorization]': auth_header,
    'headers[X-Guest-Token]': guest_token
}
conn.request("GET", f"/html?{urllib.parse.urlencode(api_params)}")
res = conn.getresponse()
json_raw = res.read().decode("utf-8")
json_object = json.loads(json_raw)
print(json.dumps(json_object, indent=1))
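The response from adaptive.json is not a flat list: tweets are keyed by ID under globalObjects.tweets, and their authors are keyed by user ID under globalObjects.users. Here is a minimal sketch of flattening that structure; the field names match the real response format, but the sample data below is made up:

```python
def extract_tweets(response):
    """Flatten an adaptive.json response into a list of simple dicts."""
    global_objects = response.get('globalObjects', {})
    tweets = global_objects.get('tweets', {})
    users = global_objects.get('users', {})
    result = []
    for tweet_id, tweet in tweets.items():
        # look up the author by the tweet's user_id_str
        user = users.get(tweet.get('user_id_str'), {})
        result.append({
            'id': tweet_id,
            # 'full_text' is present because of tweet_mode=extended
            'text': tweet.get('full_text'),
            'created_at': tweet.get('created_at'),
            'screen_name': user.get('screen_name'),
        })
    return result

# Hand-made response fragment for illustration:
sample = {
    'globalObjects': {
        'tweets': {'1': {'user_id_str': '42', 'full_text': 'hello',
                         'created_at': 'Mon Jan 01 00:00:00 +0000 2024'}},
        'users': {'42': {'screen_name': 'example_user'}},
    }
}
print(extract_tweets(sample))
```

The commented-out "cursor" parameter in the request above is used for pagination: the timeline section of the response contains cursor values that you can pass in the next request to fetch the following page of results.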
Advanced search
In this example we've just searched for the word "test", but you can search for any keyword, hashtag, or username, and use all the features of Twitter's advanced search: https://help.twitter.com/en/using-twitter/twitter-advanced-search All you need to do is construct the search query using the advanced search builder (https://twitter.com/search-advanced) and then pass that query in the "q" parameter of the search request. This will allow you to search the tweets by:
- Keywords
- Phrases
- Negative keywords
- Hashtags
- Languages
- Usernames
- Mentions
- Replies
- Dates
- Minimum replies
- Minimum retweets
- Minimum likes
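For example, several of the operators above can be combined into a single query string and URL-encoded into the "q" parameter. This is a sketch; the account and values here are made up:

```python
import urllib.parse

# Hypothetical advanced query: tweets from @nasa, tagged #mars,
# with at least 10 retweets, since 2023-01-01, excluding replies
query = 'from:nasa #mars min_retweets:10 since:2023-01-01 -filter:replies'

# This encoded string is what the "q" parameter carries in the search request
encoded = urllib.parse.urlencode({'q': query})
print(encoded)
```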
Demo
Here is an interactive example of this code: