Uses the Twitter Streaming API to collect real-time tweets for recent high-traffic events and stores them in MongoDB. The stored tweets can later be filtered through a REST API.
- Endpoints to start streaming.
- Time limit based streaming.
- Ordering of tweets based on particular fields.
- Searching of words/text in tweets.
- Searching of tweets based on username.
- Filtering of integer fields by a minimum and/or maximum value.
- Filtering of string fields by a starting word, an ending word, or a word the field should contain.
- Filtering of tweets by start and end dates.
- CSV export support.
- Clone the git repository.
- Make sure Python 3.5+ is installed.
- Install the requirements listed in requirements.txt.
- Open the file settings.py.
- Replace your-keys* and your-tokens* with your authentication keys and access tokens.
Open the terminal and enter:
$ cd TweetStreamer
$ mkdir data
$ echo 'mongod --bind_ip=$IP --dbpath=data --nojournal --rest "$@"' > mongod
$ chmod a+x mongod
To start the MongoDB server:
$ ./mongod
$ cd TweetStreamer
$ python app.py
Fields: In MongoDB, every tweet document will contain the following fields:
- tweet_text: string
- screen_name: string
- user_name: string
- location: string
- source_device: string
- is_retweeted: boolean
- retweet_count: integer
- country: string
- country_code: string
- reply_count: integer
- favorite_count: integer
- created_at: datetime
- timestamp_ms: long
- lang: string
- hashtags: array
- quote_count: integer
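As a rough illustration of how a raw status from the Streaming API might be flattened into these fields, here is a minimal sketch. The function name `to_document` is hypothetical and not taken from the project; the input keys (`text`, `user`, `place`, `entities`, and so on) follow the standard tweet JSON, and the types follow the field list above. How the project itself performs this mapping is not shown in this README.

```python
from datetime import datetime

# Format of the "created_at" string in tweet JSON,
# e.g. "Thu Oct 11 10:04:58 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_document(status: dict) -> dict:
    """Flatten a raw tweet dict into the 16 stored fields (hypothetical helper)."""
    user = status.get("user") or {}
    place = status.get("place") or {}        # null when the tweet has no place
    entities = status.get("entities") or {}
    created = status.get("created_at")
    return {
        "tweet_text": status.get("text", ""),
        "screen_name": user.get("screen_name", ""),
        "user_name": user.get("name", ""),
        "location": user.get("location") or "",
        "source_device": status.get("source", ""),
        "is_retweeted": bool(status.get("retweeted", False)),
        "retweet_count": int(status.get("retweet_count", 0)),
        "country": place.get("country", ""),
        "country_code": place.get("country_code", ""),
        "reply_count": int(status.get("reply_count", 0)),
        "favorite_count": int(status.get("favorite_count", 0)),
        "created_at": datetime.strptime(created, CREATED_AT_FORMAT) if created else None,
        "timestamp_ms": int(status.get("timestamp_ms", 0)),
        "lang": status.get("lang", ""),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        "quote_count": int(status.get("quote_count", 0)),
    }
```

Missing or null sub-objects (for example a tweet without a place) fall back to empty strings, which matches the empty country values in the sample response.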
- Replace ____ with the keywords you want to search for, separated by commas.
- Replace {any-keyword} with the keywords you want to search for.
- Replace {time-limit} with an integer number of seconds for which tweets should be streamed.
- The default time limit is 100 seconds.
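The time limit can be pictured as a deadline check around the incoming stream. The sketch below is only an illustration of that idea and is not the project's code: `stream_for` is a hypothetical helper, and the injectable `clock` argument exists purely so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_for(items: Iterable, time_limit: float = 100.0,
               clock: Callable[[], float] = time.monotonic) -> Iterator:
    """Yield items from a stream until `time_limit` seconds have elapsed."""
    deadline = clock() + time_limit
    for item in items:
        # The clock is checked once per incoming item, so a stream that
        # blocks without delivering anything is not interrupted by this loop.
        if clock() >= deadline:
            break
        yield item
```

The default of 100 seconds mirrors the default time limit stated above.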
Response
{
"status": "success",
"message": "Started streaming tweets with keywords [u'cricket', u'football', u'Rafale']"
}
- To return a limited number of tweets, replace limit=______ with an integer value.
- To return tweets by page number, with 50 tweets per page, replace page=____ with an integer value.
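One way the `limit` and `page` parameters could translate into MongoDB skip and limit values is sketched below. `page_window` is a hypothetical helper, and the sketch assumes that pages are numbered from 1 and that `page` takes precedence when both parameters are given; neither detail is stated in this README.

```python
PAGE_SIZE = 50  # tweets per page, as stated above


def page_window(limit=None, page=None):
    """Return a (skip, limit) pair suitable for a MongoDB cursor."""
    if page is not None:
        if page < 1:
            raise ValueError("page must be 1 or greater")
        return (page - 1) * PAGE_SIZE, PAGE_SIZE
    if limit is not None:
        if limit < 0:
            raise ValueError("limit must not be negative")
        return 0, limit
    # A limit of 0 means "no limit" for a PyMongo cursor.
    return 0, 0
```

For example, page 3 skips the first 100 tweets and returns the next 50.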
- Replace ______ with the username of the user whose tweets you want to find.
- Replace ______ with the words you want to search for in the tweet text.
- For ascending order, replace ______ with field-name.
- For descending order, replace _______ with -field-name.
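The leading-minus convention maps naturally onto the `(field, direction)` pairs that MongoDB sorting expects. A minimal sketch, with `parse_sort` as a hypothetical helper name:

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def parse_sort(value: str):
    """Turn 'field-name' or '-field-name' into a (field, direction) pair."""
    if value.startswith("-"):
        return value[1:], DESCENDING
    return value, ASCENDING
```

The resulting pair could then be passed to a cursor's `sort` call, for example `cursor.sort([parse_sort("-retweet_count")])`.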
- Enter the name of the string column you want to search in column_name=______.
- Replace the blank in the end variable with the word the column value should end with.
- Replace the blank in the start variable with the word the column value should start with.
- Provide both to apply both constraints.
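A starts-with or ends-with filter can be expressed as an anchored `$regex` condition in a MongoDB query. The sketch below shows one possible construction; `start_end_query` is a hypothetical helper and the project's actual query may differ.

```python
import re


def start_end_query(column, start=None, end=None):
    """Build a MongoDB filter for values starting and/or ending with a word."""
    conditions = []
    if start:
        # "^" anchors the match at the beginning of the value.
        conditions.append({column: {"$regex": "^" + re.escape(start)}})
    if end:
        # "$" anchors the match at the end of the value.
        conditions.append({column: {"$regex": re.escape(end) + "$"}})
    if not conditions:
        return {}
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}
```

`re.escape` keeps characters such as `.` or `?` in the search word from being interpreted as regex syntax.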
- Enter the name of the string column you want to search in column_name=______.
- Replace the contain variable with the word that the resulting column entries must contain.
- Enter the name of the string column you want to search in column_name=______.
- Replace the match variable with the word to list the entries that have that word in the given column.
- Enter the name of the integer column you want to search in column_name=______.
- Replace the blank in the max variable with the maximum value for the column.
- Replace the blank in the min variable with the minimum value for the column.
- Provide both to apply both constraints.
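In MongoDB terms, a min/max filter corresponds to `$gte` and `$lte` conditions on the chosen column. A minimal sketch follows, assuming both bounds are inclusive (the README does not say whether they are); `range_query` is a hypothetical helper name.

```python
def range_query(column, minimum=None, maximum=None):
    """Build a MongoDB filter for an integer column with optional bounds."""
    bounds = {}
    if minimum is not None:
        bounds["$gte"] = int(minimum)   # query-string values arrive as text
    if maximum is not None:
        bounds["$lte"] = int(maximum)
    return {column: bounds} if bounds else {}
```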
- Replace the blank in the startDate variable with the date from which tweets should be displayed.
- Replace the blank in the endDate variable with the date up to which tweets should be displayed.
- Provide both to apply both constraints.
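A date filter on `created_at` might be built as shown below. The accepted date format is not documented here, so the `YYYY-MM-DD` format is purely an assumption, as is the choice to include the whole of the end date; `date_query` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def date_query(start_date=None, end_date=None, fmt="%Y-%m-%d"):
    """Build a MongoDB filter on created_at from optional date strings."""
    bounds = {}
    if start_date:
        bounds["$gte"] = datetime.strptime(start_date, fmt)
    if end_date:
        # Use the start of the following day as an exclusive upper bound
        # so that tweets posted during the end date are included.
        bounds["$lt"] = datetime.strptime(end_date, fmt) + timedelta(days=1)
    return {"created_at": bounds} if bounds else {}
```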
- Python 3.5
- Flask Framework
- MongoDB
- Tweepy (Twitter streaming library for Python)
[
{
"hashtags": [],
"user_name": "Zesty StL Blues",
"country": "",
"is_retweeted": false,
"location": "St Louis, MO",
"tweet_text": "Blues defenseman Dunn upset by apparent benching https://t.co/1gYP1FafRDhttps://t.co/FDWEZYTmUi",
"reply_count": 0,
"created_at": "Thu Oct 11 10:04:58 +0000 2018",
"source_device": "<a href=\"http://zestynews.com\" rel=\"nofollow\">Zesty Blues Tweets</a>",
"favorite_count": 0,
"country_code": "",
"timestamp_ms": "1539252298068",
"lang": "en",
"screen_name": "zesty_blues",
"quote_count": 0,
"retweet_count": 0
},
{......},
{......},
{......},
{......}
]
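Since the feature list mentions CSV export, a response shaped like the one above could be written out with Python's standard `csv` module roughly as follows. This is a sketch only: `tweets_to_csv` is a hypothetical helper, and joining hashtags with semicolons is an assumption rather than the project's documented behaviour.

```python
import csv
import io


def tweets_to_csv(tweets):
    """Serialize a list of tweet dicts to CSV text, one row per tweet."""
    if not tweets:
        return ""
    fieldnames = list(tweets[0].keys())   # column order follows the first tweet
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        # Lists such as hashtags are flattened into a single cell.
        row = {key: ";".join(map(str, value)) if isinstance(value, list) else value
               for key, value in tweet.items()}
        writer.writerow(row)
    return buffer.getvalue()
```

Values that contain commas or quotes, such as the `source_device` HTML snippet, are quoted automatically by the `csv` module.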