modified README
vvagias committed Jun 23, 2015
1 parent e8727af commit e1f11c3
Showing 4 changed files with 18 additions and 10 deletions.
Binary file modified: .DS_Store (contents not shown)
21 changes: 15 additions & 6 deletions README.txt
@@ -11,24 +11,33 @@ _ / __ |/ |/ /_ / / /_ / /_ / __/ / ____/ // /_ _ / / __/ /_/ /_ /

To begin generating data:

1. First open twitterStream.py and add in the needed credentials for your twitter dev account.
1. First open twitter_kafka_direct.py and add the needed credentials for your twitter dev account (see the credential sketch after these steps).
* http://dev.twitter.com
2. Ensure you have tweepy installed and that python can access the module.
- if you need tweepy and have pip :
2. Ensure you have all requirements installed and that python can access the modules (see requirements.txt and the one-line install after this step).
- if you need, for example, tweepy and have pip:
>> pip install tweepy
- if you don't have pip, download get_pip.py (latest from the google webs :) and run:
>> python get_pip.py
>> pip install tweepy
- if you like, you can also track tweepy down and install it manually, but I don't see why when pip is so awesome.
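
Since step 2 points at requirements.txt, the whole set can usually be installed in one go (assuming pip is on your path and you run this from the directory that holds requirements.txt):

>> pip install -r requirements.txt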


3. Then test: open a terminal window, cd into the directory with the python script, and run:

### Replace the generic paths with the paths in your configuration.
>> python /path/to/twitterstreaming.py
>> python /path/to/twitter_kafka_direct.py

4. To deliver the stream to csv:
- replace the stubs with the values for your tokens in twitterStream.py

>> python /path/to/twitterStream.py > twitterData.csv

5. To write data to kafka:
- Use twitter_kafka_direct.py. Replace the token stubs with your values and set your topic in mytopic;
the default is 'topic'.

4. to deliver the stream to kafka run:
>> python /path/to/twitter_kafka_direct.py

>> python /path/to/twitterStream.py > pipe | path/to/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic twitterstream < pipe

That will begin to stream data events into a kafka producer.
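
For step 1, the credentials are plain string assignments near the top of twitter_kafka_direct.py. The sketch below shows what that section is expected to look like; the exact variable names in the script may differ, so treat these identifiers as assumptions:

import tweepy

# fill in the values from your app at http://dev.twitter.com
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

# kafka topic the producer writes to (step 5); the README default is 'topic'
mytopic = "topic"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)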

Binary file modified: python/.DS_Store (contents not shown)
7 changes: 3 additions & 4 deletions python/twitter_kafka_direct.py
@@ -47,6 +47,7 @@ def on_status(self, status):
    # build a CSV record: followers,friends,statuses,tweet text,screen name
    message = str(status.user.followers_count) + ',' + str(status.user.friends_count) + ',' + str(status.user.statuses_count) + ',' + status.text + ',' + status.user.screen_name
    # strip non-printable characters before producing
    msg = filter(lambda x: x in string.printable, message)
    try:
        # write out to kafka topic
        producer.send_messages(mytopic, str(msg))
    except Exception, e:
        # swallow errors so the stream keeps running
        return True
@@ -89,14 +90,12 @@ def on_timeout(self):
stream = tweepy.Stream(auth, listener)

######################################################################
#Sample dilivers a stream of 1% (random selection) of all tweets
#Sample delivers a stream of 1% (random selection) of all tweets
######################################################################
stream.sample()
client = KafkaClient("localhost:9092")
producer = SimpleProducer(client)



stream.sample()
######################################################################
#Custom Filter rules pull all traffic for those filters in real time.
#Below are some examples; add or remove as needed...
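# For illustration only (the script's own example filters follow this comment block in
# the full file): tweepy's Stream.filter takes keyword, user-id, and location rules, e.g.
#
#   stream.filter(track=['kafka', 'bigdata'])                # keyword match
#   stream.filter(follow=['123456789'])                      # tweets from specific user IDs
#   stream.filter(locations=[-122.75, 36.8, -121.75, 37.8])  # lon/lat bounding box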
