Sentiment analysis of twitter stream

Dependencies: Java - openjdk 11.0.15 2022-04-19 kafka_2.12-3.2.0 spark-3.3.0 Spark-streaming-kafka-0-8_2.11:3.3.0 elasticsearch-hadoop-5.6.4

Python packages: transformers ElasticSearch kafka tweepy pyspark JSOn datetime

Sentiment Analysis: -Tweets are classified as Positive, Negative or Neutral along with their scores. -Higher the score[0>score<1], more is the efficiency of classification

This project is done on WSL on windows.

How to run: -start zookeeper: bin/zookeeper-server-start.sh config/zookeeper.properties -start kafka: bin/kafka-server-start.sh config/server.properties -created a topic called "twitter" -Run "producertwitter.py", which streams data from twitter, clean, build to a JSON message and send it to the kafka cluster. -uses TwitterV2 API for data streaming. -can set the filter topic by mentioning in the variable "filterTopic" -If we want to delete the filter topics previously set, run the function remove_topics(ruleID) -remove_topics(obj_ID) -> Inside the function, the previously set ruleIDs are obtained and deleted through API endpoints.

-Run spark-streamer.py, which will connect to the topic "twitter". -Extract the text message from the recieved message -Used Huggingface pre trained library to perform sentiment analysis on the twitter data. -pass the sentiment data to elasticsearch through the dataframe. -Run elasticSearch_consumer.py, which establishes a connection between the elastic search and kafka.

ELK stack: -Launch elasticsearch, kibana and logstash by going inside their installation direcories. -In kibana, we can visually see the sentiment metrics of the data,change the timeline..

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Readme.md

Readme.md

Files

Readme.md

Latest commit

History

Readme.md

File metadata and controls