
Twitch Chat Analyzer

Goal Description

The main goal of this project is to provide a useful tool for keeping track of events related to live chat on Twitch using Sentiment Analysis.

Sentiment Analysis

Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.

Source: Wikipedia
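
In this project the analysis step is performed by the Vader library. Below is a minimal sketch of how Vader scores a single chat message (the example message is invented; vaderSentiment is the PyPI package name):

# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# polarity_scores returns the neg/neu/pos proportions plus a normalized
# "compound" score in [-1, 1]; classifiers typically threshold on compound.
print(analyzer.polarity_scores("This stream is awesome, love it!"))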

Problem description

Live Twitch chats, especially those with a lot of spectators, are really difficult to follow and moderate. Moderators are people the streamer relies on to prevent their chat from becoming a jungle of frustrated monkeys.

This tool aims to help moderators and streamers keep track of the interactions between the streamer and their audience, making use of Sentiment Analysis.

Data Pipeline Technologies

Data ingestion: PircBotX (IRC bot) and Apache Kafka

Stream processing: Apache Spark

Sentiment analysis: Vader

Indexing and visualization: Elasticsearch and Kibana

Containerization: Docker

Project structure

(workflow diagram)

The project workflow follows the structure shown in the diagram above.

In brief

  1. A bot built with the PircBotX library receives, via the IRC (Internet Relay Chat) protocol, the messages sent in a chat channel selected by the user.
  2. A JSON document is built from the data and metadata provided by the bot and placed in a message queue, from which a connector picks it up and publishes it to the Kafka topic: twitch.
  3. From there, the messages are consumed by a Python script through the Spark Streaming interface.
  4. Spark SQL parses the JSON and makes the message available as a DataFrame. Spark SQL also passes the message text to the Vader sentiment analysis library, which returns an analysis result (see the sketch after this list).
  5. A "sentiment" field is added with the result of Vader's analysis, mapped to one of the following classes: very_positive, positive_opinion, neutral_opinion, negative_opinion, very_negative, ironic.
  6. The newly built RDD is indexed in Elasticsearch, part of the Elastic family of products.
  7. Kibana (another Elastic tool) aggregates the data into metrics and makes it available through a user interface.
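
A minimal sketch of steps 3-6 follows. It assumes a chat-message schema with user, channel, and message fields, a Kafka broker at kafkaServer:9092, and the elasticsearch-hadoop connector for the write; the class thresholds are hypothetical, and the ironic class is omitted since Vader's compound score alone does not detect irony:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, udf
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("twitch-chat-analyzer").getOrCreate()

# Hypothetical shape of the JSON built by the bot.
schema = StructType([
    StructField("user", StringType()),
    StructField("channel", StringType()),
    StructField("message", StringType()),
])

def classify(text):
    # A real implementation would cache the analyzer; it is recreated
    # here for brevity. The thresholds on the compound score are invented.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    compound = SentimentIntensityAnalyzer().polarity_scores(text or "")["compound"]
    if compound >= 0.6:
        return "very_positive"
    if compound >= 0.05:
        return "positive_opinion"
    if compound <= -0.6:
        return "very_negative"
    if compound <= -0.05:
        return "negative_opinion"
    return "neutral_opinion"

classify_udf = udf(classify, StringType())

# Step 3: read the raw messages from the Kafka topic "twitch".
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafkaServer:9092")  # assumed host
       .option("subscribe", "twitch")
       .load())

# Steps 4-5: rebuild the JSON into a DataFrame and add the sentiment field.
chat = (raw.selectExpr("CAST(value AS STRING) AS json")
        .select(from_json(col("json"), schema).alias("data"))
        .select("data.*")
        .withColumn("sentiment", classify_udf(col("message"))))

# Step 6: index into Elasticsearch through the elasticsearch-hadoop connector.
query = (chat.writeStream
         .format("es")
         .option("checkpointLocation", "/tmp/twitch-checkpoint")
         .option("es.nodes", "elasticsearch")  # assumed host
         .start("twitch"))  # assumed index name
query.awaitTermination()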

Technical insights

Each folder contains a doc file similar to this one with information about that specific component.

Boot up process

The /bin folder contains shell scripts that allow you to start the project.

N.B. This project uses Docker as a containerization tool. Make sure you have it installed; see the Docker documentation for installation instructions for your system.

N.B. When files are downloaded to a Linux machine, the execute permission is often stripped for security reasons. To restore it for all .sh files in the project folder, run:

$ cd path_to_cloned_repo
$ find ./ -type f -iname "*.sh" -exec chmod +x {} \;

First time running: in the Kafka/Kafka-Settings folder, rename chat-channel.properties.dist to chat-channel.properties and set all the parameters required for the Twitch connection; instructions are in the file itself. Once set up, continue.

N.B. You need to download the tgz file and place it in the Kafka/Kafka-Settings folder; you can download it from here.

N.B. You need to download the tgz file and place it in the Spark/Python folder; you can download it from here.

Subsequent runs: use bin/set-observed-channel.sh CHANNELNAME to change the observed Twitch channel.

All-in-one solution

In the bin folder, start the following script:

$ bin/docker-compose.sh

Long solution: start the machines individually

In the bin folder, run the following scripts in order, each in a different shell:

$ bin/create-network.sh
"~~~ Wait until logging ends ~~~"

$ bin/zookeeper-start.sh
"~~~ Wait until logging ends ~~~"

$ bin/kafka-start.sh
"~~~ Wait until logging ends ~~~"

$ bin/elasticsearch-start.sh
"~~~ Wait until logging ends ~~~"

$ bin/kibana-start.sh
"~~~ Wait until logging ends ~~~"

$ bin/spark-consumer-start.sh
"~~~ Wait until logging ends ~~~"

This will start the individual components: Zookeeper, Kafka, Elasticsearch, and Kibana. When Spark starts, follow the on-screen instructions and choose Python.

To stop a running container, just press Ctrl+C in its shell.

Almost done

In your browser, go to http://10.0.100.52:5601. To set up Kibana, see the guide in the Kibana folder.

Volumes

A Docker volume is used to keep data when the container is deleted or pruned. If you do not want to use it, just remove it from docker-compose.yml. If you are running the long solution, remove the -v parameter from the bin/elasticsearch-start.sh file. You can then run bin/drop-elasticsearch-volume.sh to delete the volume permanently.

Developed by

Danilo Santitto
