The main goal of this project is to provide a useful tool for keeping track of events related to live chat on Twitch using Sentiment Analysis.
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.
Source: Wikipedia
Live Twitch chats, especially when there are many spectators, are really difficult to follow and moderate. Moderators are the people the streamer relies on to prevent the chat from becoming a jungle of frustrated monkeys.
This tool aims to help moderators and streamers keep track of the interactions between the streamer and their audience, making use of Sentiment Analysis.
- Ingestion: Kafka Connect with a custom connector and PircBotX
- Streaming: Apache Kafka
- Processing: Spark Streaming, Spark SQL, PySpark (2.4.6)
- Machine Learning/Sentiment Analysis: VADER Sentiment Analysis
- Indexing: Elasticsearch
- Visualization: Kibana
- Containerization: Docker
The project workflow follows the structure above.
- The bot created with the PircBotX library receives the messages sent in the chat selected by the user over the IRC (Internet Relay Chat) protocol.
- A JSON document is built from the data and metadata provided by the bot and inserted into a message queue, from which the connector picks it up and publishes it to the Kafka topic: twitch.
- From there, the messages are consumed by a Python script through the Spark Streaming interface.
- Spark SQL parses the JSON of each consumed message and makes a DataFrame available. Spark SQL also calls the VADER Sentiment Analysis library, which returns the result of the analysis of the message.
- A "sentiment" field is added with the result of VADER's analysis, mapped to one of the following classes: very_positive, positive_opinion, neutral_opinion, negative_opinion, very_negative, ironic (see the sketches after this list).
- The newly built RDD is indexed through Elasticsearch, the core product of the Elastic family.
- Kibana (another Elastic tool) aggregates the data and computes metrics, making them available through a user interface.
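To illustrate how the class labels can be derived from VADER's output, here is a minimal sketch using the vaderSentiment package. The compound-score thresholds below are assumptions for illustration, not the project's actual cut-offs, and the ironic class is left out because deriving it requires project-specific logic beyond plain VADER scores.

```python
# Minimal sketch: map VADER's compound score to the project's classes.
# The thresholds are illustrative assumptions, not the real ones.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def classify(text: str) -> str:
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.65:
        return "very_positive"
    if compound >= 0.05:
        return "positive_opinion"
    if compound <= -0.65:
        return "very_negative"
    if compound <= -0.05:
        return "negative_opinion"
    return "neutral_opinion"  # the "ironic" class needs extra, project-specific logic

print(classify("This play was absolutely insane, I love it!"))
```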
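Along the same lines, here is a hedged sketch of how the Kafka → Spark SQL → Elasticsearch leg could be wired up with Structured Streaming. The field names (user, channel, text), the broker address kafkaServer:9092, the Elasticsearch hostname, and the index name are all assumptions; the project's actual consumer lives in the Spark folder and may use the classic DStream API instead.

```python
# Sketch only: field names, addresses, and index name are assumptions.
# Needs the Kafka and Elasticsearch connectors on the classpath, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6,org.elasticsearch:elasticsearch-hadoop:7.9.0 ...
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, udf
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("twitch-sentiment").getOrCreate()

# Hypothetical message schema; the real one is defined by the connector.
schema = StructType([
    StructField("user", StringType()),
    StructField("channel", StringType()),
    StructField("text", StringType()),
])

sentiment = udf(classify, StringType())  # classify() from the previous sketch

messages = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafkaServer:9092")  # assumed broker
    .option("subscribe", "twitch")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("msg"))
    .select("msg.*")
    .withColumn("sentiment", sentiment(col("text")))
)

# Index each micro-batch into Elasticsearch via the ES-Hadoop connector.
(messages.writeStream
    .format("es")
    .option("checkpointLocation", "/tmp/twitch-checkpoint")
    .option("es.nodes", "elasticsearch")  # assumed container hostname
    .start("twitch/_doc")
    .awaitTermination())
```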
There is a doc file similar to this one in each folder, with information about each specific component.
The /bin folder contains shell scripts that allow you to start the project.
N.B. This project uses Docker as a containerization tool. Make sure you have it installed; look online for how to install it on your system.
N.B. When files are downloaded to a Linux machine, the execution permission is often removed for security reasons. To add it back to all the .sh files in this project folder, run:
$ cd path_to_cloned_repo
$ find ./ -type f -iname "*.sh" -exec chmod +x {} \;
First time running: in the Kafka/Kafka-Settings folder, rename chat-channel.properties.dist to chat-channel.properties and set all the parameters required by the Twitch connection (instructions are in the file itself). Once set up, continue.
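For orientation only, here is a purely hypothetical example of what such a properties file could contain; the actual key names and their instructions are documented in chat-channel.properties.dist itself.

```
# Hypothetical keys for illustration; follow chat-channel.properties.dist.
botName=my_bot_account
oauthToken=oauth:xxxxxxxxxxxxxxxxxxxx
channel=some_twitch_channel
```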
N.B. You need to download the tgz file and place it in the Kafka/Kafka-Settings folder; you can download it from here.
N.B. You need to download the tgz file and place it in the Spark/Python folder; you can download it from here.
Subsequent runs: to change the observed Twitch channel, use:
$ bin/set-observed-channel.sh CHANNELNAME
To start all the containers at once with Docker Compose, run the following script from the bin folder:
$ bin/docker-compose.sh
Alternatively (the long solution), start the following scripts in order, each in a separate bash shell, waiting for each one to finish its logging before starting the next:
$ bin/create-network.sh
$ bin/zookeeper-start.sh
$ bin/kafka-start.sh
$ bin/elasticsearch-start.sh
$ bin/kibana-start.sh
$ bin/spark-consumer-start.sh
These scripts start the individual components: Zookeeper, Kafka, Elasticsearch, Kibana. When Spark starts, follow the instructions on the screen and choose Python.
To stop the running containers, just press Ctrl+C in their respective shells.
In the browser, enter the following address: http://10.0.100.52:5601. To set up Kibana, see its guide in the Kibana folder.
A Docker volume is used to keep data when the container is deleted or pruned. If you do not want to use it, just delete it from docker-compose.yml. If you are running the long solution, delete the -v parameter in the bin/elasticsearch-start.sh file instead. You can then run bin/drop-elasticsearch-volume.sh to delete the volume permanently (see the sketch below).
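For context, a hedged sketch of what the volume wiring in such a script typically looks like; the actual container name, volume name, and image tag are the ones defined in bin/elasticsearch-start.sh, not these:

```sh
# Illustrative only: real names and tags live in bin/elasticsearch-start.sh.
# Removing the -v line disables persistence across container removals.
docker run --name elasticsearch \
  -v es-data:/usr/share/elasticsearch/data \
  docker.elastic.co/elasticsearch/elasticsearch:7.9.0

# drop-elasticsearch-volume.sh presumably wraps something like:
docker volume rm es-data
```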