This project uses Kafka's producer and consumer APIs to analyze a social media dataset and generate one output file for each of three clients. The dataset consists of user actions on posts: likes, shares, and comments.
- Python 3.x
- kafka-python library
- kafka-producer.py: Producer script that publishes the social media dataset to the Kafka topics.
- kafka-consumer1.py: Consumer script for Client 1 - Comments on posts.
- kafka-consumer2.py: Consumer script for Client 2 - Number of likes on posts.
- kafka-consumer3.py: Consumer script for Client 3 - User popularity calculation.
- Start the Kafka server.
- Create three Kafka topics (replace topicName1, topicName2, topicName3 with actual topic names):
kafka-topics.sh --create --topic topicName1 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
kafka-topics.sh --create --topic topicName2 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
kafka-topics.sh --create --topic topicName3 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
- Run consumers in separate terminals:
python3 kafka-consumer1.py topicName1 topicName2 topicName3 > output1.json
python3 kafka-consumer2.py topicName1 topicName2 topicName3 > output2.json
python3 kafka-consumer3.py topicName1 topicName2 topicName3 > output3.json
- Run the producer in a separate terminal:
cat dataset.txt | python3 kafka-producer.py topicName1 topicName2 topicName3
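The producer command above pipes the dataset into the script's standard input. A minimal sketch of how such a producer might parse and route lines is shown here. The line layout (`<action> <post_owner> <post_id> [comment text]`), the action-to-topic mapping, and the helper names `parse_line` and `route` are all assumptions made for illustration; the actual dataset format and routing in kafka-producer.py may differ.

```python
import json


def parse_line(line):
    """Parse one line of the ASSUMED layout: '<action> <post_owner> <post_id> [comment text]'."""
    parts = line.strip().split(maxsplit=3)
    if len(parts) < 3:
        return None  # malformed or empty line
    record = {"action": parts[0], "user": parts[1], "post_id": parts[2]}
    if len(parts) == 4:
        record["text"] = parts[3]  # free text, present for comments
    return record


def route(lines, topic_for_action, send):
    """Send each parsed record to the topic mapped to its action type.

    `send` is any callable taking (topic, value_bytes), so the routing
    logic can be exercised without a running broker.
    """
    sent = 0
    for line in lines:
        record = parse_line(line)
        if record is None or record["action"] not in topic_for_action:
            continue  # skip lines this sketch does not understand
        send(topic_for_action[record["action"]], json.dumps(record).encode("utf-8"))
        sent += 1
    return sent
```

With kafka-python, `send` could be the `send` method of a `KafkaProducer(bootstrap_servers="localhost:9092")` instance, with `lines` set to `sys.stdin` and a call to `flush()` on the producer once the input is exhausted.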
- Client 1: List all comments received on posts, grouped by user.
{
    "@username1" : [
        "comment1",
        "comment2"
    ],
    "@username2" : [
        "comment1",
        "comment2"
    ],
    ...
}
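The Client 1 output can be built by grouping comment text under the user who received it. The sketch below assumes each message decodes to a dict with hypothetical keys `action`, `user` (the post owner), and `text`; these key names are not taken from the project and may not match what kafka-consumer1.py actually receives.

```python
import json
from collections import defaultdict


def collect_comments(records):
    """Group comment text by the user who owns the post (assumed record keys)."""
    comments = defaultdict(list)
    for record in records:
        # Only comment events matter to Client 1; likes and shares are skipped.
        if record.get("action") == "comment":
            comments[record["user"]].append(record["text"])
    return dict(comments)


# Hypothetical sample records standing in for messages read from Kafka.
sample = [
    {"action": "comment", "user": "@username1", "post_id": "post-id-1", "text": "comment1"},
    {"action": "like", "user": "@username1", "post_id": "post-id-1"},
    {"action": "comment", "user": "@username1", "post_id": "post-id-2", "text": "comment2"},
    {"action": "comment", "user": "@username2", "post_id": "post-id-3", "text": "comment1"},
]
print(json.dumps(collect_comments(sample), indent=4))
```

In a real consumer, `records` might come from iterating a kafka-python `KafkaConsumer` and decoding each `message.value` with `json.loads`.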
- Client 2: List the number of likes received on each post, grouped by user.
{
    "@username1" : {
        "post-id-1" : no_of_likes,
        "post-id-2" : no_of_likes
    },
    ...
}
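The Client 2 output is a nested count: likes per post, per user. One way to sketch that aggregation, again assuming hypothetical record keys `action`, `user`, and `post_id` that are not confirmed by the project:

```python
import json
from collections import defaultdict


def count_likes(records):
    """Count likes per post for each user (assumed record keys)."""
    likes = defaultdict(lambda: defaultdict(int))
    for record in records:
        if record.get("action") == "like":
            likes[record["user"]][record["post_id"]] += 1
    # Convert nested defaultdicts to plain dicts so the result serializes cleanly.
    return {user: dict(posts) for user, posts in likes.items()}


# Hypothetical sample records standing in for messages read from Kafka.
like_sample = [
    {"action": "like", "user": "@username1", "post_id": "post-id-1"},
    {"action": "like", "user": "@username1", "post_id": "post-id-1"},
    {"action": "like", "user": "@username1", "post_id": "post-id-2"},
    {"action": "share", "user": "@username2", "post_id": "post-id-3"},
]
print(json.dumps(count_likes(like_sample), indent=4))
```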
- Client 3: Calculate each user’s popularity from the number of likes, shares, and comments on that user’s profile.
{
    "@username_1": popularity,
    "@username_2": popularity,
    ...
}
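The popularity formula is not specified here, so the following is only a sketch under a stated assumption: popularity is a weighted sum of a user's likes, shares, and comments. The default weights of 1 each, the `weights` parameter, and the record keys `action` and `user` are hypothetical and should be replaced with whatever kafka-consumer3.py actually uses.

```python
from collections import defaultdict


def popularity(records, weights=None):
    """Score each user as a weighted sum of actions (ASSUMED formula and weights)."""
    if weights is None:
        # Hypothetical weights: every action counts equally.
        weights = {"like": 1, "share": 1, "comment": 1}
    scores = defaultdict(int)
    for record in records:
        weight = weights.get(record.get("action"))
        if weight is not None:  # ignore action types with no configured weight
            scores[record["user"]] += weight
    return dict(scores)


# Hypothetical sample records standing in for messages read from Kafka.
actions = [
    {"action": "like", "user": "@username_1"},
    {"action": "like", "user": "@username_1"},
    {"action": "share", "user": "@username_1"},
    {"action": "comment", "user": "@username_2"},
]
print(popularity(actions))
```

Passing a different `weights` mapping changes how much each action type contributes without touching the aggregation loop.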