This repository contains the code and external data used for the paper:
Aunabil Chakma and Masum Hasan, "LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language", In First Workshop on Bangla Language Processing at EMNLP 2023.
Shared Task Link: https://github.com/blp-workshop/blp_task2
Workshop page Link: https://blp-workshop.github.io/
- external_data/external_data_with_adjusted_labels.tsv - contains several datasets (excluding BanglaBook) with adjusted labels. The source datasets are listed below by paper title:
  - "Emonoba: A dataset for analyzing fine-grained emotions on noisy Bangla texts."
  - "Cross-lingual sentiment classification in low-resource Bengali language."
  - "Bemoc: A corpus for identifying emotion in Bengali texts."
  - "Datasets for aspect-based sentiment analysis in Bangla and its baseline evaluation."
  - "An aspect-based sentiment analysis dataset for Bengali and its baseline evaluation."
  - "Abusive content detection in transliterated Bengali-English social media corpus."
  - "Emotion classification in a resource constrained language using transformer-based approach."
- external_data/external_data_banglabook_with_adjusted_label.tsv - contains only the BanglaBook dataset, with adjusted labels.
- external_data/paraphrasing_of_train_set_by_BanglaT5.tsv - contains paraphrases of the training set provided for the task, generated with BanglaT5.
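
The files above are tab-separated, so a quick way to check their layout before training is to print the first few rows. The snippet below is a minimal sketch using only the Python standard library. The helper name `read_tsv` is hypothetical, and it makes no assumption about column names or about whether the file has a header row; it simply returns each line as a list of strings.

```python
import csv
from pathlib import Path


def read_tsv(path, limit=None):
    """Read a tab-separated file into a list of rows (each row is a list of strings).

    Hypothetical helper for inspection only; it is not part of this repository.
    """
    rows = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside the text untouched.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    path = Path("external_data/external_data_with_adjusted_labels.tsv")
    if path.exists():
        # Print the first five rows to see which columns the file contains.
        for row in read_tsv(path, limit=5):
            print(row)
```

The same call should work for the other two files under external_data/ by changing the path.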
First, install the libraries at the specific versions listed in the requirements.txt file.
Both files under codes/ contain a CONFIG class that holds the hyper-parameters and other fields needed for training. Change the values to match your setup, then run the file.
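
As a rough illustration of that workflow, the sketch below shows what a class-based configuration of this kind can look like and how its values can be changed and inspected before a run. Every field name and value here is an assumption made for illustration, including the `csebuetnlp/banglabert` checkpoint id and the `config_to_dict` helper; the actual fields are the ones defined in the CONFIG class of each script under codes/.

```python
class CONFIG:
    """Illustrative stand-in; the real fields are defined in the scripts under codes/."""

    model_name = "csebuetnlp/banglabert"  # assumed checkpoint id
    max_length = 128                      # assumed maximum token length
    batch_size = 16                       # assumed batch size
    learning_rate = 2e-5                  # assumed learning rate
    epochs = 3                            # assumed number of epochs
    seed = 42                             # assumed random seed


def config_to_dict(cfg):
    """Collect the public, non-callable class attributes into a plain dict."""
    return {
        name: value
        for name, value in vars(cfg).items()
        if not name.startswith("_") and not callable(value)
    }


if __name__ == "__main__":
    # Change a value before training, then print the settings that will be used.
    CONFIG.batch_size = 8
    for name, value in config_to_dict(CONFIG).items():
        print(f"{name} = {value}")
```

Printing the configuration at the start of a run makes it easier to trace which settings produced a given result.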