Sentiment Analysis Application

Welcome to our Python desktop application for sentiment analysis, built on our XLM-RoBERTa-German-Sentiment model. The application supports sentiment analysis in 8 languages, with a focus on German. It is intended for anyone interested in extracting insights from text data.

Details of the sentiment analysis model are available on Hugging Face.

Refer to the paper for more information about the training methodology, the results of the model used in the application, and the design and diagrams of the application.

Features

  • Sentiment analysis across 8 languages, specializing in German with an 87% F1 score.

  • Uses the XLM-RoBERTa architecture, fine-tuned on a multi-domain German dataset that is a subset of the German BERT dataset.

  • Implements the Model-View-Controller (MVC) design pattern to separate the front-end and back-end code into different components. This separation makes changes and updates to each side easier to manage and reduces the risk of interference between the two.

  • Uses PostgreSQL for the database, accessed through Object-Relational Mapping (ORM) implemented with SQLAlchemy in Python. The ORM adds a layer of abstraction over database operations, so data and tables are handled as objects rather than SQL queries. A set of classes encapsulates all database-related interactions, which keeps the code manageable, modular, and maintainable.
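
To illustrate the MVC separation described in the features above, here is a minimal sketch using only the standard library. The class names (`ReviewModel`, `ConsoleView`, `ReviewController`) and the keyword rule inside `classify` are hypothetical stand-ins, not the application's actual code; the real model layer would call the XLM-RoBERTa model and the database classes instead.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewModel:
    """Model: owns the data and the sentiment logic."""

    history: list = field(default_factory=list)

    def classify(self, text: str) -> str:
        # Placeholder keyword rule standing in for the real model call (assumption).
        lowered = text.lower()
        if "gut" in lowered or "good" in lowered:
            label = "positive"
        elif "schlecht" in lowered or "bad" in lowered:
            label = "negative"
        else:
            label = "neutral"
        self.history.append((text, label))
        return label


class ConsoleView:
    """View: only formats results; knows nothing about how labels are produced."""

    def render(self, text: str, label: str) -> str:
        return f"{label}: {text}"


class ReviewController:
    """Controller: passes user input to the model and the result to the view."""

    def __init__(self, model: ReviewModel, view: ConsoleView) -> None:
        self.model = model
        self.view = view

    def submit(self, text: str) -> str:
        label = self.model.classify(text)
        return self.view.render(text, label)


controller = ReviewController(ReviewModel(), ConsoleView())
print(controller.submit("Das Essen war gut"))
```

Because the controller only talks to `classify` and `render`, either side can be replaced (for example, a GUI view or a database-backed model) without touching the other.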

Installation

  1. Clone the repository:
git clone https://github.com/ssary/German-Sentiment-Analysis
  2. Change the directory to the repo folder with:
cd '.\XLM-RoBERTa model\'
  3. Create a virtual environment named "myenv" with:
python -m venv myenv
  4. Activate the virtual environment. On Windows (Git Bash):
source myenv/Scripts/activate

or, on Linux/macOS:

source myenv/bin/activate
  5. Install dependencies:
pip install -r requirements.txt
  6. Set the database URL in settings.env to your own database URL. The PostgreSQL format is DATABASE_URL="postgresql://USERNAME:YOUR_PASSWORD@HOST:PORT/DATABASE_NAME", where you replace USERNAME, YOUR_PASSWORD, HOST (usually localhost), PORT (usually 5432), and DATABASE_NAME with the corresponding values.

  7. Launch the application:

python controller.py
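
Before launching, it can help to check that the DATABASE_URL value in settings.env has the expected shape. The following is a minimal sketch using only the standard library; the helper names `read_database_url` and `describe_database_url` are hypothetical and not part of the application, and the sample credentials are made up.

```python
from urllib.parse import urlsplit


def read_database_url(env_text: str) -> str:
    """Return the DATABASE_URL value from the text of a .env-style file."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a KEY=VALUE shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "DATABASE_URL":
            return value.strip().strip('"').strip("'")
    raise KeyError("DATABASE_URL not found")


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts and reject obviously wrong values."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected scheme 'postgresql', got {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not database:
        raise ValueError("DATABASE_NAME is missing from the URL")
    return {
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the usual PostgreSQL port
        "database": database,
    }


# Made-up example content of settings.env.
sample = 'DATABASE_URL="postgresql://alice:secret@localhost:5432/reviews"\n'
print(describe_database_url(read_database_url(sample)))
```

Passwords containing characters such as `@` or `/` need to be percent-encoded in the URL for this kind of parsing to work.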

Usage

Enter review text in any of these 8 languages (German, Arabic, English, French, Hindi, Italian, Portuguese, Spanish) and the application returns a positive, negative, or neutral label for the review.
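
As a rough illustration of how a three-class classifier output becomes one of these labels, the sketch below applies a softmax to raw scores and picks the most probable class. The label order in `LABELS` and the example scores are assumptions for illustration only; the actual order is defined by the model's configuration.

```python
import math

# Assumed label order; check the model configuration for the real mapping.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    # Subtracting the maximum keeps math.exp numerically stable.
    exps = [math.exp(score - highest) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up scores for a single review.
label, confidence = predict_label([0.1, 0.2, 2.5])
print(label, round(confidence, 3))
```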

Training Scripts

Results

Here is a comparison between the model before and after fine-tuning: the F1 score increased by 10%, reaching 87% on the German BERT dataset.

F1 score comparison between the 4 models
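
For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for a three-class sentiment task. It uses macro averaging (the unweighted mean of per-class F1), which is an assumption; the README does not state which averaging the reported 87% uses. The labels in the example are made up.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up gold labels and predictions for six reviews.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
print(round(macro_f1(y_true, y_pred), 4))
```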

Dataset Acknowledgment

We extend our heartfelt gratitude to Oliver Guhr for developing the German-language dataset used to train our model. This dataset, available on GitHub, has been instrumental in improving our model's performance. For more details on the dataset, see this GitHub repository.

References

  • For more on the XLM-RoBERTa architecture and its advantages, see the RoBERTa paper.
  • Our model's fine-tuning and training follow the principles outlined in the XLM-T paper.

Contact

For any inquiries or further information, feel free to contact me at sarynasser1@gmail.com.
