Toxic Comment Classification in Somali Language

Introduction

As online platforms grow, moderating harmful posts becomes crucial. Some individuals use these platforms to hurt, insult, and spread hatred. Natural Language Processing (NLP) offers a way to address this problem: several Machine Learning models have been developed and deployed to filter out malicious content and protect users from online harassment. The aim is to create a safer and more respectful online environment.

Problem Statement

Somali-language social media platforms face challenges with inappropriate comments such as insults, racist remarks, and identity-based hate speech. Existing hate speech detection models are not effective because they rely on hard-coded algorithms. No established models that use NLP and deep neural networks address these issues. Our study aims to develop a model based on NLP and deep neural networks that understands, identifies, and categorizes Somali-language content into six categories: Toxic, Obscene, Threat, Insult, Identity-hate, and Non-Toxic.
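
As a rough illustration of the six categories above, the sketch below turns per-category scores into category names. It assumes a multi-label setup in which one comment may fall into several toxic categories at once, with Non-Toxic used as the fallback when no toxic category is selected; the README does not confirm this design. The function name `decode_scores` and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List

# The six categories named in the problem statement.
LABELS: List[str] = [
    "Toxic", "Obscene", "Threat", "Insult", "Identity-hate", "Non-Toxic",
]


def decode_scores(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Map per-category scores to category names.

    Assumption (not from the README): scores are independent values in [0, 1],
    one per toxic category, and a comment is Non-Toxic when none of them
    reaches the threshold.
    """
    toxic_labels = [
        name for name in LABELS[:-1] if scores.get(name, 0.0) >= threshold
    ]
    return toxic_labels or ["Non-Toxic"]
```

With this sketch, a comment scored high on both Toxic and Insult receives both labels, while a comment with no score at or above the threshold is returned as Non-Toxic.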

Methodology

The project follows a systematic approach to classify toxic comments in the Somali language. The methodology is divided into several key steps:

Dataset Collection Process: Platform selection and web scraping.
Data Annotation: Processing, cleaning, tokenization, and stop-word removal.
Early Detection and Prevention of Overfitting: Techniques like class weighting, early stopping, and cross-validation.
Model Training, Testing, and Evaluation Metrics.
Server, Client, and Facebook Integration.
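
The cleaning, tokenization, stop-word removal, and class-weighting steps listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the stop-word set is a tiny hypothetical sample, the function names `preprocess` and `class_weights` are invented for illustration, and the weight formula is the common "balanced" heuristic (total samples divided by the number of classes times the class count), which the README does not specify.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word sample for illustration only;
# the project's real stop-word list is not given in this README.
SOMALI_STOP_WORDS = {"iyo", "oo", "waa", "ka", "ku", "u"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Drop everything except word characters, whitespace, and apostrophes
# (apostrophes are kept because they occur inside Somali words).
NON_WORD_RE = re.compile(r"[^\w\s']")


def preprocess(comment: str) -> List[str]:
    """Lowercase, strip URLs and punctuation, tokenize, remove stop words."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = text.split()  # simple whitespace tokenization
    return [token for token in tokens if token not in SOMALI_STOP_WORDS]


def class_weights(labels: List[str]) -> Dict[str, float]:
    """Give rarer classes larger weights: total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}
```

Weights computed this way are typically passed to the loss function so that mistakes on rare categories, such as threats, cost more than mistakes on the majority class.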

Implementation Process

The implementation process consists of the following steps:

Data Collection: Gathering relevant data from social media platforms.
Data Annotation: Processing the dataset, cleaning, tokenization, and removal of stop words.
Overfitting Prevention: Implementing techniques like class weighting, early stopping, and cross-validation.
Model Training and Testing: Training the NLP model, testing, and evaluating its performance.
Integration: Connecting the model to the server, client, and Facebook for real-time detection of toxic comments.
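
Two of the overfitting-prevention techniques named above, early stopping and cross-validation, can be sketched without any framework. The class `EarlyStopping`, the function `k_fold_indices`, and the parameters `patience`, `min_delta`, and `k` are illustrative assumptions; the README does not state which library or settings the project uses.

```python
from typing import Iterator, List, Tuple


class EarlyStopping:
    """Signal a stop once validation loss fails to improve `patience` times in a row."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Indices are not shuffled here; shuffle the dataset beforehand if its
    order is not random.
    """
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size
```

In a training loop, `should_stop` would be called once per epoch with that epoch's validation loss, and training would end the first time it returns `True`. Each pair from `k_fold_indices` selects the rows used for training and validation in one cross-validation round.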

Authors

Name	ID	Class No
Abdikafi Isse Isak (Miirshe)	C120868	CA2013
Younis Mohamed Abukar (Dalfac)	C120855	CA203
Mohamed Abdi Aadan (Qazaafi)	C1201004	CA205
Raxmo Abdikadir Jama (Raxmiish)	C1201104	CA202
