miirshe/somali-classifier-comments-bert


Toxic Comment Classification in Somali Language

Introduction

As online platforms grow, oversight to prevent harmful posts becomes crucial. Some individuals use these platforms to hurt, insult, and spread hatred. Natural Language Processing (NLP) offers a way to address this problem: several machine learning models have been developed and deployed to filter out malicious content and protect users from online harassment. The aim is to create a safer and more respectful online environment.

Problem Statement

Somali-language social media faces inappropriate comments such as insults, racist remarks, and identity-hate speech. Existing hate speech detection approaches are ineffective here because they rely on hard-coded algorithms, and no established model applies NLP and deep neural networks to these issues. Our study aims to develop a model based on NLP and deep neural networks that understands, identifies, and categorizes Somali-language content into six categories: Toxic, Obscene, Threat, Insult, Identity-hate, and Non-Toxic.
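The six categories can be represented as a fixed label set with integer ids, which is the form most classifiers expect. The sketch below is illustrative only: it assumes each comment receives exactly one label, which the project description does not state, and the names `LABELS`, `label_to_id`, `id_to_label`, and `one_hot` are hypothetical.

```python
from typing import List

# Illustrative label schema. The ordering and the single-label assumption
# are not taken from the project.
LABELS = ["Toxic", "Obscene", "Threat", "Insult", "Identity-hate", "Non-Toxic"]

label_to_id = {name: i for i, name in enumerate(LABELS)}
id_to_label = {i: name for name, i in label_to_id.items()}


def one_hot(label: str) -> List[int]:
    """Return a one-hot vector for a single category name."""
    if label not in label_to_id:
        raise ValueError(f"unknown label: {label!r}")
    vec = [0] * len(LABELS)
    vec[label_to_id[label]] = 1
    return vec
```

If comments can carry several labels at once (for example both Toxic and Insult), the same mapping still applies, but the target would become a multi-hot vector instead.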

Methodology

The project follows a systematic approach to classify toxic comments in the Somali language. The methodology is divided into several key steps:

  1. Dataset Collection Process: Platform selection and web scraping.
  2. Data Annotation: Processing, cleaning, tokenization, and stop-word removal.
  3. Early Detection and Prevention of Overfitting: Techniques like class weighting, early stopping, and cross-validation.
  4. Model Training, Testing, and Evaluation Metrics.
  5. Server, Client, and Facebook Integration.
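The cleaning, tokenization, and stop-word removal named in step 2 could look roughly like the following. This is a minimal sketch, not the project's actual pipeline: the `preprocess` function, the regular expressions, and especially the tiny `STOP_WORDS` set are assumptions made for illustration.

```python
import re
from typing import List

# Placeholder stop words for illustration only; a real Somali stop-word
# list would need to be curated by the project.
STOP_WORDS = {"iyo", "oo", "ka", "ku", "waa"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^\w\s']", flags=re.UNICODE)


def preprocess(comment: str) -> List[str]:
    """Clean a raw comment, tokenize on whitespace, and drop stop words."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = NON_WORD_RE.sub(" ", text)   # remove punctuation and symbols
    tokens = text.split()               # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]
```

If a BERT-style model is used, as the repository name suggests, the model's own subword tokenizer would normally replace the whitespace split shown here.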

Implementation Process

The implementation process involves the following steps, which are also visualized under Screenshots:

  1. Data Collection: Gathering relevant data from social media platforms.
  2. Data Annotation: Processing the dataset, cleaning, tokenization, and removal of stop words.
  3. Overfitting Prevention: Implementing techniques like class weighting, early stopping, and cross-validation.
  4. Model Training and Testing: Training the NLP model, testing, and evaluating its performance.
  5. Integration: Connecting the model to the server, client, and Facebook for real-time detection of toxic comments.
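Two of the overfitting-prevention techniques listed in step 3, class weighting and early stopping, can be sketched in plain Python as follows. The code is framework-independent and is not taken from the project: `balanced_class_weights`, `EarlyStopping`, and `epochs_run` are hypothetical names, and the weighting formula is the common inverse-frequency heuristic (n_samples / (n_classes * class_count)), assumed here for illustration. Cross-validation is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: List[float], patience: int = 2) -> int:
    """Return how many epochs run before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)
```

Rare classes receive larger weights, so mistakes on them contribute more to the loss. The early-stopping helper halts training once validation loss stops improving, before the model starts to memorize the training set.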

Authors

| Name | ID | Class No |
| --- | --- | --- |
| Abdikafi Isse Isak (Miirshe) | C120868 | CA2013 |
| Younis Mohamed Abukar (Dalfac) | C120855 | CA203 |
| Mohamed Abdi Aadan (Qazaafi) | C1201004 | CA205 |
| Raxmo Abdikadir Jama (Raxmiish) | C1201104 | CA202 |

Screenshots

Methodology

image

Model Performance Evaluation Matrix

image

Implementation Process

image
