As online platforms grow, moderating harmful posts becomes crucial. Some individuals use these platforms to hurt, insult, and spread hatred. Natural Language Processing (NLP) offers a way to address this problem: several Machine Learning models have been developed and deployed to filter out malicious content and protect users from online harassment. The aim is to create a safer and more respectful online environment.
Somali-language social media faces problems with inappropriate comments such as insults, racist remarks, and identity-based hate speech. Existing hate speech detection models are ineffective because they rely on hard-coded algorithms, and no established model applies NLP and deep neural networks to this problem. Our study aims to develop an NLP and deep neural network-based model that understands, identifies, and categorizes Somali-language content into six categories: Toxic, Obscene, Threat, Insult, Identity-hate, and Non-Toxic.
The project follows a systematic approach to classify toxic comments in the Somali language. The methodology is divided into several key steps:
- Dataset Collection Process: Platform selection and web scraping.
- Data Annotation: Processing, cleaning, tokenization, and stop-word removal.
- Early Detection and Prevention of Overfitting: Techniques like class weighting, early stopping, and cross-validation.
- Model Training, Testing, and Evaluation Metrics.
- Server, Client, and Facebook Integration.
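
The cleaning, tokenization, and stop-word removal steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the `STOP_WORDS` set is a small placeholder of common Somali function words, and the function names (`clean`, `tokenize`, `remove_stop_words`, `preprocess`) are hypothetical.

```python
import re

# Placeholder stop-word list (assumption): the project's real Somali list is not given here.
STOP_WORDS = {"iyo", "oo", "ka", "ku", "waa"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep word characters, whitespace, and apostrophes (used in Somali spelling, e.g. "lo'").
NON_WORD_RE = re.compile(r"[^\w\s']")


def clean(text):
    """Lowercase a comment and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split a cleaned comment into whitespace-separated tokens."""
    return clean(text).split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def preprocess(comment):
    """Full pipeline: clean, tokenize, then remove stop words."""
    return remove_stop_words(tokenize(comment))


tokens = preprocess("Waa maxay @user https://example.com HADALKAN iyo kaas!")
```

Whitespace tokenization is the simplest possible choice here; a subword tokenizer may suit a deep neural network better, but that decision is outside what this README specifies.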
The implementation process involves the following steps:
- Data Collection: Gathering relevant data from social media platforms.
- Data Annotation: Processing the dataset, cleaning, tokenization, and removal of stop words.
- Overfitting Prevention: Implementing techniques like class weighting, early stopping, and cross-validation.
- Model Training and Testing: Training the NLP model, testing, and evaluating its performance.
- Integration: Connecting the model with the server, client, and Facebook for real-time detection of toxic comments.
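
To make the overfitting-prevention step more concrete, here is a hedged, framework-independent sketch of class weighting and early stopping. It assumes inverse-frequency ("balanced") weights and a patience-based stopping rule; the project may use different settings or its deep learning framework's built-in equivalents. The label counts and validation losses below are made-up illustration values, and `balanced_class_weights` and `EarlyStopping` are hypothetical names.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up, imbalanced label sample: rare classes receive larger weights.
labels = ["Non-Toxic"] * 6 + ["Toxic"] * 3 + ["Threat"] * 1
weights = balanced_class_weights(labels)

# Made-up validation losses: improvement stalls after the second epoch.
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate([0.90, 0.70, 0.72, 0.71, 0.69]):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With rare categories such as Threat, weighting the loss by inverse class frequency keeps the model from simply predicting the majority class, while early stopping halts training before the network starts memorizing the training set.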

| Name | ID | Class No |
|---|---|---|
| Abdikafi Isse Isak (Miirshe) | C120868 | CA2013 |
| Younis Mohamed Abukar (Dalfac) | C120855 | CA203 |
| Mohamed Abdi Aadan (Qazaafi) | C1201004 | CA205 |
| Raxmo Abdikadir Jama (Raxmiish) | C1201104 | CA202 |
