This repository contains a series of machine learning models and analyses for the Kaggle competition "Natural Language Processing with Disaster Tweets". The goal of this competition is to predict whether a given tweet describes a real disaster or not.
Below are the models that have been implemented and evaluated for this task:
- DistilBERT: A transformer-based model fine-tuned with LoRA (Low-Rank Adaptation), which freezes the pretrained weights and trains only small low-rank update matrices, making fine-tuning for tweet classification cheap (a minimal sketch follows this list).
- Naive Bayes: A probabilistic classifier that uses word frequencies to classify tweets as disaster-related or not.
- Logistic Regression: A linear model that applies the logistic (sigmoid) function to a weighted combination of text features to estimate the probability that a tweet is disaster-related.
- XGBoost: A gradient boosting algorithm designed for performance and scalability, used here to classify tweets as disaster-related or not (a sketch of the three classical baselines also follows this list).
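A minimal sketch of the LoRA fine-tune is shown below. It is not the exact training script from this repository: it assumes the Kaggle `train.csv` with `text` and `target` columns, uses the Hugging Face `transformers`, `datasets`, and `peft` libraries, and all hyperparameters (rank, learning rate, epochs, batch size) are illustrative placeholders.

```python
# Sketch of LoRA fine-tuning for DistilBERT on the disaster-tweet data.
# Assumptions: train.csv with "text" and "target" columns (Kaggle layout),
# and the transformers/datasets/peft libraries installed.
import pandas as pd
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

df = pd.read_csv("train.csv")  # hypothetical path
data = Dataset.from_pandas(df[["text", "target"]].rename(columns={"target": "labels"}))

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# LoRA: freeze the pretrained weights and train only small low-rank update
# matrices injected into the attention projections (q_lin / v_lin in DistilBERT).
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                 # rank of the update matrices (illustrative)
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],
)
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="distilbert-lora",
        num_train_epochs=3,
        per_device_train_batch_size=32,
        learning_rate=2e-4,
    ),
    train_dataset=data,
)
trainer.train()
```

Because only the LoRA adapter matrices (and the classification head) are trained, the number of updated parameters is a small fraction of DistilBERT's roughly 66M weights, which is what keeps the fine-tune cheap.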
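The three classical baselines can be reproduced along the lines of the sketch below, again under assumptions: the same hypothetical `train.csv`, scikit-learn and the `xgboost` package, TF-IDF features for all three models for brevity (the actual notebooks may extract features differently, e.g. raw word counts for Naive Bayes), and illustrative hyperparameters.

```python
# Sketch of the classical baselines: Naive Bayes, Logistic Regression, XGBoost.
# Assumptions: train.csv with "text" and "target" columns; TF-IDF features
# are used for all three models here, which may differ from the notebooks.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

df = pd.read_csv("train.csv")  # hypothetical path
X, y = df["text"], df["target"]

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss"),
}

for name, clf in models.items():
    # The vectorizer is fit inside each cross-validation fold, so the
    # reported scores are free of train/validation leakage.
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), clf)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

Cross-validated accuracy on the training data will not match the leaderboard numbers below exactly, since those are computed by Kaggle on the held-out test set.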
The table below summarizes each model's accuracy score on the Kaggle competition's test dataset:
| Model | Accuracy Score |
|---|---|
| DistilBERT | 0.79650 |
| Naive Bayes | 0.79007 |
| Logistic Regression | 0.73827 |
| XGBoost | 0.73797 |
- DistilBERT achieves the highest accuracy, making it a strong candidate for disaster tweet classification.
- Naive Bayes is surprisingly competitive, finishing within about 0.006 of DistilBERT and roughly five percentage points ahead of Logistic Regression.
- XGBoost, despite being a powerful algorithm, yields the lowest accuracy of the models tested, essentially tied with Logistic Regression.
Each model has its advantages, and the choice of model depends on the trade-offs between performance, interpretability, and computational cost.