Reddit Sarcasm Detection with DistilBERT

This repository contains source code for detecting sarcasm in Reddit comments. The dataset used in this analysis is available on Kaggle.

Requirements

rsarc.yaml defines a conda environment with all the Python packages required to run the source code. paths.json lists the file paths used by the source code; the home filepath must be filled in before running anything. The train-balanced-sarcasm.csv file from the dataset must also be placed in the data directory before running the source code.
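The setup requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper name check_setup is hypothetical, and it assumes that paths.json stores the home filepath under a "home" key and that the data directory sits directly under that home path.

```python
import json
from pathlib import Path


def check_setup(paths_file="paths.json"):
    """Return a list of setup problems; an empty list means ready to run."""
    with open(paths_file) as f:
        paths = json.load(f)

    problems = []

    # ASSUMPTION: the home filepath is stored under the key "home".
    home = paths.get("home", "")
    if not home:
        problems.append("home filepath in paths.json is empty")
        return problems

    # ASSUMPTION: the data directory lives directly under the home path.
    csv_path = Path(home) / "data" / "train-balanced-sarcasm.csv"
    if not csv_path.is_file():
        problems.append(f"missing dataset file: {csv_path}")

    return problems
```

Calling check_setup() from the repository root and printing the returned list gives a quick way to confirm both requirements before starting a long training run.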

Source Code

bert.py: Contains a wrapper class for the Hugging Face transformers DistilBert implementation.
clean.py: Preprocessing script for the sarcasm dataset.
dataset.py: Contains a torch.utils.data.Dataset class for the sarcasm dataset.
stats.py: Script for plotting the token count distribution of the sarcasm dataset.
test.py: Script for calculating test set accuracy.
train.py: Script for fine-tuning DistilBert for sarcasm detection.
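To illustrate what a token count distribution looks like, here is a rough sketch that is not taken from stats.py. It swaps in a plain whitespace split as a stand-in for the DistilBert tokenizer and returns bucketed counts instead of drawing a plot; the function name and the bucket parameter are hypothetical.

```python
from collections import Counter


def token_count_distribution(comments, bucket=10):
    """Map each bucket start (0, 10, 20, ...) to the number of comments in it."""
    counts = Counter()
    for text in comments:
        # ASSUMPTION: whitespace splitting stands in for the real tokenizer,
        # so the counts only approximate subword token counts.
        n_tokens = len(text.split())
        counts[(n_tokens // bucket) * bucket] += 1
    return dict(sorted(counts.items()))
```

A distribution like this is commonly used to pick a maximum sequence length that covers most comments without padding every input to the longest one.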

Running Source Code

The scripts must be run in the following order to produce the sarcasm detection training and test results:

clean.py
train.py
test.py

Results of train.py and test.py are printed to the command prompt once the training and testing processes have completed.
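The run order above can also be scripted. The following is a minimal sketch under stated assumptions: the helper run_pipeline is hypothetical, it assumes the three scripts live in the src directory, and it assumes they can be launched from the repository root without extra arguments.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the list above.
PIPELINE = ["clean.py", "train.py", "test.py"]


def run_pipeline(src_dir="src", scripts=PIPELINE, python=sys.executable):
    """Run each script in order, stopping at the first failure."""
    for name in scripts:
        script = Path(src_dir) / name
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later script never starts if an earlier one failed.
        subprocess.run([python, str(script)], check=True)
```

Calling run_pipeline() inside the activated conda environment runs all three steps in sequence; each script's output still appears in the command prompt because the child processes inherit it.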

akshat0123/RSarcasm
