Coding Assignment 2: Hackathon - Binary Sentiment Analysis

Introduction

The second coding assignment asks you to implement a simple natural language processing model for sentiment analysis on the Amazon Review Dataset (see the Kaggle page of this coding assignment).

You may use deep learning libraries (e.g., PyTorch, TensorFlow) to accelerate your code with a CUDA back end.

Note: we will use Python 3.x for the project.

Submission checklist

Push your code to the CA2 section of the GitHub Classroom page
Submit your report to the 'CA2 (Hackathon) Report' section on Gradescope
Submit your entry to Kaggle

What to submit

Push to your GitHub Classroom repository

All of the Python files listed above (under "Files you'll edit").
- Caution: DO NOT UPLOAD THE DATASET

Construct the dataset (10%)

Construct the training set T for the Amazon review dataset as instructed, and report the following statistics.

REPORT1: Please fill in the table below in the report

Statistics	Value
Total number of unique words in T	Please fill in
Total number of training examples in T	Please fill in
Ratio of positive examples to negative examples in T	Please fill in
Average document length in T	Please fill in
Maximum document length in T	Please fill in
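
The statistics in the table above can be computed in a single pass over the training set. The sketch below is only an illustration: it assumes T is a list of (text, label) pairs with label 1 for positive and 0 for negative, and that a document is tokenized by lowercasing and splitting on whitespace. The assignment does not specify a tokenizer, and `tokenize` and `dataset_statistics` are hypothetical names.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; swap in your own tokenizer.
    return text.lower().split()


def dataset_statistics(examples):
    """Compute the REPORT1 statistics for a list of (text, label) pairs."""
    vocab = Counter()
    lengths = []
    n_pos = n_neg = 0
    for text, label in examples:
        tokens = tokenize(text)
        vocab.update(tokens)
        lengths.append(len(tokens))
        if label == 1:
            n_pos += 1
        else:
            n_neg += 1
    return {
        "unique_words": len(vocab),
        "num_examples": len(lengths),
        # Guard against a training set with no negative examples.
        "pos_neg_ratio": n_pos / n_neg if n_neg else float("inf"),
        "avg_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
    }


if __name__ == "__main__":
    toy = [("great product works well", 1), ("terrible", 0), ("works great", 1)]
    print(dataset_statistics(toy))
```

Document length here is measured in tokens; if you measure it in characters instead, say so in the report.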

Performance of deep neural network for classification (40%)

Suggested hyperparameters:

Data processing
1. Word embedding dimension: 100
2. Word Index: keep the most frequent 10k words
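
A word index restricted to the 10k most frequent words might be built as in the sketch below. Reserving id 0 for padding and id 1 for out-of-vocabulary words is an assumption, not something the assignment prescribes, and `build_word_index` and `encode` are hypothetical helper names.

```python
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved ids for padding and unknown words


def build_word_index(tokenized_docs, max_words=10_000):
    """Map the `max_words` most frequent words to integer ids starting at 2."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def encode(doc, word_index, max_len):
    """Convert tokens to ids, then truncate or right-pad to `max_len`."""
    ids = [word_index.get(tok, UNK) for tok in doc[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

With this convention the embedding lookup table needs 10,002 rows: the 10k kept words plus the two reserved ids.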
CNN
1. Network: Word embedding lookup layer -> 1D CNN layer -> fully connected layer -> output prediction
2. Number of filters: 100
3. Filter length: 3
4. CNN activation: ReLU
5. Fully connected layer dimension: 100; activation: None (i.e., this layer is linear)
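
One way to realize the CNN configuration above in PyTorch is sketched below. The assignment does not say how the convolution output is reduced to a fixed-size vector, so the global max pooling over time is an assumption. The class name `TextCNN`, the default `vocab_size` of 10,002 (10k words plus assumed padding and unknown ids), and the single-logit output intended for `nn.BCEWithLogitsLoss` are also assumptions.

```python
import torch
import torch.nn as nn


class TextCNN(nn.Module):
    def __init__(self, vocab_size=10_002, embed_dim=100, num_filters=100,
                 kernel_size=3, fc_dim=100, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, fc_dim)  # linear layer, no activation
        self.out = nn.Linear(fc_dim, 1)           # one logit per document

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) with seq_len >= kernel_size
        x = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)           # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x))    # (batch, num_filters, seq_len - kernel_size + 1)
        x = x.max(dim=2).values         # assumed global max pooling over time
        return self.out(self.fc(x)).squeeze(1)  # (batch,)


if __name__ == "__main__":
    model = TextCNN()
    logits = model(torch.randint(0, 10_002, (8, 50)))
    print(logits.shape)
```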
RNN
1. Network: Word embedding lookup layer -> LSTM layer -> fully connected layer (on the hidden state of the last LSTM cell) -> output prediction
2. Hidden dimension for LSTM cell: 100
3. Activation for LSTM cell: tanh
4. Fully connected layer dimension: 100; activation: None (i.e., this layer is linear)
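
The RNN configuration above could look roughly like the following in PyTorch. This is a sketch under assumptions: the class name `TextLSTM`, the default `vocab_size` of 10,002 (10k words plus assumed padding and unknown ids), and the single-logit output intended for `nn.BCEWithLogitsLoss` are not given by the assignment. `nn.LSTM` has no activation argument; it applies tanh internally, which matches the suggested setting.

```python
import torch
import torch.nn as nn


class TextLSTM(nn.Module):
    def __init__(self, vocab_size=10_002, embed_dim=100, hidden_dim=100,
                 fc_dim=100, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, fc_dim)  # linear layer, no activation
        self.out = nn.Linear(fc_dim, 1)          # one logit per document

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)      # h_n: (num_layers=1, batch, hidden_dim)
        last_hidden = h_n[-1]           # hidden state after the last time step
        return self.out(self.fc(last_hidden)).squeeze(1)  # (batch,)


if __name__ == "__main__":
    model = TextLSTM()
    logits = model(torch.randint(0, 10_002, (8, 50)))
    print(logits.shape)
```

If the input is right-padded, the last hidden state is computed after the padding tokens. Left-padding the sequences or using `torch.nn.utils.rnn.pack_padded_sequence` avoids that.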

REPORT2: Please fill in the table below in the report

Model	Accuracy	Training time (in seconds)
RNN w/o pretrained embedding	Please fill in	Please fill in
RNN w/ pretrained embedding	Please fill in	Please fill in
CNN w/o pretrained embedding	Please fill in	Please fill in
CNN w/ pretrained embedding	Please fill in	Please fill in
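
For the two "w/ pretrained embedding" rows, the embedding lookup layer is initialized from existing word vectors instead of random values. The sketch below builds such an initial matrix in plain Python. It assumes a word-to-id dictionary whose ids start at 2 (ids 0 and 1 reserved for padding and unknown words) and a dictionary `pretrained` mapping words to 100-dimensional vectors. The assignment does not name a source for the pretrained vectors, and `build_embedding_matrix` is a hypothetical helper.

```python
import random


def build_embedding_matrix(word_index, pretrained, dim=100, seed=0):
    """Return (matrix, hits): one row per id, plus the number of words found.

    Rows for ids not present in `word_index` (e.g., reserved padding and
    unknown ids) stay zero. Words missing from `pretrained` get small
    random vectors.
    """
    rng = random.Random(seed)
    vocab_size = max(word_index.values(), default=1) + 1
    matrix = [[0.0] * dim for _ in range(vocab_size)]
    hits = 0
    for word, idx in word_index.items():
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = [float(v) for v in vec]
            hits += 1
        else:
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return matrix, hits
```

In PyTorch the result can be loaded with `nn.Embedding.from_pretrained(torch.tensor(matrix), freeze=False, padding_idx=0)`; whether to freeze the weights or fine-tune them is a design choice worth discussing in the report.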

Training behavior (20%)

Plot the training/testing objective and the training/testing accuracy over time for the 4 model combinations (corresponding to the 4 rows in the table above). In other words, there should be 2*4=8 graphs in total, each containing two curves (training and testing).
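
A possible way to produce these graphs is to record the four metrics after every epoch and plot them at the end. The sketch below assumes "time" is measured in epochs and uses matplotlib; the helper names and output file names are hypothetical.

```python
def new_history():
    """Per-epoch metric lists for one model configuration."""
    return {"train_loss": [], "test_loss": [], "train_acc": [], "test_acc": []}


def record_epoch(history, train_loss, test_loss, train_acc, test_acc):
    """Append one epoch of measurements to `history`."""
    history["train_loss"].append(train_loss)
    history["test_loss"].append(test_loss)
    history["train_acc"].append(train_acc)
    history["test_acc"].append(test_acc)


def plot_history(history, model_name, prefix):
    """Save two figures (objective, accuracy), each with two curves."""
    # Imported lazily so the bookkeeping helpers above work without matplotlib.
    import matplotlib.pyplot as plt

    epochs = range(1, len(history["train_loss"]) + 1)
    paths = []
    for metric, ylabel in (("loss", "objective"), ("acc", "accuracy")):
        fig, ax = plt.subplots()
        ax.plot(epochs, history[f"train_{metric}"], label="training")
        ax.plot(epochs, history[f"test_{metric}"], label="testing")
        ax.set_xlabel("epoch")  # assumption: time is measured in epochs
        ax.set_ylabel(ylabel)
        ax.set_title(f"{model_name}: {ylabel} over time")
        ax.legend()
        path = f"{prefix}_{metric}.png"
        fig.savefig(path)
        plt.close(fig)
        paths.append(path)
    return paths
```

Calling `plot_history` once for each of the 4 model combinations yields the 8 graphs.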

REPORT3: RNN w/o pretrained embedding

training/testing objective over time
training/testing accuracy over time

REPORT4: RNN w/ pretrained embedding

training/testing objective over time
training/testing accuracy over time

REPORT5: CNN w/o pretrained embedding

training/testing objective over time
training/testing accuracy over time

REPORT6: CNN w/ pretrained embedding

training/testing objective over time
training/testing accuracy over time

Analysis of results (20%)

REPORT7: Discuss the complete set of experimental results, comparing the algorithms to each other.

REPORT8: Discuss your observations about the various algorithms, i.e., differences in how they performed, different parameters, what worked well and didn't, patterns/trends you observed across the set of experiments, etc.

REPORT9: Try to explain why certain algorithms or approaches behaved the way they did.

The software implementation (10%)

Add detailed descriptions about software implementation & data preprocessing, including:

REPORT10: A description of what you did to preprocess the dataset to make your implementations easier or more efficient.

REPORT11: A description of major data structures (if any) and any programming tools or libraries that you used.

REPORT12: Strengths and weaknesses of your design, and any problems that your system encountered.
