satyajitovelil/SageMaker-Plagiarism_Detection

Udacity's Machine Learning Nanodegree graded project. It includes a binary classification neural network implemented in PyTorch and an AdaBoostClassifier ensemble model implemented in sklearn, both used to detect plagiarism. Feature engineering involved calculating containment and the longest common subsequence for the text data.
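The containment and longest common subsequence (LCS) features can be sketched in plain Python. This is a minimal sketch under stated assumptions: containment is taken as the number of n-grams the answer shares with the source (counting repeats up to the smaller count) divided by the answer's total n-gram count; LCS is measured over words and normalized by the answer's word count; and the cleaning rule in `preprocess` (lowercase, keep only letters and digits) is assumed. The function names are hypothetical and are not taken from the repository.

```python
import re
from collections import Counter


def preprocess(text: str) -> list[str]:
    """Lowercase, replace non-alphanumeric characters with spaces, split into words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def ngram_counts(words: list[str], n: int) -> Counter:
    """Count every contiguous n-gram, stored as a tuple of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer: str, source: str, n: int) -> float:
    """Shared n-gram count divided by the answer's total n-gram count."""
    a = ngram_counts(preprocess(answer), n)
    s = ngram_counts(preprocess(source), n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum((a & s).values())  # Counter '&' keeps the minimum count per n-gram
    return shared / total


def lcs_norm_word(answer: str, source: str) -> float:
    """Word-level longest common subsequence, normalized by the answer's length."""
    a, s = preprocess(answer), preprocess(source)
    if not a:
        return 0.0
    # prev[j] is the LCS length of the answer words seen so far and s[:j]
    prev = [0] * (len(s) + 1)
    for word in a:
        curr = [0] * (len(s) + 1)
        for j, src_word in enumerate(s, start=1):
            if word == src_word:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1] / len(a)
```

For example, comparing "This is a test" with "this is the test" gives a unigram containment of 0.75 (3 of 4 words shared), a bigram containment of 1/3, and a normalized LCS of 0.75 ("this is test"). The dynamic-programming table keeps only two rows, so memory grows with the source length rather than with the product of both lengths.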

Notebook 2: Feature Engineering

Clean and pre-process the text data.
Define features for comparing the similarity of an answer text and a source text, and extract similarity features.
Select "good" features by analyzing the correlations between different features.
Create train/test .csv files that hold the relevant features and class labels for train/test data points.

Notebook 3: Train and Deploy Your Model in SageMaker

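Notebook 3 comes in two versions, one per modeling approach:

3_Training_a_Model_sklearn.ipynb uses sklearn to create the classification model.

3_Training_a_Model_PyTorch.ipynb uses PyTorch to create the classification model.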
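The sklearn approach can be sketched as a SageMaker script-mode entry point that trains an AdaBoostClassifier. This is a minimal sketch, assuming the header-less, label-first CSV layout, a training channel named `train` (which SageMaker exposes as the `SM_CHANNEL_TRAIN` environment variable), and a `model.joblib` file name; `load_csv`, `fit_model`, and `train` are hypothetical names and do not reproduce the contents of `source_sklearn`.

```python
import csv
import os

import joblib
from sklearn.ensemble import AdaBoostClassifier


def load_csv(path):
    """Read a header-less CSV whose first column is the class label."""
    features, labels = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            labels.append(int(float(row[0])))
            features.append([float(v) for v in row[1:]])
    return features, labels


def fit_model(features, labels, n_estimators=50):
    """Fit an AdaBoost ensemble; sklearn's default base learner is a depth-1 decision tree."""
    model = AdaBoostClassifier(n_estimators=n_estimators, random_state=0)
    return model.fit(features, labels)  # fit() returns the fitted estimator


def train(data_dir, model_dir, n_estimators=50):
    """Train on <data_dir>/train.csv and save the model to <model_dir>/model.joblib."""
    features, labels = load_csv(os.path.join(data_dir, "train.csv"))
    model = fit_model(features, labels, n_estimators)
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
    return model


def model_fn(model_dir):
    """Load the saved model; SageMaker's scikit-learn serving container calls this hook."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


# SageMaker training jobs set these variables; elsewhere the module only defines functions.
if __name__ == "__main__" and "SM_CHANNEL_TRAIN" in os.environ:
    train(os.environ["SM_CHANNEL_TRAIN"], os.environ["SM_MODEL_DIR"])
```

Keeping `fit_model` separate from the file handling makes it possible to try the classifier locally on in-memory data before launching a training job.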
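The PyTorch approach can be sketched as a small feed-forward binary classifier. The architecture here (one hidden layer with ReLU, dropout of 0.2, and a sigmoid output trained with binary cross-entropy) and the full-batch Adam loop are assumptions for illustration; `BinaryClassifier` and `train_model` are hypothetical names, not necessarily what `source_pytorch` defines.

```python
import torch
from torch import nn


class BinaryClassifier(nn.Module):
    """Features -> hidden layer -> probability that the answer is plagiarized."""

    def __init__(self, input_features: int, hidden_dim: int, output_dim: int = 1):
        super().__init__()
        self.fc1 = nn.Linear(input_features, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.dropout(self.relu(self.fc1(x)))
        return torch.sigmoid(self.fc2(x))  # squash to (0, 1)


def train_model(model, x, y, epochs=100, lr=0.01):
    """Full-batch training with binary cross-entropy; returns the last loss value."""
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()  # enable dropout during training
    loss = None
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)  # y must be float with the same shape as the output
        loss.backward()
        optimizer.step()
    return loss.item()
```

After training, call `model.eval()` to disable dropout and threshold the outputs (for example at 0.5) to get class labels. A numerically more stable variant would drop the final sigmoid and use `nn.BCEWithLogitsLoss`, applying the sigmoid only at prediction time.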

Repository contents:

data/
notebook_ims/
plagiarism_data/
source_pytorch/
source_sklearn/
.gitignore
2_Plagiarism_Feature_Engineering.ipynb
3_Training_a_Model_PyTorch.ipynb
3_Training_a_Model_sklearn.ipynb
README.md
helpers.py
problem_unittests.py
