
Graded project for Udacity's Machine Learning Nanodegree. It detects plagiarism with two models: a binary classification neural network implemented in PyTorch and an AdaBoostClassifier ensemble implemented in scikit-learn (sklearn). Feature engineering computes containment and longest common subsequence (LCS) similarity between each answer text and its source text.


satyajitovelil/SageMaker-Plagiarism_Detection


The repository contains the following notebooks, along with the scripts they use:

Notebook 2: Feature Engineering

  • Clean and pre-process the text data.
  • Define features that measure the similarity between an answer text and its source text, and extract them.
  • Select "good" features by analyzing the correlations between different features.
  • Create train/test .csv files that hold the relevant features and class labels for train/test data points.
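The two similarity features named in the project description, containment and longest common subsequence, can be sketched in plain Python. This is a minimal illustration and not the notebook's code. Several details are assumptions: the cleaning regex, word-level tokenization, and normalizing both scores by the answer's length. The function names are also hypothetical.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters with spaces, split into words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def ngram_counts(words, n):
    """Count every contiguous n-word sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer, source, n):
    """Shared n-gram count divided by the answer's total n-gram count (assumed normalization)."""
    a_counts = ngram_counts(clean_text(answer), n)
    s_counts = ngram_counts(clean_text(source), n)
    total = sum(a_counts.values())
    if total == 0:
        return 0.0
    shared = sum((a_counts & s_counts).values())  # & keeps the min count per n-gram
    return shared / total


def lcs_norm_word(answer, source):
    """Word-level longest common subsequence length, divided by the answer's word count."""
    a, s = clean_text(answer), clean_text(source)
    if not a:
        return 0.0
    # prev[j] = LCS length of the answer words processed so far and s[:j]
    prev = [0] * (len(s) + 1)
    for word in a:
        curr = [0] * (len(s) + 1)
        for j, s_word in enumerate(s, start=1):
            if word == s_word:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1] / len(a)


if __name__ == "__main__":
    answer = "This is an answer text"
    source = "This is a source text with an answer"
    print(containment(answer, source, 1), containment(answer, source, 2), lcs_norm_word(answer, source))
    # prints: 1.0 0.5 0.8
```

Containment compares short n-gram windows, so it rewards copied phrases. LCS rewards long runs of words that appear in the same order even when other words are interleaved. The two scores therefore capture somewhat different kinds of copying.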

Notebook 3: Train and Deploy Your Model in SageMaker

  • Upload your train/test feature data to S3.
  • Define a binary classification model and a training script.
  • Train your model and deploy it using SageMaker.
  • Evaluate your deployed classifier.
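Once the deployed endpoint has returned predictions for the test features, evaluation comes down to comparing them with the true labels. The following is a minimal pure-Python sketch. It assumes 0/1 labels where 1 means plagiarized, and the `evaluate` name is hypothetical.

```python
def evaluate(labels, predictions):
    """Return accuracy, precision, and recall for 0/1 labels (1 = plagiarized, assumed)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # plagiarism correctly flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # original text wrongly flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # plagiarism missed
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Accuracy alone can hide a classifier that rarely flags plagiarism. Reporting precision and recall alongside it shows how many flags are correct and how many plagiarized answers are caught.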

Notebook 3 takes two approaches:

  1. Uses sklearn to create a classification model (an AdaBoostClassifier ensemble)
  2. Uses PyTorch to create a classification model (a binary classification neural network)
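The sklearn approach can be sketched in a few lines. This minimal illustration makes several assumptions about the train .csv: it has no header, the class label sits in the first column, and the similarity features fill the remaining columns. Those layout details, the function names, and the toy rows are hypothetical and not taken from the project.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier


def load_csv(path_or_lines):
    """Split a headerless CSV into (labels, features).

    Assumed layout: class label in column 0, similarity features after it.
    """
    data = np.loadtxt(path_or_lines, delimiter=",", ndmin=2)
    labels = data[:, 0].astype(int)
    features = data[:, 1:]
    return labels, features


def train_adaboost(features, labels, n_estimators=50, random_state=0):
    """Fit an AdaBoost ensemble using sklearn's default base learner."""
    model = AdaBoostClassifier(n_estimators=n_estimators, random_state=random_state)
    model.fit(features, labels)
    return model


if __name__ == "__main__":
    # Toy rows for illustration only: label, feature_1, feature_2.
    rows = ["1,0.9,0.8", "1,0.85,0.9", "1,0.95,0.7",
            "0,0.1,0.2", "0,0.05,0.1", "0,0.2,0.15"]
    y, X = load_csv(rows)
    clf = train_adaboost(X, y)
    print(clf.predict([[0.9, 0.85], [0.1, 0.1]]))
```

AdaBoostClassifier's default base learner is a depth-1 decision tree (a stump). Each boosting round therefore picks a single threshold on a single feature, which suits a small set of similarity scores.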
