Download w2a.zip or the folder asg2. The code snippets required for Assignment 2 can be found in Assignment 2 work-sheet.ipynb.
- For questions 5, 6 and 7, use the function levenshtein.
- For question 6, modify the variable substitutions in the function levenshtein.
- For question 8, use the function jaro_winkler, which is defined in the file Edistance.py.
- For questions 5 to 10, the function uniFreq is needed to calculate the count of unigrams in the corpus C3.
- For question 9, the function bigramFreq is needed to calculate the count of bigrams in the corpus C3.
- For question 10, use the code snippet given in the last cell.
- Use unigram.csv for questions 5, 6, 7 and 8.
- Use bigrams.csv for questions 9 and 10.
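The worksheet's levenshtein function is not reproduced here. As a minimal sketch, assuming the standard dynamic-programming formulation with unit insertion and deletion costs, the function below exposes the substitution cost as a parameter, which shows where a change to the substitutions step would take effect. The name levenshtein_sketch and the parameter sub_cost are hypothetical; check the worksheet for the cost that question 6 actually asks for.

```python
def levenshtein_sketch(s: str, t: str, sub_cost: int = 1) -> int:
    """Edit distance between s and t using a rolling two-row DP table.

    Hypothetical stand-in for the worksheet's levenshtein function.
    Insertions and deletions cost 1; replacing one character with a
    different one costs sub_cost (an assumed parameter).
    """
    # previous[j] = distance between the first i-1 chars of s and t[:j]
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance between s[:i] and the empty string
        for j, ct in enumerate(t, start=1):
            insertions = current[j - 1] + 1
            deletions = previous[j] + 1
            # The substitution step: free if the characters already match
            substitutions = previous[j - 1] + (0 if cs == ct else sub_cost)
            current.append(min(insertions, deletions, substitutions))
        previous = current
    return previous[-1]
```

With the default cost, levenshtein_sketch("kitten", "sitting") returns 3. With sub_cost=2, a substitution costs as much as a deletion plus an insertion, and the same call returns 5.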
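Edistance.py is not shown here, so the following is a sketch of the textbook Jaro-Winkler similarity, not that file's implementation. It assumes the common defaults of a scaling factor p = 0.1 and a maximum prefix length of 4, and it applies the prefix boost unconditionally; some implementations only boost when the Jaro score exceeds a threshold such as 0.7. The name jaro_winkler_sketch is hypothetical.

```python
def jaro_winkler_sketch(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity in [0, 1]; 1.0 means identical strings.

    Hypothetical stand-in for jaro_winkler in Edistance.py; p and
    max_prefix are assumed defaults.
    """
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters "match" only if they are no farther apart than this
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: half the matched characters that appear in a different order
    chars1 = [c for c, m in zip(s1, matched1) if m]
    chars2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(chars1, chars2)) / 2

    jaro = (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3

    # Winkler boost for a common prefix of up to max_prefix characters
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)
```

For example, jaro_winkler_sketch("MARTHA", "MARHTA") is about 0.961: six matching characters, one transposition, and a common prefix of three.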
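The worksheet's uniFreq and bigramFreq functions, and the format of corpus C3, are not shown here. Assuming the corpus is available as a list of sentence strings and that lowercasing plus whitespace splitting is an acceptable tokenizer, unigram and bigram counts can be sketched with collections.Counter. The functions tokenize, unigram_counts and bigram_counts are hypothetical stand-ins.

```python
from collections import Counter
from typing import Iterable


def tokenize(sentence: str) -> list:
    # Assumption: lowercase and split on whitespace; the worksheet may tokenize differently
    return sentence.lower().split()


def unigram_counts(sentences: Iterable[str]) -> Counter:
    """Hypothetical stand-in for uniFreq: count every token in the corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def bigram_counts(sentences: Iterable[str]) -> Counter:
    """Hypothetical stand-in for bigramFreq: count adjacent token pairs.

    Pairs are formed within each sentence only, never across sentence boundaries.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        counts.update(zip(tokens, tokens[1:]))
    return counts


corpus = ["the cat sat", "the cat ran"]
print(unigram_counts(corpus)["cat"])
print(bigram_counts(corpus)[("the", "cat")])
```

Because bigrams are formed only within a sentence, the last word of one sentence is never paired with the first word of the next. A Counter returns 0 for missing keys, so looking up a word or pair that never occurs in the corpus is safe.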