
Add tf & tf-idf cosine based similarity comparisons #438

Merged (2 commits), Mar 13, 2019

@SailSlick (Contributor) commented Mar 12, 2019

Connects to #373 and #374.

Adds new similarity functions based on the methods described in "Content-based Recommendation in Social Tagging Systems".

  • Add 2 new similarity metrics
  • Clear all output cells of the Jupyter notebook (had no idea you could do this before)
  • Clean tag input a bit

Cosine similarity (TF)

Averages from 3 runs:

n1     n2    n3     AUC
1933   102   3110   0.426623
1923   107   3115   0.436296
1958   99    3088   0.431699
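As a rough illustration of the TF cosine metric, here is a minimal sketch that treats each movie as a vector of tag counts and compares two movies by the cosine of the angle between their vectors. The function names, the lowercasing and stripping of tags (only a guess at what "clean tag input" might involve), and the sample tags are all hypothetical; this is not the code merged in this PR.

```python
import math
from collections import Counter


def tf_vector(tags):
    """Term-frequency vector for one movie: how often each tag was applied.

    Tags are stripped and lowercased first (an assumed normalisation step).
    """
    return Counter(t.strip().lower() for t in tags if t.strip())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a movie with no tags is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Hypothetical tag applications for two movies.
a = tf_vector(["Pixar", "animation", "funny", "funny"])
b = tf_vector(["animation", "Funny", "Disney"])
print(round(cosine(a, b), 3))  # 0.707
```

Only tags shared by both movies ("animation" and "funny" here) contribute to the dot product; repeated applications of a tag raise its weight.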

Cosine similarity (TF-IDF)

Averages from 3 runs:

n1     n2    n3     AUC
2047   149   2949   0.455078
2023   142   2981   0.449217
2032   131   2982   0.451105
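A similar sketch for the TF-IDF variant, assuming the plain IDF formula idf(t) = log(N / df(t)), where N is the number of movies and df(t) is the number of movies carrying tag t. The helper names and the toy data are hypothetical, and the project may use a different (for example smoothed) IDF weighting.

```python
import math
from collections import Counter


def tfidf_vectors(movie_tags):
    """Map each movie to a TF-IDF weighted tag vector.

    movie_tags: dict of movie id -> list of tags applied to that movie.
    Uses the unsmoothed idf(t) = log(N / df(t)) (an assumption).
    """
    tf = {m: Counter(tags) for m, tags in movie_tags.items()}
    n_movies = len(tf)
    df = Counter(t for counts in tf.values() for t in counts)  # movies per tag
    return {
        m: {t: c * math.log(n_movies / df[t]) for t, c in counts.items()}
        for m, counts in tf.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tags for three movies.
movies = {
    "toy_story": ["animation", "pixar", "funny"],
    "up": ["animation", "pixar", "sad"],
    "alien": ["scifi", "horror"],
}
vecs = tfidf_vectors(movies)
print(round(cosine(vecs["toy_story"], vecs["up"]), 3))  # 0.214
print(cosine(vecs["toy_story"], vecs["alien"]))  # 0.0
```

Under plain TF weights "toy_story" and "up" would score 2/3, since they share two of their three tags; the IDF weighting lowers that because "animation" and "pixar" are less distinctive than "funny" or "sad".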

Issues:

  • Most movie tags are rarely repeated, and there are not many tags overall.
    • Could be solved by using the larger MovieLens dataset (1,100,000 tag applications applied to 58,000 movies) instead of the small one (3,600 tag applications applied to 9,000 movies)
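One way to quantify the sparsity issue is to measure how many distinct tags appear on only one movie: with either cosine metric the dot product runs only over shared tags, so such a tag can never raise the similarity between two different movies. The sketch below assumes the input is (movie_id, tag) pairs, for example the movieId and tag columns of a MovieLens tags.csv; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def single_movie_tag_fraction(tag_applications):
    """Fraction of distinct tags that were applied to exactly one movie.

    tag_applications: iterable of (movie_id, tag) pairs.
    Tags are stripped and lowercased so "Pixar" and "pixar" count as one tag.
    """
    movies_per_tag = defaultdict(set)
    for movie_id, tag in tag_applications:
        movies_per_tag[tag.strip().lower()].add(movie_id)
    if not movies_per_tag:
        return 0.0
    single = sum(1 for movies in movies_per_tag.values() if len(movies) == 1)
    return single / len(movies_per_tag)


# Hypothetical data: "pixar" is applied twice, but to the same movie.
apps = [(1, "funny"), (2, "Funny"), (3, "pixar"), (3, "Pixar"), (4, "time travel")]
print(round(single_movie_tag_fraction(apps), 3))  # 0.667
```

A high fraction on the small dataset would support the point above; the same function could be run on the larger dataset for comparison.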

@SailSlick requested a review from @iandioch Mar 12, 2019 18:17
@SailSlick mentioned this pull request Mar 13, 2019
@iandioch (Member) left a comment


The AUC metric shouldn't give results below 0.5. I'm not sure why this metric guarantees this (it isn't apparent from the basic AUC formula we're using), or why exactly we're getting such low results, but your work here seems mostly fine.
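For reference, a common sampled AUC formulation in link prediction draws n random comparisons, each pairing a held-out (really existing) link with a non-existent one, and computes AUC = (n' + 0.5 n'') / n, where n' counts comparisons in which the held-out link scores higher and n'' counts ties. Whether this is exactly the "basic AUC formula" used in the project is an assumption, and all names below are hypothetical. Under this formula, a random or constant scorer gives about 0.5, and values below 0.5 arise when held-out links tend to score lower than non-links.

```python
import random


def sampled_auc(score, missing_links, non_links, n=10_000, seed=0):
    """Sampled AUC: (n_higher + 0.5 * n_equal) / n.

    score: function mapping a candidate pair to a similarity score.
    missing_links: held-out pairs that really exist (positives).
    non_links: pairs that do not exist (negatives).
    """
    rng = random.Random(seed)
    n_higher = n_equal = 0
    for _ in range(n):
        s_pos = score(rng.choice(missing_links))
        s_neg = score(rng.choice(non_links))
        if s_pos > s_neg:
            n_higher += 1
        elif s_pos == s_neg:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n


def constant(pair):
    return 1.0  # carries no information, so every comparison is a tie


inverted = {("a", "b"): 0.1, ("a", "c"): 0.9}  # positive always scores lower
print(sampled_auc(constant, [("a", "b")], [("a", "c")]))  # 0.5
print(sampled_auc(inverted.get, [("a", "b")], [("a", "c")]))  # 0.0
```

The two calls show the boundary cases: a constant scorer ties every comparison, and an inverted scorer loses every one.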

@SailSlick merged commit d1b2772 into master Mar 13, 2019
@SailSlick deleted the r/improve-recommend-post branch Mar 13, 2019 14:20