pyCollaborativeFiltering

User-based and Item-based Collaborative Filtering algorithms written in Python

Development environment

  • Language: Python 3
  • IDE: Eclipse PyDev
  • Prerequisite libraries: NumPy

Specification of user-based method

  • If you pass a prebuilt model, the recommender considers only the nearest neighbors stored in that model. Otherwise, it searches for the K most similar neighbors of each target user, using the given similarity measure and the number (K) of nearest neighbors.
  • For unary data, the predicted score of an item is the average similarity of the nearest neighbors who rated that item.
  • Neighbors whose similarity is zero or negative are excluded from the user similarity.
  • The cosine similarity considers only co-rated items. (Other measures, such as the basic cosine similarity and the Pearson correlation coefficient, are also applicable.)
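
The rules above can be illustrated with a minimal, self-contained sketch. It is not the code in this repository: the function names `cosine_co_rated`, `nearest_neighbors`, and `predict_unary` are hypothetical, the data layout `{user: {item: rating}}` is an assumption, and "average similarity" is read here as the mean over the neighbors who rated the item.

```python
import math

# Hypothetical helpers for illustration only; not this repository's API.

def cosine_co_rated(a, b):
    """Cosine similarity computed only over items rated by both users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbors(data, target, k, sim=cosine_co_rated):
    """Return up to k (similarity, user) pairs, keeping only similarity > 0."""
    scored = []
    for other in data:
        if other == target:
            continue
        s = sim(data[target], data[other])
        if s > 0:  # drop neighbors with zero or negative similarity
            scored.append((s, other))
    scored.sort(reverse=True)  # most similar first
    return scored[:k]

def predict_unary(data, neighbors, item):
    """Mean similarity of the neighbors who rated `item` (0.0 if none did)."""
    sims = [s for s, user in neighbors if item in data[user]]
    return sum(sims) / len(sims) if sims else 0.0

# Assumed layout: {user: {item: rating}}
data = {
    "a": {"x": 5, "y": 3},
    "b": {"x": 4, "y": 4, "z": 5},
    "c": {"x": 1, "y": 5, "z": 2},
    "d": {"q": 3},  # shares no item with "a", so similarity is 0
}
neighbors = nearest_neighbors(data, "a", k=2)
score = predict_unary(data, neighbors, "z")
```

User `d` shares no item with `a`, so it never enters the neighbor list, which matches the exclusion rule for non-positive similarities.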

Input data format

UserID \t ItemID \t Rating \n
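
As a rough illustration of this format, the sketch below parses tab-separated lines into a nested dictionary. The function name `load_ratings` is hypothetical, and the `{user: {item: rating}}` layout is an assumption suggested by the `data.keys()` calls in the usage examples; the actual structure returned by `tool.loadData` is not documented here. Ignoring columns after the third is also an assumption of this sketch.

```python
from collections import defaultdict

def load_ratings(lines):
    """Parse 'UserID<TAB>ItemID<TAB>Rating' lines into {user: {item: rating}}.

    Hypothetical helper for illustration; not the repository's tool.loadData.
    """
    ratings = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: any columns after the third are ignored.
        user, item, rating = line.split("\t")[:3]
        ratings[user][item] = float(rating)
    return dict(ratings)

sample = ["u1\ti1\t5\n", "u1\ti2\t3\n", "u2\ti1\t4\n"]
data = load_ratings(sample)
```

To read from disk, a file object opened with `open(path)` can be passed directly, since it iterates line by line.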

Usage example

User-based Recommendation

>>> import tool
>>> data = tool.loadData("/home/changuk/data/MovieLens/movielens.dat")
>>> from recommender import UserBased
>>> ubcf = UserBased()
>>> ubcf.loadData(data)
>>> import similarity
>>> simMeasure = similarity.cosine_intersection
>>> for user in data.keys():
...     recommendation = ubcf.Recommendation(user, simMeasure=simMeasure, nNeighbors=30)

Item-based Recommendation

>>> import tool
>>> data = tool.loadData("/home/changuk/data/MovieLens/movielens.dat")
>>> from recommender import ItemBased
>>> ibcf = ItemBased()
>>> ibcf.loadData(data)
>>> model = ibcf.buildModel(nNeighbors=20)
>>> for user in data.keys():
...     recommendation = ibcf.Recommendation(user, model=model)

Validation

>>> import tool
>>> trainSet = tool.loadData("/home/changuk/data/MovieLens/u1.base")
>>> testSet = tool.loadData("/home/changuk/data/MovieLens/u1.test")
>>> from recommender import UserBased
>>> ubcf = UserBased()
>>> ubcf.loadData(trainSet)
>>> model = ubcf.buildModel(nNeighbors=30)
>>> import validation
>>> result = validation.evaluateRecommender(testSet, ubcf, model=model, topN=10)
>>> print(result)
{'Precision': 0.050980392156862, 'Recall': 0.009698538130460, 'Hit-rate': 0.5098039215686}
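
The formulas used by `validation.evaluateRecommender` are not stated in this README. The sketch below shows one common set of top-N definitions, assuming precision is hits per recommended slot, recall is hits per held-out item, and hit-rate is hits per user. These definitions are an assumption; they are at least consistent with the sample output above, where Hit-rate equals Precision multiplied by topN=10. The function name `evaluate_top_n` and its input layout are hypothetical.

```python
def evaluate_top_n(test_set, recommendations, top_n):
    """Compute top-N metrics under assumed definitions (not the library's code).

    test_set:        {user: {item: rating}} of held-out ratings
    recommendations: {user: [item, ...]} ranked best-first
    """
    hits = 0
    n_users = 0
    n_relevant = 0
    for user, held_out in test_set.items():
        recs = recommendations.get(user, [])[:top_n]
        hits += sum(1 for item in recs if item in held_out)
        n_users += 1
        n_relevant += len(held_out)
    if n_users == 0 or n_relevant == 0:
        return {"Precision": 0.0, "Recall": 0.0, "Hit-rate": 0.0}
    return {
        "Precision": hits / (n_users * top_n),  # hits per recommended slot
        "Recall": hits / n_relevant,            # hits per held-out item
        "Hit-rate": hits / n_users,             # hits per test user
    }

test_set = {"u1": {"a": 5, "b": 4}, "u2": {"c": 3}}
recommendations = {"u1": ["a", "x"], "u2": ["y", "z"]}
result = evaluate_top_n(test_set, recommendations, top_n=2)
```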

TODO list

  • Support binary data
  • Implement similarity normalization in Item-based CF

References