A simple formulation and its implementation to get the best top k documents given a query, considering precision and diversity as variables.

thiagovas/Diversity-L2R

Diversity-L2R

Given a set of documents D and a set of queries Q, the goal of Learning to Rank (L2R) is to learn a model that ranks D, as well as previously unseen documents, for any new query. The "classic" version is concerned only with precision. Here we add a second variable: diversity.

We therefore try to optimize the F score between precision and diversity. Diversity is defined as the number of different types found among the top k ranked documents.
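
The objective can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this README: the precision of a selection is taken as the mean of its per-document precisions, diversity is normalized by dividing the number of distinct types by m, and the F score is the plain harmonic mean of the two. The function names `f_score` and `best_top_k` are hypothetical, and the exhaustive search is only a stand-in suitable for tiny inputs, not the formulation implemented in this repository.

```python
from itertools import combinations


def f_score(selected, precision, types, m):
    """Harmonic mean of mean precision and normalized type coverage.

    Assumption: both the aggregation (mean) and the normalization (/ m)
    are illustrative choices, not taken from the repository.
    """
    if not selected:
        return 0.0
    p = sum(precision[i] for i in selected) / len(selected)
    covered = set()
    for i in selected:
        covered.update(types[i])  # collect distinct types in the selection
    d = len(covered) / m
    return 0.0 if p + d == 0 else 2 * p * d / (p + d)


def best_top_k(precision, types, m, k):
    """Try every k-subset and keep the best one (feasible only for small n)."""
    n = len(precision)
    return max(
        combinations(range(n), k),
        key=lambda s: f_score(s, precision, types, m),
    )


# Illustrative data: 5 documents, 3 types, select 2 documents.
precision = [0.1, 0.5, 0.7, 0.9, 0.3]
types = [[1], [0, 1, 2], [1, 2], [], [0]]
best = best_top_k(precision, types, m=3, k=2)
print(best, round(f_score(best, precision, types, 3), 4))
```

Under these assumptions the sketch selects documents 1 and 3: document 3 has the highest precision but no types, while document 1 covers all three types.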

The precision of each document can be predicted by any L2R model, for example Random Forest, SVM or LambdaMART. The types of each document can be assigned by any method you like.


Expected Input

Three integers: n, m and k.
n is the number of documents.
m is the number of document types.
k is the number of documents to select.

n lines follow:
Each line contains a real number p, the precision of the i-th document.

Another n lines follow:
Each line starts with an integer x, the number of types assigned to the i-th document.
The integer x is followed by x other integers, the types of the document.

All indices (documents and types) are 0-based.

Input Example:
5 3 2
0.1
0.5
0.7
0.9
0.3
1 1
3 0 1 2
2 1 2
0
1 0
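
As a rough illustration of the format above, the example input could be read with a small parser like the one below. The function name `parse_input` and the returned tuple layout are hypothetical and are not part of this repository. The sketch reads whitespace-separated tokens, so it does not check that values sit on the expected lines.

```python
def parse_input(text):
    """Parse the expected input format into (n, m, k, precision, types)."""
    tokens = iter(text.split())
    n, m, k = (int(next(tokens)) for _ in range(3))
    # n precision values, one per document
    precision = [float(next(tokens)) for _ in range(n)]
    # n type lists: a count x followed by x type ids
    types = []
    for _ in range(n):
        x = int(next(tokens))
        types.append([int(next(tokens)) for _ in range(x)])
    return n, m, k, precision, types


example = """5 3 2
0.1
0.5
0.7
0.9
0.3
1 1
3 0 1 2
2 1 2
0
1 0
"""

n, m, k, precision, types = parse_input(example)
print(n, m, k)
print(precision)
print(types)
```

In this example, document 3 has precision 0.9 and no types, while document 1 has precision 0.5 and all three types.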
