A simple formulation, and its implementation, for selecting the best top-k documents for a query, optimizing both precision and diversity.
Given a set of documents D and a set of queries Q, the goal of Learning to Rank (L2R) is to learn a model that ranks D (and any other documents) for any query. In the "classic" version we are concerned only with precision; here we add another variable, diversity.
So we try to optimize the F-score between precision and diversity, where diversity is defined as the number of distinct types found among the top k ranked documents.
The precision of each document can be predicted with any L2R model (Random Forest, SVM, LambdaMART, etc.), and the types of each document can be assigned by any method you want.
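The text above does not pin down the objective exactly, so here is one natural reading as a minimal sketch: the precision of a selected set is the mean of its documents' predicted precisions, diversity is the number of distinct types covered normalized by m (so both terms lie in [0, 1]), and the F-score is their harmonic mean. The function name and the normalization by m are assumptions, not something fixed by the formulation.

```python
def f_score(precisions, types, selected, m):
    """F1 (harmonic mean) of set precision and normalized diversity.

    precisions: list of per-document precision predictions.
    types: list of lists, types[i] holds the types of document i.
    selected: indices of the chosen documents.
    m: total number of document types (assumed normalizer for diversity).
    """
    # Set precision: mean predicted precision of the selected documents.
    p = sum(precisions[i] for i in selected) / len(selected)
    # Diversity: distinct types covered, normalized by the m known types.
    covered = set()
    for i in selected:
        covered.update(types[i])
    d = len(covered) / m
    if p + d == 0:
        return 0.0
    return 2 * p * d / (p + d)
```

For the example input below, picking documents 2 and 3 gives p = 0.8 and d = 2/3, so the F-score is 8/11 ≈ 0.727.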
The input begins with three integers: n, m and k.
n is the number of documents.
m is the number of document types.
k is the number of documents to select.
Then n lines follow:
each contains a real number p, the precision of the i-th document.
Another n lines follow:
each contains an integer x, the number of types assigned to the i-th document,
followed by x integers, the types themselves.
Everything here is 0-based.
Input Example:
5 3 2
0.1
0.5
0.7
0.9
0.3
1 1
3 0 1 2
2 1 2
0
1 0
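To make the format concrete, here is a sketch that parses the example input above and exhaustively scores every k-subset with the F-score between set precision and normalized diversity. The exhaustive search is an assumption for illustration (fine for small n); the text does not fix a particular selection algorithm, and the helper names are mine.

```python
from io import StringIO
from itertools import combinations

EXAMPLE = """5 3 2
0.1
0.5
0.7
0.9
0.3
1 1
3 0 1 2
2 1 2
0
1 0
"""

def parse(stream):
    # First line: n documents, m types, k documents to select.
    n, m, k = map(int, stream.readline().split())
    # Next n lines: one precision value per document.
    precisions = [float(stream.readline()) for _ in range(n)]
    # Next n lines: a count x followed by x type ids.
    types = []
    for _ in range(n):
        parts = list(map(int, stream.readline().split()))
        types.append(parts[1:])  # drop the leading count x
    return n, m, k, precisions, types

def best_subset(n, m, k, precisions, types):
    # Brute-force search over all C(n, k) subsets (illustrative only).
    def f(sel):
        p = sum(precisions[i] for i in sel) / k          # mean precision
        d = len({t for i in sel for t in types[i]}) / m  # normalized diversity
        return 2 * p * d / (p + d) if p + d else 0.0
    return max(combinations(range(n), k), key=f)

n, m, k, precisions, types = parse(StringIO(EXAMPLE))
print(best_subset(n, m, k, precisions, types))  # prints (1, 3)
```

On this example, documents 1 and 3 win: document 3 has the highest precision (0.9) and document 1 alone covers all three types, giving F = 1.4/1.7 ≈ 0.824, which beats the pure-precision pick (2, 3).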