Create very simple export for testing purposes #12

Open
lintool opened this issue Mar 9, 2020 · 5 comments

Comments

@lintool
Member

lintool commented Mar 9, 2020

@JMMackenzie and @chriskamphuis have requested a sample export for testing purposes.

I propose exporting the index from this Anserini test case: https://github.com/castorini/anserini/blob/master/src/test/java/io/anserini/integration/TrecEndToEndTest.java

which indexes this 3-document toy collection: https://github.com/castorini/anserini/tree/master/src/test/resources/sample_docs/trec/collection2

sg?

@chriskamphuis
Member

sounds good

@JMMackenzie
Member

Perfect!

@lintool
Member Author

lintool commented Mar 9, 2020

toy-complete-20200309.ciff.gz

Reading header...
=== Header === 
version: 1
num_postings_lists: 9
num_doc_records: 3
total_postings_lists: 9
total_docs: 3
total_terms_in_collection: 16
average_doclength: 5.333333
description: Export of toy 3-document collection from Anserini's io.anserini.integration.TrecEndToEndTest test case

Expecting 9 postings lists and 3 doc records in this export.
term: '01', df=1, cf=1 (0, 1)
term: '03', df=1, cf=1 (0, 1)
term: '30', df=1, cf=1 (0, 1)
term: 'content', df=1, cf=1 (0, 1)
term: 'enough', df=1, cf=1 (2, 1)
term: 'head', df=3, cf=3 (0, 1) (1, 1) (1, 1)
term: 'simpl', df=2, cf=2 (1, 1) (1, 1)
term: 'text', df=3, cf=5 (0, 1) (1, 1) (1, 3)
term: 'veri', df=1, cf=1 (1, 1)
0	WSJ_1	6
1	TREC_DOC_1	4
2	DOC222	6
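
The dump above can be cross-checked by hand. Below is a minimal Python sketch, assuming each pair in a postings list is (docid gap, term frequency); that reading is consistent with the df=3 lists covering docids 0, 1 and 2. The data is transcribed from the output above, and the helper names `decode_gaps` and `doc_lengths` are hypothetical, not part of any CIFF tooling.

```python
# Postings transcribed from the dump above; each pair is assumed to be
# (docid gap, term frequency).
POSTINGS = {
    "01": [(0, 1)],
    "03": [(0, 1)],
    "30": [(0, 1)],
    "content": [(0, 1)],
    "enough": [(2, 1)],
    "head": [(0, 1), (1, 1), (1, 1)],
    "simpl": [(1, 1), (1, 1)],
    "text": [(0, 1), (1, 1), (1, 3)],
    "veri": [(1, 1)],
}

# Doc records from the dump: internal docid -> (external docid, doclength).
DOCS = {0: ("WSJ_1", 6), 1: ("TREC_DOC_1", 4), 2: ("DOC222", 6)}


def decode_gaps(postings):
    """Turn (gap, tf) pairs into (absolute docid, tf) pairs."""
    decoded, docid = [], 0
    for gap, tf in postings:
        docid += gap
        decoded.append((docid, tf))
    return decoded


def doc_lengths(postings_by_term):
    """Sum term frequencies per document across all postings lists."""
    lengths = {}
    for postings in postings_by_term.values():
        for docid, tf in decode_gaps(postings):
            lengths[docid] = lengths.get(docid, 0) + tf
    return lengths


lengths = doc_lengths(POSTINGS)
total_terms = sum(lengths.values())
```

Under that assumption, the recomputed document lengths agree with the doc records (6, 4, 6), their sum agrees with total_terms_in_collection (16), and 16 / 3 gives the reported average_doclength.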

@lintool
Member Author

lintool commented Mar 10, 2020

TODO: encode above as a test case.

@cmacdonald
Member

might be nice to have another file that demonstrates the "Query terms only" case, i.e. num_postings_lists < total_postings_lists, and other relevant statistics
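
A rough sketch of how a test could tell the two cases apart, using only the header fields shown in the dump above. The function name `is_partial_export` and the second header are hypothetical; the "query terms only" values are made up for illustration and do not come from a real export.

```python
def is_partial_export(header):
    """True when the export holds fewer postings lists than the full index."""
    return header["num_postings_lists"] < header["total_postings_lists"]


# Header values from the toy export earlier in this thread.
TOY_HEADER = {
    "num_postings_lists": 9,
    "num_doc_records": 3,
    "total_postings_lists": 9,
    "total_docs": 3,
}

# Hypothetical "query terms only" header: same collection, but only two
# postings lists exported. These numbers are illustrative, not real output.
QUERY_ONLY_HEADER = dict(TOY_HEADER, num_postings_lists=2)
```

In the partial case the collection-level statistics (total_postings_lists, total_docs) would still describe the full index, so a reader cannot derive them by counting what is in the file.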
