Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Document gap compression #19

Open
lintool opened this issue Mar 16, 2020 · 6 comments
Open

Document gap compression #19

lintool opened this issue Mar 16, 2020 · 6 comments
Assignees

Comments

@lintool
Copy link
Member

lintool commented Mar 16, 2020

We need to explicitly document that docids are gap compressed, both in README and in the protobuf definition (i.e., in comments).

@lintool lintool self-assigned this Mar 16, 2020
@cmacdonald
Copy link
Member

Indeed, this is not clear from the protobuf definition.

@JMMackenzie
Copy link
Member

This is a slightly odd one, because the gap compression only arises due to the way the Lucene export is engineered. So I guess are we going to assume that any other system which may want to export a CIFF should also be doing delta compression? In that case, we should definitely document it with the CIFF/protobuf definition.

On the other hand, there's nothing inherently in the definition of the protobuf which makes it necessary to store deltas. Thoughts?

@chriskamphuis
Copy link
Member

I think only the description should be updated. If systems are allowed to also export without storing delta's, a system has to know how the CIFF is constructed before reading it. It would be desirable to be consistent on how CIFF should be constructed given an index.

@cmacdonald
Copy link
Member

Jimmy's implementation of the Lucene index export adds in the delta gap (this isnt related to the Lucene index itself). Assuming its the defacto base, then readers and writers have to be aware of d-gaps. All of our impls now have d-gaps.

Arguably the name "docid" in the Posting object definition is what is wrong - if we were always going to use d-gaps, the name should have been different. As suggested in the OP, its documentation changes that are needed.

@JMMackenzie
Copy link
Member

I've started a branch to work on some improved documentation: https://github.com/osirrc/ciff/tree/documentation

Please feel free to contribute.

@cmacdonald
Copy link
Member

Changes made.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants