Document gap compression #19
Comments
Indeed, this is not clear from the protobuf definition.
This is a slightly odd one, because the gap compression only arises from the way the Lucene export is engineered. So the question is: are we going to assume that any other system that wants to export a CIFF should also do delta compression? In that case, we should definitely document it with the CIFF/protobuf definition. On the other hand, nothing in the protobuf definition itself makes it necessary to store deltas. Thoughts?
I think only the description should be updated. If systems are also allowed to export without storing deltas, a reader has to know how a given CIFF was constructed before reading it. It would be desirable to be consistent about how a CIFF should be constructed from an index.
Jimmy's implementation of the Lucene index export adds in the delta gaps (this isn't related to the Lucene index itself). Assuming it's the de facto base, readers and writers have to be aware of d-gaps. All of our implementations now have d-gaps. Arguably the name "docid" in the Posting object definition is what is wrong: if we were always going to use d-gaps, the name should have been different. As suggested in the OP, it's documentation changes that are needed.
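For anyone reading this thread later, here is a minimal sketch of what d-gap encoding means for a single postings list. It is not taken from the CIFF code; the function names `to_gaps` and `from_gaps` are hypothetical, and it assumes that the first stored value is the actual docid and that each following value is the difference from the previous docid.

```python
from itertools import accumulate
from typing import List


def to_gaps(docids: List[int]) -> List[int]:
    """Turn a strictly increasing docid list into d-gaps.

    Assumption: the first value is stored as-is, every later value
    is the difference from the preceding docid.
    """
    gaps = []
    prev = 0
    for docid in docids:
        gaps.append(docid - prev)
        prev = docid
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Recover the original docids with a running sum over the gaps."""
    return list(accumulate(gaps))


docids = [3, 7, 8, 15]
gaps = to_gaps(docids)        # what a writer would store in the "docid" field
restored = from_gaps(gaps)    # what a reader has to do before using the values
```

A reader that skips the running sum would treat the gaps as document identifiers, which is why the convention has to be stated next to the field definition.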
I've started a branch to work on some improved documentation: https://github.com/osirrc/ciff/tree/documentation. Please feel free to contribute.
Changes made.
We need to explicitly document that docids are gap compressed, both in the README and in the protobuf definition (i.e., in comments).