Add the skip_reference_sequence and ignore_reference_sequence options #2019
Codecov Report
@@             Coverage Diff             @@
##             main    #2019      +/-   ##
===========================================
+ Coverage   81.56%   93.35%   +11.79%
===========================================
  Files          27       27
  Lines       25404    25536     +132
  Branches     1112     1112
===========================================
+ Hits        20720    23840    +3120
+ Misses       4619     1660    -2959
+ Partials       65       36      -29
This is fantastic, thanks @clwgg. I spotted a few minor issues.
Can you update the changelogs also please? (Should be a copy/paste from the ignore_tables entries)
c/tskit/tables.c
Outdated
- int kas_flags = options & TSK_LOAD_SKIP_TABLES ? 0 : KAS_READ_ALL;
+ int kas_flags = KAS_READ_ALL;
+ if ((options & TSK_LOAD_SKIP_TABLES)
+     | (options & TSK_LOAD_SKIP_REFERENCE_SEQUENCE)) {
This is bitwise or I think, we want boolean or ||.
I guess if you wanted to be fancy you could say options & (TSK_LOAD_SKIP_TABLES|TSK_LOAD_SKIP_REFERENCE_SEQUENCE), but I think I'd prefer the more obvious version.
ooops :shame:
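The three spellings discussed in this thread can be compared side by side. The sketch below models the C expressions in Python; the bit values are hypothetical and chosen only for illustration (the real constants live in tskit's C headers), and the helper function names are invented here. Under those assumptions all three forms agree on truthiness, which suggests the request for the logical operator is about stating intent clearly rather than changing behaviour.

```python
# Hypothetical bit values for illustration only; the real constants are
# defined in tskit's C headers and may differ.
TSK_LOAD_SKIP_TABLES = 1 << 0
TSK_LOAD_SKIP_REFERENCE_SEQUENCE = 1 << 1


def skips_anything_bitwise(options: int) -> bool:
    """Form from the outdated diff: bitwise OR of the two masked values."""
    return bool(
        (options & TSK_LOAD_SKIP_TABLES)
        | (options & TSK_LOAD_SKIP_REFERENCE_SEQUENCE)
    )


def skips_anything_logical(options: int) -> bool:
    """Form the reviewer asked for: logical OR (`||` in C, `or` in Python)."""
    return bool(
        (options & TSK_LOAD_SKIP_TABLES)
        or (options & TSK_LOAD_SKIP_REFERENCE_SEQUENCE)
    )


def skips_anything_combined_mask(options: int) -> bool:
    """The "fancy" form: a single test against the combined mask."""
    mask = TSK_LOAD_SKIP_TABLES | TSK_LOAD_SKIP_REFERENCE_SEQUENCE
    return bool(options & mask)
```

The logical form short-circuits and reads as a yes/no question, which is presumably why it was described as the more obvious version.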
@@ -2997,11 +2997,18 @@ def load(file, *, skip_tables=False):
  Please note that with this option set, it is not possible to load data from
  a stream of multiple tree sequences using consecutive calls to
  :meth:`tskit.load`.
+ :param bool skip_reference_sequence: If True, the tree sequence is read
I wonder if we should say that the skip_x options require a seekable stream, and they'll fail if you call them on a socket or stdin?
yeah, I was wondering about that but wasn't sure how to fit it into the :param structure. should I just throw it in at the top-level before the :param blocks?
Throw it in before the :param: blocks yeah, maybe as a .. warning:: or something?
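As a rough illustration of the suggestion in this thread, the sketch below shows a docstring with a `.. warning::` block placed before the `:param` entries, together with a check for seekability. Everything here is hypothetical: `check_seekable_for_skip` and `NonSeekableStream` are invented names, not part of tskit, and the choice of `ValueError` is an assumption rather than what the library actually raises.

```python
import io


def check_seekable_for_skip(stream, *, skip_tables=False,
                            skip_reference_sequence=False):
    """Hypothetical pre-flight check; not part of tskit.

    .. warning::
        The ``skip_*`` options are assumed here to need a seekable stream,
        so pipes, sockets and stdin are rejected when either one is set.

    :param stream: A binary file-like object.
    :param bool skip_tables: Illustrative stand-in for the real option.
    :param bool skip_reference_sequence: Illustrative stand-in for the
        real option.
    :return: True if the stream can be used with the requested options.
    """
    wants_skip = skip_tables or skip_reference_sequence
    if wants_skip and not stream.seekable():
        raise ValueError("skip_* options need a seekable stream")
    return True


class NonSeekableStream(io.RawIOBase):
    """Stands in for a socket or pipe: readable, but unable to seek."""

    def readable(self):
        return True

    def seekable(self):
        return False
```

An in-memory `io.BytesIO` passes the check with either option set, while the stand-in stream is only accepted when no skip option is requested.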
I agree there's an argument, but we've erred on the side of verbosity elsewhere so we might as well stick with what we have now, IMO. I'm happy to change if someone else feels strongly about it, though, I'm not bothered either way.
LGTM, though one doc string needed a little tweak.
c/tskit/tables.h
Outdated
If the TSK_LOAD_SKIP_TABLES option is set, only the top-level
information of the table collection will be read, leaving all tables empty.
- If the TSK_LOAD_SKIP_TABLES option is set, only the top-level
- information of the table collection will be read, leaving all tables empty.
+ If the TSK_LOAD_SKIP_TABLES option is set, only the non-table
+ information from the table collection will be read, leaving all tables with zero rows and no metadata or schema.
thanks @benjeffery! since that's the skip_tables docstring, should I open another PR for that? or are y'all ok with throwing that in with the skip_refseq PR?
It should be in this PR as that text only needs changing because of the ref seq.
b983188 to 21bc675
Looks great, thanks @clwgg, let's merge!
Description
Fixes #1971
Adds two new C flags: TSK_LOAD_SKIP_REFERENCE_SEQUENCE and TSK_CMP_IGNORE_REFERENCE_SEQUENCE. These are exposed to python as the skip_reference_sequence flag to TableCollection.load and TreeSequence.load, as well as the ignore_reference_sequence flag to TableCollection.equals and TreeSequence.equals.
Since the flags were written this way in the issue I kept them "written out", though I think an argument could be made to shorten reference_sequence to refseq in all four cases (your call @jeromekelleher, @benjeffery).
PR Checklist: