Add the skip_reference_sequence and ignore_reference_sequence options #1971
To me, the reference sequence is in the same category as the table data, and we use the existing flags. Happy to hear of any use cases that need just the reference though.
I agree - any better name
As you say
This seems fine to me. Of course, who knows what different functionality end users might request next :->. It might be that simply providing multiple flags – load this, don't load that – would provide flexibility for the future that might prove useful. But perhaps it makes sense to wait and see, rather than trying to put all that flexibility in at the outset (perhaps complicating the API unnecessarily). If the policy choice is that the reference sequence is "table data", then perhaps rather than renaming the flag away from I'm not sure I understand the objection to the flag name
I agree with the above -- it doesn't feel like there is the need right now for a proliferation of flags to
Right now, there's I think you're going to have the casting vote @bhaller - if you want to be able to load a tree sequence but skipping the reference sequence data like you mention here, then by far the simplest way to facilitate this is to have a
Yes, if that flag is available I will use it (barring unforeseen snafus), and I think it will make a real difference to end users in terms of memory usage. Thanks for listening. :->
OK, sounds like a decision then? We add the
@jeromekelleher
@bhaller together with the
I've changed the title of this issue accordingly. @clwgg is there any chance we could use your expertise here? 😄
sure, I'm happy to work on it! what is the approx. merge window time line for 0.4.0/1.0.0 at this point? (just to see if I can get it done in time)
We're hoping about a week.
The big update for reference sequences is merged @clwgg, so the way is clear if you'd like to pick this one up!
@clwgg it would be great if this happened soon; we're getting down to the wire on getting this stuff in before SLiM 3.7 needs to ship. Just FYI, if you are able to get to it. Thanks!
There's no pressure on you to do this @clwgg, but could you let us know if you'll be able to get to it in the next day or two? @benjeffery or I would be happy to pick it up instead, as we're very keen to tag 0.4.0 (and a C 0.99 release) so that we can unblock some downstream stuff.
yup, on it today!
Just an update: the tskit_one_point_oh branch has been updated to tskit 0.99.15, and appropriate fixes have been put in to match the changes done on the tskit side with respect to reading/writing the reference sequence. Changes pushed to GitHub. Seems good so far; running the full test suite now, which takes several hours.
We recently added the concept of "table data" in 82a56e7 with the addition of the `skip_tables` flag to `tskit.load()` and the `ignore_tables` flag to `TableCollection.equals()` (and the corresponding flags to the C API). Since that change was made, we have also added basic support for reference sequence data in parallel. As @bhaller points out (#1854 (comment)), the `skip_tables` option still loads the reference sequence data.

The `skip_tables` option was initially motivated by the desire to get access to the top-level metadata only (#1854). Providing access only to the metadata is a non-starter I think, because it's much simpler to skip loading stuff into the table collection than it is to provide separate APIs for accessing the metadata. So, there will always be some extra info that comes with the metadata, and this is correct I think: what if I was going through a bunch of files just to read their `uuid` values? This isn't metadata, and I wouldn't want to read the whole file just to get them either.

The question then is what we do from this point. Since we want the option of not loading reference sequence data, the options as I see it are:

- Add `skip_reference_sequence` and `ignore_reference_sequence` to `load` and `equals`
- Treat `reference_sequence` as `table` data, and document as such
- Rename the `skip_tables` and `ignore_tables` flags to something like `top_level_only` and be clear that we don't consider `reference_sequence` as top level data.

Any thoughts @bhaller @clwgg?
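
To make the first option concrete, here is a rough sketch of how independent skip/ignore flags could behave. It is a toy model in plain Python, not tskit's implementation: `ToyCollection` and `toy_load` are hypothetical names, the "file" is just a dict of named sections, and only the four flag names are taken from this issue. A real loader would avoid reading the skipped sections from disk at all; the toy only models what ends up in the loaded object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a table collection (not tskit's class)."""
    uuid: str = ""
    metadata: dict = field(default_factory=dict)
    tables: dict = field(default_factory=dict)  # bulk per-table data
    reference_sequence: str = ""                # potentially very large

    def equals(self, other, *, ignore_tables=False,
               ignore_reference_sequence=False):
        # Top-level metadata is always compared in this toy model.
        # The uuid identifies a file, so it is deliberately not compared.
        if self.metadata != other.metadata:
            return False
        if not ignore_tables and self.tables != other.tables:
            return False
        if (not ignore_reference_sequence
                and self.reference_sequence != other.reference_sequence):
            return False
        return True


def toy_load(raw, *, skip_tables=False, skip_reference_sequence=False):
    """Build a ToyCollection from a dict of sections, leaving skipped parts empty."""
    return ToyCollection(
        uuid=raw["uuid"],
        metadata=dict(raw["metadata"]),
        tables={} if skip_tables else dict(raw["tables"]),
        reference_sequence=(
            "" if skip_reference_sequence else raw["reference_sequence"]
        ),
    )


# Usage: the two flags are independent, so each part can be dropped on its own.
raw = {
    "uuid": "abc",
    "metadata": {"k": 1},
    "tables": {"nodes": [0, 1]},
    "reference_sequence": "ACGT",
}
full = toy_load(raw)
no_ref = toy_load(raw, skip_reference_sequence=True)
top_only = toy_load(raw, skip_tables=True, skip_reference_sequence=True)

assert no_ref.tables == full.tables and no_ref.reference_sequence == ""
assert top_only.uuid == "abc" and top_only.tables == {}
assert full.equals(no_ref, ignore_reference_sequence=True)
assert not full.equals(top_only)
```

The point of the sketch is only that separate flags let a caller keep the tables while dropping the reference sequence, which the second and third options would not allow.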