Update a single row #1545

Closed
benjeffery opened this issue Jun 30, 2021 Discussed in #1543 · 8 comments · Fixed by #1600
Labels
Python API Issue is about the Python API

Comments

@benjeffery
Member

Discussed in #1543

Originally posted by hyanwong June 30, 2021
At the moment I think adding to the metadata arrays (e.g. on populations, individuals, or nodes) is quite intricate, and I have to look it up every time. I think it's something like this:

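import msprime

ts = msprime.sim_ancestry(10)
tables = ts.dump_tables()
# Decode every population's metadata into a list of dicts.
pop_metadatas = [p.metadata for p in ts.populations()]  # Why is there no tables.populations.get_metadatas() method?
# Edit the metadata of the first (and only) population.
pop_metadatas[0]['name'] += "(the only population)"
# Re-encode every row against the schema and overwrite the whole metadata column.
tables.populations.packset_metadata(
    [tables.populations.metadata_schema.validate_and_encode_row(r) for r in pop_metadatas])
new_ts = tables.tree_sequence()
print([p for p in new_ts.populations()])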

(it took me about 10 minutes to look up how to do that, and the information is scattered all over the docs - I don't think a beginner could do it TBH).

Should we have some convenience methods to help with this? I quite often want to edit metadata on just a single entry in a table: e.g. label a particular individual, flag up a single node, etc.

We should also give an example of doing this in the metadata tutorial.

@benjeffery added the "C API" and "Python API" labels on Jun 30, 2021
@benjeffery added this to the C API 1.0.0 milestone on Jun 30, 2021
@benjeffery
Member Author

I've added this to the C 1.0 milestone as we'll need something like: tsk_update_char_ragged_array(tsk_size_t *offset_to_modify, char *array_to_modify, tsk_size_t *row_indexes, tsk_size_t *new_offset, char *new_array)
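To illustrate why a single-row update of a ragged column needs its own helper, here is a minimal pure-Python sketch. It assumes the usual ragged layout of one flat byte string plus an offsets list with one more entry than there are rows. The names `set_ragged_row` and `get_ragged_row` are hypothetical and are not part of tskit or its C API; the point is only that changing one row's length shifts every later offset.

```python
from typing import List, Tuple


def get_ragged_row(data: bytes, offsets: List[int], index: int) -> bytes:
    """Return the bytes stored for row `index` of a ragged column."""
    return data[offsets[index]:offsets[index + 1]]


def set_ragged_row(
    data: bytes, offsets: List[int], index: int, new_value: bytes
) -> Tuple[bytes, List[int]]:
    """Return a new (data, offsets) pair with row `index` replaced.

    Hypothetical helper for illustration only; not a tskit function.
    """
    num_rows = len(offsets) - 1
    if not 0 <= index < num_rows:
        raise IndexError("row index out of bounds")
    start, end = offsets[index], offsets[index + 1]
    # Splice the new bytes in place of the old row contents.
    new_data = data[:start] + new_value + data[end:]
    # Every offset after the edited row moves by the change in length.
    delta = len(new_value) - (end - start)
    new_offsets = offsets[:index + 1] + [o + delta for o in offsets[index + 1:]]
    return new_data, new_offsets


# Three rows of four bytes each.
data = b"popApopBpopC"
offsets = [0, 4, 8, 12]
data, offsets = set_ragged_row(data, offsets, 1, b"pop_B_renamed")
print(offsets)
print(get_ragged_row(data, offsets, 2))
```

Rows before the edited one keep their offsets, and rows after it keep their contents but move, which is why the whole offset array has to be rewritten even for a one-row change.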

@jeromekelleher
Member

I'm not sure this is something we want in the public API - wouldn't it be better to support updating a single row? If people are going to be updating multiple rows, then they should work column wise anyway.

@benjeffery
Member Author

Gah, of course, forgetting the C milestone is for the public API. Was thinking this would just be used by the CPython code.

@jeromekelleher
Member

Mind if we generalise this to "update single row" @benjeffery? No point in just focusing on the ragged columns here, we want an x_table_set_row as discussed here

@benjeffery changed the title from "Set ragged array contents on a single row" to "Update a single row" on Jul 5, 2021
@jeromekelleher
Member

Removing C API tag as it's done there.

@jeromekelleher removed the "C API" label on Jul 23, 2021
@benjeffery
Member Author

benjeffery commented Jul 28, 2021

The proposed syntax here is tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE) where the RHS can be any object that has the necessary attributes for the table on the LHS.
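As a rough sketch of how that assignment pattern can behave, the toy classes below mimic it with the standard library only. `ToyNodeRow` and `ToyNodeTable` are hypothetical stand-ins, not tskit classes, and the column set is trimmed to three fields for brevity. The constant is assumed to match the value of `tskit.NODE_IS_SAMPLE`. The setter only reads attributes from the right-hand side, so any object with the needed attributes is accepted.

```python
import dataclasses
from types import SimpleNamespace

NODE_IS_SAMPLE = 1  # assumed to match the value of tskit.NODE_IS_SAMPLE


@dataclasses.dataclass(frozen=True)
class ToyNodeRow:
    """Hypothetical immutable row object; not the tskit row class."""
    flags: int = 0
    time: float = 0.0
    population: int = -1

    def replace(self, **kwargs):
        # Return a copy of this row with the given fields changed.
        return dataclasses.replace(self, **kwargs)


class ToyNodeTable:
    """Hypothetical column-wise table supporting table[i] = row."""

    def __init__(self):
        self._flags, self._time, self._population = [], [], []

    def add_row(self, flags=0, time=0.0, population=-1):
        self._flags.append(flags)
        self._time.append(time)
        self._population.append(population)
        return len(self._flags) - 1

    def __len__(self):
        return len(self._flags)

    def __getitem__(self, index):
        return ToyNodeRow(self._flags[index], self._time[index], self._population[index])

    def __setitem__(self, index, row):
        if not 0 <= index < len(self):
            raise IndexError("row index out of bounds")
        # Read every attribute first, so a row object that is missing one
        # raises AttributeError before any column has been modified.
        flags, time, population = row.flags, row.time, row.population
        self._flags[index] = flags
        self._time[index] = time
        self._population[index] = population


nodes = ToyNodeTable()
nodes.add_row(time=0.0)
nodes.add_row(time=1.5)
# Same shape as the proposed syntax: copy the row, change one field, assign back.
nodes[0] = nodes[0].replace(flags=NODE_IS_SAMPLE)
# Any object with the required attributes works on the right-hand side.
nodes[1] = SimpleNamespace(flags=0, time=2.0, population=0)
print(nodes[0], nodes[1])
```

Keeping the row immutable and going through `replace` means the table is only ever changed by an explicit assignment, never by mutating a row that was read out of it earlier.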

@jeromekelleher
Member

Can this be closed now?

@benjeffery
Member Author

Closed by #1600


2 participants