Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Adding attrs at the SpatialData object level #711

Merged
merged 13 commits into from
Dec 1, 2024

Conversation

quentinblampey
Copy link
Contributor

@quentinblampey quentinblampey commented Sep 23, 2024

As described in issue #404, it is useful to have attributes at the SpatialData object level. This will also be useful in Sopa, to simplify the API.

Example Usage

import spatialdata

from spatialdata.datasets import blobs

sdata = blobs()

sdata.attrs # empty dictionnary
sdata.attrs["test"] = 12

sdata.write("blobs.zarr") # this saves the attributes on disk

sdata = spatialdata.read_zarr("blobs.zarr") # the attributes are read normally

sdata.attrs["test"] # returns 12

What do you think @LucaMarconato?

Check-list

  • Add attrs attribute to the SpatialData object
  • Can write the attrs on disk when the data is not backed
  • Can write the attrs on disk when the data is already backed
  • Can read the attrs from disk
  • Pass the attrs to new SpatialData objects
  • Merge conflict keys in attrs during concatenation
  • Update the format version (see Luca's comment)

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 88.60759% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.78%. Comparing base (2a0c23a) to head (4546bea).
Report is 70 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/spatialdata.py 92.00% 4 Missing ⚠️
src/spatialdata/_io/format.py 71.42% 4 Missing ⚠️
src/spatialdata/_io/io_zarr.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #711      +/-   ##
==========================================
- Coverage   91.80%   91.78%   -0.03%     
==========================================
  Files          45       45              
  Lines        6959     7045      +86     
==========================================
+ Hits         6389     6466      +77     
- Misses        570      579       +9     
Files with missing lines Coverage Δ
src/spatialdata/_core/_deepcopy.py 98.38% <100.00%> (+0.02%) ⬆️
src/spatialdata/_core/concatenate.py 91.80% <100.00%> (+0.20%) ⬆️
src/spatialdata/_core/operations/_utils.py 92.15% <100.00%> (ø)
src/spatialdata/_core/operations/rasterize.py 90.45% <100.00%> (ø)
src/spatialdata/_core/operations/transform.py 90.82% <100.00%> (ø)
src/spatialdata/_core/query/spatial_query.py 95.50% <100.00%> (ø)
src/spatialdata/_io/io_zarr.py 87.77% <75.00%> (-0.60%) ⬇️
src/spatialdata/_core/spatialdata.py 90.87% <92.00%> (-0.01%) ⬇️
src/spatialdata/_io/format.py 83.22% <71.42%> (-1.13%) ⬇️

... and 3 files with indirect coverage changes

@LucaMarconato LucaMarconato self-assigned this Sep 24, 2024
@LucaMarconato
Copy link
Member

LucaMarconato commented Sep 26, 2024

Thanks for the PR.

  • before merging we need to update the format version; we probably should add a format for the SpatialData container; something asked also by several users (we already have a format for the single elements, which would be independent from the format of the container).

@quentinblampey
Copy link
Contributor Author

Hi @LucaMarconato, have you started working on your last comment? If not, do you want me to have a look (although I'm not exactly sure to understand what you want to do)?

@quentinblampey
Copy link
Contributor Author

I noticed we are missing a feature: after we save a SpatialData object, if we add a new attr, there is currently no way to save it on disk. Something like sdata.write_element for the attrs

@Marius1311
Copy link

Just a random note on this: I agree that having SpatialData - level attributes would be super helpful. Beyond that, I think it would be great to be able to annotate coordinate systems - these often correspond to samples, conditions, time-points etc. I can of course annotate all elements within a coordinate system through tables, but I think it would be more direct to somehow directly provide metadata for an entire coordinate system.

@quentinblampey
Copy link
Contributor Author

quentinblampey commented Oct 31, 2024

I also noticed that we want to pass the attrs inside certain functions, e.g. when performing a query (the attrs should be kept). I made a commit for this

The only concern is when we concatenate multiple SpatialData objects, how should we merge the duplicate keys (if any)?

NB: I updated the first post with a check-list

@LucaMarconato
Copy link
Member

@quentinblampey thanks for the update. I haven't updated the format yet, we can work on it this week at the hackathon. Regarding saving to disk, I suggest to extend the write_metadata() method. Finally, regarding .attrs, we could proceed as AnnData does for .uns when calling ad.concat().

@timtreis
Copy link
Member

Extremely interested in this, thanks @quentinblampey !!

@LucaMarconato
Copy link
Member

LucaMarconato commented Dec 1, 2024

@quentinblampey I have finalized this PR, I'm going to merge it now. Thanks for your contribution!

Two comments.

  1. root-level versioning: On disk, I am now adding a version number for the root level metadata, and also I am adding the string of the spatialdata software version that was used to create the metadata (this helps reproducibility). When reading the .zattrs from disk into the SpatialData.attrs slow I am not parsing the two version strings not to clutter the object.
  2. .attrs passed by reference: I have removed the code dict(sdata.attrs) and passing the attrs always by reference. The only exceptions are when I call spatialdata.deepcopy(), where I call copy.deepcopy(), and when the user passes a Mapping that is not a dict. In such case the casting to dict performs a shallow copy.

@LucaMarconato LucaMarconato enabled auto-merge (squash) December 1, 2024 13:21
@LucaMarconato LucaMarconato merged commit 02396ea into scverse:main Dec 1, 2024
7 of 8 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On disk versioning feedback
4 participants