
add steinbock #13

Closed: wants to merge 6 commits

@giovp (Member) commented Jan 14, 2023

Testing this out is a bit convoluted, but basically:

This is how it looks: [screenshot]

Pinging @jwindhager and @nilseling: any feedback would be much appreciated.
Right now I'm importing:

  • images
  • labels (either from DeepCell or ilastik)
  • anndata

You did fantastic work providing in the h5ad all the information that is also present in the other CSV files. Am I missing anything particularly important that you think should be included in the in-memory object?

On a separate note, I noticed that there are also coordinates such as

image_acquisition_start_x_um image_acquisition_start_y_um image_acquisition_end_x_um image_acquisition_end_y_um image_acquisition_width_um image_acquisition_height_um

Could this information be used to recover the "global" coordinate space of the image within the biopsy? Or what does it refer to?

Thanks in advance for any feedback!

Base automatically changed from io/nanostring to main January 16, 2023 13:26
@jwindhager commented Jan 16, 2023

Hi @giovp, thanks so much for working on this!

A couple of comments from my side:

  • Steinbock is a toolkit and not a pre-defined pipeline/workflow, so the individual commands for preprocessing, segmentation etc. are independent and should therefore be considered "optional". This is especially true for the anndata folder generated by the steinbock export anndata command, since data export is not part of most steinbock-based pipelines I've seen in the wild.
  • In principle, the individual steinbock commands can operate on arbitrary input/output directories. While the file types are specified and the steinbock commands assume a default directory structure, the latter can be customized using command line options. This, again, is especially true for the anndata folder, since the steinbock export anndata command requires a user-defined destination name (there is no default directory structure for export commands).
  • In addition to the input/output directory names, the contents of the anndata objects are also determined by the user. Please have a look at this example, which showcases how such objects could be constructed by the user. Of note, in this example, all the individual images are concatenated, resulting in a single objects.h5ad output file.

Taken together, I see two options on how to deal with this flexibility (these are not mutually exclusive!):

  1. Clearly specify somewhere the expected directory names & contents (currently hardcoded in SteinbockKeys)
  2. Let the user specify/override these default directory names somewhere, e.g. in steinbock reader function arguments
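A minimal sketch of how the two options could be combined: the expected names live in one documented place (option 1), and users override only what differs (option 2). `SteinbockLayout` and its fields are hypothetical, and the defaults (`img`, `masks`, `cells.h5ad`) are assumptions based on steinbock's default directory names and on the table name reported later in this thread, not the reader's actual `SteinbockKeys`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class SteinbockLayout:
    """Hypothetical container for the names a steinbock reader expects.

    The defaults are assumptions; every field can be overridden by the user.
    """

    images_dir: str = "img"
    labels_dir: str = "masks"
    table_file: str = "cells.h5ad"

    def resolve(self, root: str | Path) -> dict[str, Path]:
        """Return the path of each component under `root`."""
        root = Path(root)
        return {
            "images": root / self.images_dir,
            "labels": root / self.labels_dir,
            "table": root / self.table_file,
        }


# Standard layout: nothing to configure.
default_paths = SteinbockLayout().resolve("/data/steinbock")

# Custom layout: override only the names that differ from the defaults.
custom = replace(SteinbockLayout(), labels_dir="masks_ilastik", table_file="objects.h5ad")
custom_paths = custom.resolve("/data/steinbock")
```

A frozen dataclass keeps the defaults visible in a single place while `dataclasses.replace` lets a caller change one field without restating the others.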

> [...] Am I missing anything particularly important that you think it should be included in the in-memory object?

No, I think images, labels (masks), tabular data (intensities, regionprops) and neighbors should suffice.
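For the neighbors, one possible conversion is sketched below, assuming steinbock stores each neighbor graph as an edge list that maps onto a sparse adjacency matrix. The column names `Object`, `Neighbor` and `Distance` are assumptions about the CSV layout, and `neighbors_to_coo` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io


def neighbors_to_coo(neighbors_csv: str, object_ids: list[int]) -> tuple[list[int], list[int], list[float]]:
    """Convert an edge-list CSV into COO triplets (rows, cols, distances).

    Assumes columns "Object", "Neighbor", "Distance", and that `object_ids`
    lists the object IDs in the same order as the table rows.
    """
    # Map steinbock object IDs to row positions in the table.
    index = {obj: i for i, obj in enumerate(object_ids)}
    rows: list[int] = []
    cols: list[int] = []
    dists: list[float] = []
    for rec in csv.DictReader(io.StringIO(neighbors_csv)):
        rows.append(index[int(rec["Object"])])
        cols.append(index[int(rec["Neighbor"])])
        dists.append(float(rec["Distance"]))
    return rows, cols, dists


# Illustrative edge list for three objects.
example = "Object,Neighbor,Distance\n1,2,4.5\n2,1,4.5\n2,3,7.0\n"
rows, cols, dists = neighbors_to_coo(example, object_ids=[1, 2, 3])
```

The triplets could then be passed to `scipy.sparse.coo_matrix((dists, (rows, cols)), shape=(n, n))` and stored, for instance, in `AnnData.obsp`.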

> [...] Could this info be used to recover the "global" coordinate space of the image in the biopsy? Or what does this info refers to?

This data is extracted from MCD raw data during conversion to TIFF using readimc, see here. However, please bear in mind that such a file will only be generated when working with IMC raw data. For other multiplexed imaging technologies, where users directly start from TIFF, this information will not be available out of the box. But yes, one could recover the instrument's coordinate space using this information, as done e.g. by the napari-hierarchical plugin.
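As a sketch of how the acquisition metadata could place an image in the instrument's coordinate space, the helper below builds a 3×3 affine matrix mapping pixel coordinates to µm. It assumes pixel (0, 0) corresponds to (`image_acquisition_start_x_um`, `image_acquisition_start_y_um`) and pixel (width, height) to the matching `end` values; that orientation convention is an assumption to verify against readimc. `acquisition_affine`, `apply_affine` and the metadata values are hypothetical.

```python
from __future__ import annotations


def acquisition_affine(meta: dict[str, float], width_px: int, height_px: int) -> list[list[float]]:
    """Hypothetical helper: affine matrix mapping pixel (x, y) to instrument µm.

    Assumes pixel (0, 0) sits at the acquisition start and pixel
    (width_px, height_px) at the acquisition end; verify against readimc.
    """
    x0 = meta["image_acquisition_start_x_um"]
    y0 = meta["image_acquisition_start_y_um"]
    # Signed scale: an axis running "backwards" in instrument space gets a negative factor.
    sx = (meta["image_acquisition_end_x_um"] - x0) / width_px
    sy = (meta["image_acquisition_end_y_um"] - y0) / height_px
    return [
        [sx, 0.0, x0],
        [0.0, sy, y0],
        [0.0, 0.0, 1.0],
    ]


def apply_affine(matrix: list[list[float]], x: float, y: float) -> tuple[float, float]:
    """Map a single pixel coordinate to instrument coordinates."""
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Illustrative values only: a 500 x 400 px acquisition at 1 µm per pixel.
meta = {
    "image_acquisition_start_x_um": 1000.0,
    "image_acquisition_start_y_um": 2000.0,
    "image_acquisition_end_x_um": 1500.0,
    "image_acquisition_end_y_um": 2400.0,
}
m = acquisition_affine(meta, width_px=500, height_px=400)
```

Under the same assumption, building one such matrix per acquisition from a slide would place all of its images in a shared frame.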

@giovp (Member Author) commented Jan 20, 2023

Hi @jwindhager,

Thanks a lot for the prompt reply.

> Steinbock is a toolkit and not a pre-defined pipeline/workflow, so the individual commands for preprocessing, segmentation etc. are independent and should therefore be considered "optional". This is especially true for the anndata folder generated by the steinbock export anndata command, since data export is not part of most steinbock-based pipelines I've seen in the wild.

Understood. However, to make any use of the data in spatialdata, at least the regions and tables need to be specified. Would it be fine to make those required and the rest optional?

> In principle, the individual steinbock commands can operate on arbitrary input/output directories. While the file types are specified and the steinbock commands assume a default directory structure, the latter can be customized using command line options. This, again, is especially true for the anndata folder, since the steinbock export anndata command requires a user-defined destination name (there is no default directory structure for export commands).

Got it. Would you then suggest that users could also supply absolute paths instead of just the "dataset_id"?

> In addition to the input/output directory names, also the contents of the anndata objects are determined by the user. Please have a look at this example, which showcases how such objects could be constructed by the user. Of note, in this example, all the individual images are concatenated, resulting in a single objects.h5ad output file.

Makes sense, thanks. I believe this is also what is done now (reading a single h5ad).
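With a single concatenated table, each row still has to be linked back to the image (and hence the labels) it came from. A minimal sketch, assuming the table carries a per-row image name, for example in an `obs` column whose name is not fixed by steinbock because the export is user-defined; `group_rows_by_image` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def group_rows_by_image(image_column: list[str]) -> dict[str, list[int]]:
    """Map each image name to the indices of the table rows that belong to it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row, image in enumerate(image_column):
        groups[image].append(row)
    # Return a plain dict so missing keys raise instead of silently creating lists.
    return dict(groups)


# Illustrative per-row image names from a concatenated table.
groups = group_rows_by_image(["A.tiff", "A.tiff", "B.tiff", "A.tiff"])
```

Each group of rows could then be associated with the labels element of the corresponding image.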

> Clearly specify somewhere the expected directory names & contents (currently hardcoded in SteinbockKeys)

The motivation for hardcoding the file names was to reduce verbosity. From looking at the example directory, it seemed that the "dataset_id" was the experiment key used across files. I can also see the option of passing explicit paths, at least to the image, segmentation and table folders.
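One way to accept both a name inside the steinbock folder and an explicit path through a single argument relies on `pathlib`: joining a root with an absolute path yields the absolute path. The sketch below also treats labels and table as required and images as optional, following the proposal earlier in this reply; `resolve_inputs` and its defaults are hypothetical, and the example uses POSIX paths.

```python
from __future__ import annotations

from pathlib import Path


def resolve_inputs(
    root: str | Path,
    *,
    labels: str | Path = "masks",
    table: str | Path = "cells.h5ad",
    images: str | Path | None = "img",
) -> dict[str, Path]:
    """Hypothetical resolver: relative values live under `root`, absolute ones are used as-is."""
    root = Path(root)
    # Required components: without them there are no regions or table to read.
    resolved = {"labels": root / labels, "table": root / table}
    # Optional component: pass images=None to skip it.
    if images is not None:
        resolved["images"] = root / images
    return resolved


default = resolve_inputs("/data/run1")
explicit = resolve_inputs("/data/run1", table="/exports/objects.h5ad", images=None)
```

Keeping short defaults preserves the low verbosity of the hardcoded names, while any argument can still point anywhere on disk.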

> This data is extracted from MCD raw data during conversion to TIFF using readimc, see here. However, please bear in mind that such a file will only be generated when working with IMC raw data. For other multiplexed imaging technologies, where users directly start from TIFF, this information will not be available out of the box. But yes, one could recover the instrument's coordinate space using this information, as done e.g. by the napari-hierarchical plugin.

Thanks, this is very useful; I will take a look.

@giovp (Member Author) commented Mar 4, 2023

Merged in #19.

@giovp closed this Mar 4, 2023
@giovp (Member Author) commented Mar 4, 2023

I think we'll have to revisit this anyway once users start using it.

@giovp deleted the io/steinbock branch March 4, 2023 20:37
@LukasHats commented Jun 25, 2024

> think we'll have to revisit anyway once user start using it

It seems I am one of the first to test it in the wild. As @jwindhager explained, steinbock offers a flexible way of organizing your steinbock working directory. I tried using spatialdata_io.steinbock on my directory, but ran into some issues because I had not exported the comprehensive AnnData object, which is not a mandatory step when processing IMC data with steinbock.
This is not a huge issue, as I can simply go back and do this. However, I would suggest improving the documentation, e.g. by stating which steinbock outputs are mandatory, including their names (e.g. the reader needs the AnnData file to be named cells.h5ad). Also, steinbock users might not export OME-TIFFs, as the standard steinbock output from .mcd files is plain TIFF.
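Until the documentation lists the mandatory outputs, a reader could at least fail early with a message naming exactly what is missing. A minimal sketch; `missing_inputs` is hypothetical, and the required names (`masks`, `cells.h5ad`) are assumptions taken from this report rather than the reader's confirmed requirements.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def missing_inputs(root: str | Path, required: dict[str, str]) -> list[str]:
    """Return one human-readable entry for every required input not found under `root`."""
    root = Path(root)
    return [
        f"{what}: expected {root / name}"
        for what, name in required.items()
        if not (root / name).exists()
    ]


# Demo: a working directory with masks but without an exported AnnData table.
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "masks").mkdir()
    problems = missing_inputs(
        workdir,
        {"segmentation masks": "masks", "AnnData table": "cells.h5ad"},
    )
```

A reader could then raise, for example, `FileNotFoundError("\n".join(problems))` and mention `steinbock export anndata` in the message.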

@LucaMarconato (Member)

Thank you @LukasHats for reporting on this and for the explanation. Would you be up for making a small PR to address these issues?

@LukasHats

@LucaMarconato Happy to do so once I make the reader work!

4 participants