Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

gex_only=True defaults in sc.read_10x_h5 #1949

Open
LuckyMD opened this issue Jul 13, 2021 · 3 comments
Open

gex_only=True defaults in sc.read_10x_h5 #1949

LuckyMD opened this issue Jul 13, 2021 · 3 comments

Comments

@LuckyMD
Copy link
Contributor

LuckyMD commented Jul 13, 2021

This might be a bit of a rant, and I'm aware there are some good arguments for the way things are... but I just wasted 4 hours of my life because I wasn't aware of the default gex_only=True in sc.read_10x_h5().

Just wanted to flag that if scanpy support for multimodality becomes a thing, then this default may need to change to prevent frustration. I think moving read/write functionality to AnnData was already discussed at some point. For multimodal support, this might become more important again. If this is moved, then read_10x_h5 should probably not default to gex_only=True anymore.

Would it already be worth either making gex_only a required input? For backward compatability it could also just trigger a logging warning for now. I do think that this is quite an important thing to let people know with more and more 10X Multiome data being generated now (and CITE-seq for that matter).

@chris-rands
Copy link
Contributor

chris-rands commented Jul 13, 2021

I've ran into this before (sc.read_10x_mtx() has the same default of course). One possible issue with defaulting to gex_only=False is that someone might accidentally run a 'regular pipeline' with multi-modal data, e.g. log-normalizing RNA+protein+cell hashing counts together without first subsetting the adata based on .var["feature_types"]. By contrast, anyone who know they have multi-modal data would hopefully notice the missing the data with gex_only=True. Either way, logging warnings sounds good.

@ivirshup
Copy link
Member

ivirshup commented Jul 14, 2021

For context, the option was added in #334, and I think the scope for other feature types was much more limited at the time.

Would it already be worth either making gex_only a required input?

I'm not sure the gex_only argument even entirely makes sense anymore. I think a feature_type argument would make more sense. Erroring if nothing is passed and there are multiple kinds sounds reasonable to me, as multimodality should be handled explicitly.

For backwards compatibility I think deprecation warnings for a release cycle when either gex_only is used or nothing is passed and there are multiple feature types present could work.


Moving the 10x reading functions had been discussed in: #1387

@LuckyMD
Copy link
Contributor Author

LuckyMD commented Jul 14, 2021

Thanks, @ivirshup... I 100% agree with your suggestions. Maybe it's already worth adding a warning to the read functions now in cases where feature_type has multiple values.

@chris-rands agreed that we shouldn't set the default to False. This will have multiple issues both for backward compatability and the cases you describe.

@DaneseAnna: this might be important for the new episcanpy function i proposed yesterday ;).

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants