Let Scanpy read from Cell Ranger 3.0 outputs #334

Merged (1 commit) on Oct 29, 2018

@qianggong11 (Contributor) commented Oct 29, 2018

Hi @falexwolf,

10X is releasing a new version of Cell Ranger that changes the output format. This pull request makes Scanpy forward compatible with the new version. In particular, the following changes are made:

Updated read_10x_h5:

  • Renamed the original read_10x_h5 as _read_legacy_10x_h5;
  • Added _read_v3_10x_h5 to read the new Cell Ranger output format;
  • The new read_10x_h5 determines the version of HDF5 input by the presence of the matrix key, and wraps the above two functions. In addition, it takes a gex_only argument which filters out feature barcoding counts from the outcome object when it is True (default). Otherwise, the full matrix will be retained.
  • For CR-v3, feature_types and genome were added into the outcome object as new attributes.
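
The version check described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `pick_h5_reader` is a hypothetical helper, and plain dicts stand in for open HDF5 files (an `h5py.File` object supports the same `in` test on its top-level keys). The legacy layout used here, with one top-level group per genome, is an assumption made for the example.

```python
def pick_h5_reader(h5_file):
    """Return the name of the reader to use for an open 10x HDF5 file.

    `h5_file` can be any mapping of top-level keys. The Cell Ranger 3.0
    format is recognized by the presence of a top-level "matrix" key.
    """
    if "matrix" in h5_file:
        return "_read_v3_10x_h5"
    return "_read_legacy_10x_h5"


# Dicts standing in for the top-level groups of real files (assumed layouts).
v3_like = {"matrix": {}}
legacy_like = {"mm10": {}}

print(pick_h5_reader(v3_like))
print(pick_h5_reader(legacy_like))
```

Checking a single key keeps the public read_10x_h5 signature stable while both formats remain readable.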

Updated read_10x_mtx:

  • Renamed the original read_10x_mtx as _read_legacy_10x_mtx;
  • Added _read_v3_10x_mtx to read the new Cell Ranger output format;
  • The new read_10x_mtx determines the version of matrix input by the presence of the genes.tsv file under the input directory, and wraps the above two functions. In addition, it takes a gex_only argument which filters out feature barcoding counts from the outcome object when it is True (default). Otherwise, the full matrix will be retained.
  • For CR-v3, feature_types was added into the outcome object as a new attribute.
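
The directory check for read_10x_mtx might look like the sketch below. `pick_mtx_reader` is a hypothetical name used only for illustration; the actual wrapper in this PR may be structured differently. The only rule taken from the description is that a genes.tsv file under the input directory marks legacy output.

```python
from pathlib import Path


def pick_mtx_reader(path):
    """Return the name of the reader to use for a 10x matrix directory.

    A directory containing genes.tsv is treated as legacy output;
    anything else is treated as the Cell Ranger 3.0 format.
    """
    if (Path(path) / "genes.tsv").is_file():
        return "_read_legacy_10x_mtx"
    return "_read_v3_10x_mtx"
```

Calling `pick_mtx_reader` on a directory without genes.tsv selects the v3 reader, so no extra version flag is needed from the caller.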

Added small test datasets and code for the revised functions to verify the expected behavior.

Note for the genome argument:

  • Scanpy's read_10x_h5 function has a genome argument but read_10x_mtx does not, because for the latter the genome is already specified by the path of the input directory. The two functions should return the same object, which always holds one genome at a time.
  • In this PR, when there are multiple genomes (e.g. Barnyard), read_10x_mtx always reads them all, whereas read_10x_h5 always needs one of them to be specified (mm10 by default). However, when gex_only == False, the genome argument is ignored and the whole matrix is read.
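
The interplay between genome and gex_only for read_10x_h5 can be illustrated with a small sketch. Everything here is an assumption for illustration: `select_features` is a hypothetical helper, features are modeled as plain tuples rather than an AnnData object, the feature type labels "Gene Expression" and "Antibody Capture" are assumed, and the feature names are made up.

```python
def select_features(features, genome="mm10", gex_only=True):
    """Pick which feature rows to keep.

    `features` is a list of (name, feature_type, genome) tuples.
    With gex_only=True, keep gene-expression features of one genome;
    with gex_only=False, ignore `genome` and keep everything.
    """
    if not gex_only:
        return list(features)
    return [
        f for f in features
        if f[1] == "Gene Expression" and f[2] == genome
    ]


# Made-up rows for a two-genome sample with one feature-barcoding entry.
features = [
    ("Gm1992", "Gene Expression", "mm10"),
    ("TP53", "Gene Expression", "GRCh38"),
    ("CD3_ab", "Antibody Capture", ""),
]

mouse_only = select_features(features)                     # default genome
everything = select_features(features, gex_only=False)     # genome ignored
```

With the defaults only the mm10 gene-expression row survives, while gex_only=False returns all three rows regardless of the genome argument.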

@falexwolf (Member) commented

Thank you very much! That looks good!

Alex

@falexwolf falexwolf merged commit 1284599 into scverse:master Oct 29, 2018
@flying-sheep flying-sheep changed the title Let Scanpy read from Cell Ranger 3.0 outputs (#1) Let Scanpy read from Cell Ranger 3.0 outputs Apr 11, 2022