Let Scanpy read from Cell Ranger 3.0 outputs #334
Merged
Hi @falexwolf,
10X is releasing a new version of Cell Ranger that changes the output format. This pull request makes Scanpy forward compatible with the new version. In particular, the following changes are made:
- Updated `read_10x_h5`:
  - Renamed the original `read_10x_h5` as `_read_legacy_10x_h5`;
  - Added `_read_v3_10x_h5` to read the new Cell Ranger output format;
  - `read_10x_h5` determines the version of the HDF5 input by the presence of the `matrix` key, and wraps the above two functions. In addition, it takes a `gex_only` argument, which filters out feature barcoding counts from the returned object when it is `True` (default). Otherwise, the full matrix is retained.
  - `feature_types` and `genome` were added to the returned object as new attributes.
- Updated `read_10x_mtx`:
  - Renamed the original `read_10x_mtx` as `_read_legacy_10x_mtx`;
  - Added `_read_v3_10x_mtx` to read the new Cell Ranger output format;
  - `read_10x_mtx` determines the version of the matrix input by the presence of the `genes.tsv` file under the input directory, and wraps the above two functions. In addition, it takes a `gex_only` argument, which filters out feature barcoding counts from the returned object when it is `True` (default). Otherwise, the full matrix is retained.
  - `feature_types` was added to the returned object as a new attribute.
- Added small test datasets and test code for the revised functions to verify the expected behavior.
Note for the `genome` argument:
- For legacy input, the `genome` argument is available in the `read_10x_h5` function but not in `read_10x_mtx`, as the genome is already specified by the path of the input directory. The objects returned by the two functions should be the same, and each always takes one genome at a time.
- For the new format, when multiple genomes are present, `read_10x_mtx` always reads them all, whereas `read_10x_h5` always needs one of them to be specified (`mm10` by default). However, when `gex_only == False`, the `genome` argument is ignored and the whole matrix is read.