A pre-processing pipeline companion to https://github.com/chris-mcginnis-ucsf/MULTI-seq. This collection of scripts is meant to replace two pre-processing functions in the R package deMultiplex. The scripts have no dependencies on external packages.
makeReadtable.py is analogous to MULTIseq.preProcess from deMultiplex, but it is at least 10x faster and has a minimal RAM footprint.
makeBarMatrix.py is analogous to MULTIseq.align from deMultiplex, but it is at least 10x faster. It uses an internal Hamming distance calculation only if there is no exact sample barcode match, in which case it accepts 1 nucleotide mismatch in the sample barcode.
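The exact-match-first, Hamming-fallback idea described above can be sketched as follows. This is only an illustration and not the code in makeBarMatrix.py: the function names, the example barcodes, and the choice to reject a read that is within one mismatch of more than one reference barcode are all assumptions.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read_barcode, reference_barcodes, max_mismatch=1):
    """Return the index of the matching reference barcode, or None.

    An exact dictionary lookup is tried first; the slower Hamming scan
    runs only when the exact lookup fails.
    """
    index = {bc: i for i, bc in enumerate(reference_barcodes)}
    if read_barcode in index:
        return index[read_barcode]

    # Fallback: collect every reference within max_mismatch substitutions.
    hits = [
        i
        for i, bc in enumerate(reference_barcodes)
        if len(bc) == len(read_barcode)
        and hamming(read_barcode, bc) <= max_mismatch
    ]
    # Assumption: an ambiguous read (several candidates) is left unassigned.
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-nt sample barcodes, for illustration only.
refs = ["GGAGAAGA", "CCACAATG", "TGAGACCT"]
exact = match_barcode("GGAGAAGA", refs)      # exact match
one_off = match_barcode("GGAGAAGT", refs)    # one substitution
too_far = match_barcode("GGAGTTGA", refs)    # two substitutions
```

Trying the exact lookup first keeps the common case at a single hash lookup, so the pairwise comparison cost is paid only for reads that contain an error.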
makeReadtable.py requires 4 arguments.
- The -C argument takes a text file of cell ID barcodes, with a single cell barcode per line.
- The -R1 argument is the MULTI-seq fraction FASTQ pair 1 (can be compressed in gz format).
- The -R2 argument is the MULTI-seq fraction FASTQ pair 2 (can be compressed in gz format).
- The -O argument is the output filename for the read table.
A typical command would be:
makeReadtable.py -C cellIds.txt -R1 R1.fastq.gz -R2 R2.fastq.gz -O readTable.csv
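Since the FASTQ inputs may or may not be gzip-compressed, a reader that handles both cases could look roughly like the sketch below. It is not taken from makeReadtable.py; the helper names are hypothetical, compression is detected from the .gz extension only, and each FASTQ record is assumed to span exactly four lines.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_pairs(r1_path, r2_path):
    """Yield (read1_sequence, read2_sequence) for each record pair.

    Streams both files line by line, so memory use stays small
    regardless of file size.
    """
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2:
        for i, (line1, line2) in enumerate(zip(r1, r2)):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                yield line1.strip(), line2.strip()
```

Iterating with a generator rather than loading the files into a list is one way to keep the RAM footprint low on large sequencing runs.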
makeBarMatrix.py requires 4 arguments.
- The -C argument takes a text file of cell ID barcodes, with a single cell barcode per line.
- The -R argument is the read table output generated by makeReadtable.py.
- The -B argument is the file of MULTI-seq barcodes that are needed.
- The -O argument is the output filename for the bar matrix file.
A typical command would be:
makeBarMatrix.py -C cellIds.txt -R readTable.csv -B data/multiSeqBarcodes_1_to_32.txt -O barMatrix.csv
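The two scripts run in sequence, with the read table from the first step feeding the second. A small driver along these lines could chain them. It is a sketch under assumptions: the function names are hypothetical, both scripts are assumed to sit in the working directory, and they are launched with the current Python interpreter. Only the flags documented above are used.

```python
import subprocess
import sys


def build_commands(cell_ids, r1, r2, barcodes,
                   read_table="readTable.csv", bar_matrix="barMatrix.csv"):
    """Return the argument lists for both steps, in execution order."""
    read_table_cmd = [
        sys.executable, "makeReadtable.py",
        "-C", cell_ids, "-R1", r1, "-R2", r2, "-O", read_table,
    ]
    bar_matrix_cmd = [
        sys.executable, "makeBarMatrix.py",
        "-C", cell_ids, "-R", read_table, "-B", barcodes, "-O", bar_matrix,
    ]
    return [read_table_cmd, bar_matrix_cmd]


def run_pipeline(cell_ids, r1, r2, barcodes, **outputs):
    """Run both steps; check=True stops at the first failing step."""
    for cmd in build_commands(cell_ids, r1, r2, barcodes, **outputs):
        subprocess.run(cmd, check=True)


commands = build_commands(
    "cellIds.txt", "R1.fastq.gz", "R2.fastq.gz",
    "data/multiSeqBarcodes_1_to_32.txt",
)
```

Calling run_pipeline with the same four arguments would execute the steps. Passing arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.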