This is a workflow for de novo transcriptome assembly with Illumina reads. It:
- Trims reads with Trimmomatic
- Performs digital normalization with khmer
- Assembles with Trinity
Just follow what is inside the .travis.yml:
- Install conda
- Clone this repo
- Add your samples to config.yaml
- Run snakemake: snakemake --use-conda -j
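
The last step can be wrapped in a small shell function that previews the workflow before running it. This is only a sketch: `--use-conda`, `-n` (dry run) and `-j` are regular Snakemake options, but the function name `run_workflow` and the default of 4 jobs are assumptions made for illustration and are not part of this repo.

```shell
# Hypothetical helper: preview the workflow, then run it.
# Call it from the root of the cloned repo, next to the Snakefile.
run_workflow() {
    # Dry run: list the jobs that would be executed without running them.
    snakemake --use-conda -n || return 1

    # Real run. The first argument caps the number of parallel jobs;
    # 4 is an arbitrary default chosen for this sketch.
    snakemake --use-conda -j "${1:-4}"
}
```

For example, `run_workflow 8` would do the dry run and then start the workflow with up to 8 parallel jobs.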
The hierarchy of the folder is the one described in A Quick Guide to Organizing Computational Biology Projects:
smsk_khmer_trinity
├── bin: binaries, scripts and environment files.
├── data: raw data, hopefully links to backup data.
├── README.md: this file.
├── results: processed data.
│   ├── raw: links to raw data
│   ├── qc: processed reads with Trimmomatic
│   ├── diginorm: digital normalization
│   ├── assembly: Trinity output
│   ├── filtering: per-locus TPM filtering
│   ├── tissue: per sample quantification
│   └── transrate: assembly and filtering statistics
└── src: additional source code, tarballs, etc.
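
To get a feel for the layout, the skeleton above can be recreated in a scratch directory. This is a rough sketch, not something the workflow needs: the results subfolders are presumably created by the pipeline as it runs, and the `PROJECT_DIR` variable is an assumption introduced here.

```shell
# Recreate the folder skeleton in a scratch directory.
# PROJECT_DIR is a hypothetical variable; point it wherever you like.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)/smsk_khmer_trinity}"

# Top-level folders.
mkdir -p "$PROJECT_DIR/bin" "$PROJECT_DIR/data" "$PROJECT_DIR/src"

# One results subfolder per stage of the workflow.
for stage in raw qc diginorm assembly filtering tissue transrate; do
    mkdir -p "$PROJECT_DIR/results/$stage"
done

# Show what was created.
find "$PROJECT_DIR" -type d | sort
```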
"Just" edit the config.yaml
with the paths to your fastq files and change parameters. In the diginorm_params section, set max_table_size to 4e9, because it's annoyingly slow to do tests with 16 GB of RAM.
Also raise Trinity's maximum memory usage if you need it.
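
If you prefer the command line over a text editor, the `max_table_size` change could look roughly like this. The sketch works on a throwaway file: only the names `diginorm_params` and `max_table_size` come from this README, while the layout of the fragment and the starting value `1e6` are assumptions, so check your own `config.yaml` before adapting it.

```shell
# Work on a throwaway copy so the real config.yaml is untouched.
workdir="$(mktemp -d)"
cat > "$workdir/config.yaml" <<'EOF'
# Hypothetical fragment: the real file has more keys and may be laid out
# differently. The value 1e6 is a made-up placeholder.
diginorm_params:
    max_table_size: 1e6
EOF

# Replace whatever follows "max_table_size:" with 4e9, keeping indentation.
# -i.bak edits in place and leaves a backup (accepted by GNU and BSD sed).
sed -i.bak 's/^\([[:space:]]*max_table_size:\).*/\1 4e9/' "$workdir/config.yaml"

# Print the edited fragment.
cat "$workdir/config.yaml"
```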