YeastStrainsStudy

Scripts to download the PacBio, ONT and MiSeq datasets used in https://www.nature.com/articles/s41598-017-03996-z and either run the pipelines as described in the paper or simply download the final assemblies generated by the authors.

Instructions

Download repository:

git clone https://github.com/fg6/YeastStrainsStudy.git

Usage:

$ ./launchme.sh <command> <strain>
  command: command to be run. Options: install, download, check, deepcheck, clean, nanoclean,
           finalfastas, findassembly
  strain: download data for this strain or strains; only for command=download, check or deepcheck.
          Options: s288c, sk1, cbs, n44, all [s288c]
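
As a minimal sketch of how the strain options above could be checked before calling the script, the wrapper below rejects any value that is not in the documented list. The function names valid_strain and launch_for_strain are hypothetical and not part of the repository; the wrapper assumes launchme.sh is in the current directory.

```shell
# Return success if $1 is one of the strain options listed in the usage text.
valid_strain() {
    case "$1" in
        s288c|sk1|cbs|n44|all) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to call launchme.sh with an unknown strain.
# An empty strain falls back to s288c, the default shown in the usage text.
launch_for_strain() {
    cmd=$1
    strain=${2:-s288c}
    if ! valid_strain "$strain"; then
        echo "Unknown strain: $strain (expected s288c, sk1, cbs, n44 or all)" >&2
        return 1
    fi
    ./launchme.sh "$cmd" "$strain"
}
```

For example, `launch_for_strain download sk1` would run `./launchme.sh download sk1`, while `launch_for_strain download foo` stops with an error before anything is downloaded.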

Download data and utilities

With the script launchme.sh you can download the complete datasets used in the analysis of the paper https://www.nature.com/articles/s41598-017-03996-z to run the pipelines yourself, or download only the final assemblies generated by the authors of the paper.

!!! Warning !!!: due to a recent protocol change in the EBI database, this script fails to export
	 MiSeq cram files to fastqs. If you experience this problem, please use scramble
	 (https://www.biorxiv.org/content/early/2014/03/28/003640) to export to fastqs,
	 or download the fastq files directly from ENA.
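
If a cram file has already been downloaded, one possible workaround is sketched below. Note that it uses samtools (listed under the additional software below) rather than scramble, and that it assumes paired-end MiSeq reads. The function name cram_to_fastq and the output naming scheme are hypothetical. Decoding a cram file may also require access to the reference sequence it was compressed against.

```shell
# Convert one paired-end CRAM to two FASTQ files with samtools.
# $1: input CRAM, $2: output prefix (hypothetical naming scheme).
cram_to_fastq() {
    cram=$1
    prefix=$2
    # collate groups read pairs together; fastq then splits them into _1/_2 files.
    samtools collate -u -O "$cram" |
        samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
            -0 /dev/null -s /dev/null -n
}
```

A call such as `cram_to_fastq sample.cram sample` would write sample_1.fastq and sample_2.fastq; the file names here are placeholders.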

Download only the final assemblies

To just look at the assemblies generated by the pipelines:

Step 1. Download the assemblies:

$ ./launchme.sh finalfastas

Step 2. List the assemblies, selecting strain, assembler and/or platform:

$ ./launchme.sh findassembly

!!!!!   Warning  !!!!! 
This script is interactive: It will ask you which strain, assembler or platform you want to focus on

Download all the data to run the pipelines:

Step 1. Download and install the required software and scripts:

$ ./launchme.sh install

Step 2. Download data and prepare the fastq files:

$ ./launchme.sh download <strain> 

strain= s288c, sk1, n44, cbs or all  [s288c]

Step 3. Once the data have been downloaded and the fastq files prepared, check the fastq files:

$ ./launchme.sh check <strain> 

    strain= s288c, sk1, n44, cbs or all  [s288c]

If the check gives you warnings, some files probably failed to download properly;
follow the instructions given in the output.
If the instructions do not help, try:

$ ./launchme.sh deepcheck <strain>
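
Steps 2 and 3 can be chained so that the check output is kept for later review. The sketch below is only an illustration: the function name download_and_check and the log file name are hypothetical, and it assumes that warnings can be recognised by the word "warning" in the check output, which the script does not guarantee.

```shell
# Download one strain, then run the check and keep its output in a log file.
# $1: path to launchme.sh, $2: strain, $3: log file (hypothetical default name).
download_and_check() {
    launcher=$1
    strain=$2
    log=${3:-check_${strain}.log}
    "$launcher" download "$strain" || return 1
    "$launcher" check "$strain" 2>&1 | tee "$log"
    # Assumption: warnings are recognisable by the word "warning" in the output.
    if grep -qi 'warning' "$log"; then
        echo "Warnings found; see $log, or try: $launcher deepcheck $strain" >&2
        return 2
    fi
}
```

Running `download_and_check ./launchme.sh sk1` returns 0 when no warning is found, and 2 when the log mentions a warning, in which case the instructions in the log or the deepcheck command are the next step.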

Step 4/A. If everything looks OK and there are no warnings from Step 3, you can clean up the data folders, deleting all intermediate files and folders:

    $ ./launchme.sh clean <strain>

!!!!!   Warning  !!!!! 
1. Please run this only after Step 3 and only if Step 3 showed no errors or warnings, 
	otherwise you will have to download everything again!
2. Please do not run this if you intend to run Nanopolish,
        as Nanopolish needs the s288c fast5 files; run Step 4/B instead.

Step 4/B. If everything looks OK and there are no warnings from Step 3, you can clean up the data folders, deleting all intermediate files and folders not needed by Nanopolish:

    $ ./launchme.sh nanoclean <strain>

    !!!!!   Warning  !!!!!
    Please run this only after Step 3 and only if Step 3 showed no errors or warnings,
      otherwise you will have to download everything again!
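
The choice between Step 4/A and Step 4/B depends only on whether Nanopolish will be run later. A small sketch of that decision is given below; the helper name cleanup_command is hypothetical and not part of the repository.

```shell
# Print the launchme.sh cleanup command matching the two options above.
# $1: "yes" if Nanopolish will be run later, anything else otherwise.
cleanup_command() {
    if [ "$1" = "yes" ]; then
        echo nanoclean   # Step 4/B: keeps the files Nanopolish needs
    else
        echo clean       # Step 4/A: removes all intermediate files
    fi
}

# Example (not executed here): ./launchme.sh "$(cleanup_command yes)" s288c
```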

Disk space required:

If not cleaning up: 1.7TB

After cleaning all (clean): < 30GB.

After cleaning all except files for Nanopolish (nanoclean): ~700GB
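
Given these figures, it may be worth checking the free space before starting a download. The sketch below is one way to do it with standard tools; the function name has_free_space is hypothetical, and 1700 GB is used as an approximation of the 1.7TB quoted above.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 GB free.
has_free_space() {
    dir=$1
    required_gb=$2
    # df -Pk prints one POSIX-format line per filesystem; column 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Warn if the current directory cannot hold the full, uncleaned dataset.
has_free_space . 1700 || echo "Less than about 1.7TB free in $(pwd)" >&2
```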

Requirements for installing and preparing data:

Python >= 2.7 is needed. Please make sure it is available in your PATH, together with virtualenv. A C++11-capable compiler is also required.
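
A quick way to see which of these requirements are missing from the PATH is sketched below. The helper name missing_tools is hypothetical, and g++ is only an assumed name for the C++ compiler; substitute whichever compiler is installed on your system. The sketch does not check the Python version.

```shell
# Print each command name from the arguments that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# List any missing requirement (prints nothing if all are available).
missing_tools python virtualenv g++
```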

Pipelines

After running launchme.sh, you can run the various pipelines from the 'pipelines' folder.

Example:

cd pipelines	
./canu.sh <canu_location> <strain> <platform> <cov>

For details on the pipelines, see pipelines/README.md or launch each script with the option "-h".

Warning! Please note that the assemblers and scaffolders (except for smis) are not installed by the launchme.sh script. To run the pipelines you need to have installations of:

Abruijn (https://github.com/fenderglass/ABruijn)

Canu (https://github.com/marbl/canu)

PBcR (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR)

Falcon-integrate (https://github.com/PacificBiosciences/FALCON-integrate)

Smartdenovo (https://github.com/ruanjue/smartdenovo)

MiniAsm and MiniMap (https://github.com/lh3/miniasm, https://github.com/lh3/minimap/)

Racon (https://github.com/isovic/racon)

Nanopolish (https://github.com/jts/nanopolish)

SPAdes (http://bioinf.spbau.ru/spades)

npScarf (https://github.com/mdcao/npScarf)

Additional software needed: bwa (https://github.com/lh3/bwa), samtools (https://github.com/samtools/samtools), bamtools (https://github.com/pezmaster31/bamtools)
