feat: add workflow catalog yml (#199)

* add workflow catalog yml * add workflow readme * docs: update readme and config readme
IKIM-Essen · Oct 12, 2021 · 2013533 · 2013533
1 parent e9be78e
commit 2013533
Show file tree

Hide file tree

Showing 4 changed files with 103 additions and 104 deletions.
diff --git a/.gitignore b/.gitignore
@@ -10,12 +10,14 @@
 !workflow/rules/annotation/*
 !workflow/resources/*
 !config
+!config/REAMDE.md
 !config/config.yaml
 !config/multiqc_config.yaml
 !config/pep
 !config/pep/*
 !LICENSE
 !README.md
+!.snakemake-workflow-catalog.yml
 !.gitignore
 !.gitattributes
 !.editorconfig

diff --git a/.snakemake-workflow-catalog.yml b/.snakemake-workflow-catalog.yml
@@ -0,0 +1,4 @@
+usage:
+  software-stack-deployment:
+    conda: true 
+  report: true
diff --git a/README.md b/README.md
@@ -1,116 +1,22 @@
-# UnCoVar: Snakemake workflow for SARS-Cov-2 strain and variant calling
+![UnCoVar2](https://user-images.githubusercontent.com/77535027/133610563-d190e25c-504e-4953-92dd-f84a5b4a1191.png)
+# UnCoVar: SARS-CoV-2 Variant Calling and Lineage Assignment
 
 [![Snakemake](https://img.shields.io/badge/snakemake-≥6.3.0-brightgreen.svg)](https://snakemake.bitbucket.io)
 [![GitHub actions status](https://github.com/koesterlab/snakemake-workflow-sars-cov2/workflows/Tests/badge.svg?branch=master)](https://github.com/koesterlab/snakemake-workflow-sars-cov2/actions?query=branch%3Amaster+workflow%3ATests)
 [![Docker Repository on Quay](https://quay.io/repository/uncovar/uncovar/status "Docker Repository on Quay")](https://quay.io/repository/uncovar/uncovar)
 
-![UnCoVar2](https://user-images.githubusercontent.com/77535027/133610563-d190e25c-504e-4953-92dd-f84a5b4a1191.png)
+A reproducible and scalable workflow for transparent and robust SARS-CoV-2 variant calling and lineage assignment.
+
+## Usage
+
+The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=TBD).
+
+If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and its DOI (see above).
 
 ## Authors
 
-* Alexander Thomas (@alethomas)
 * Thomas Battenfeld (@thomasbtf)
+* Alexander Thomas (@alethomas)
 * Felix Wiegand (@fxwiegand)
 * Folker Meyer (@folker)
 * Johannes Köster (@johanneskoester)
-
-## Usage
-
-### Step 1: Obtain a copy of this workflow
-
-TODO upon publishing fill this with instructions on how to use the github template functionality.
-
-### Step 2: Configure workflow
-
-Configure the workflow according to your needs via editing the files under `config`. Adjust `config.yaml` to configure the workflow execution, and `pep/samples.csv` to specify your sample setup.
-
-#### Passing NGS samples
-It is recommended to use the following structure to organize the data:
-
-    ├── archive
-    ├── incoming
-    └── snakemake-workflow-sars-cov2
-        ├── data
-        └── ...
-
-The incoming directory should contain paired end reads in a FASTQ format. It is recommended to work with compressed files (e.g. `sample-name.fastq.gz`).
-
-To load your data into the workflow execute `python preprocessing/update_sample_sheet.py` with `snakemake-workflow-sars-cov2` as working directory.
-
-The executed script automatically copies your data into the data directory and moves all files from incoming directory to the archive. 
-Moreover, the sample sheet is automatically updated with the new files. Please note, that only the part of the filename before the first '_' character is used as the sample name for the workflow.
-
-### Step 3: Install Snakemake
-
-Install Snakemake using [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html):
-
-    conda create -c bioconda -c conda-forge -n snakemake snakemake
-
-For Snakemake installation details, see the [instructions in the Snakemake documentation](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).
-
-### Step 4: Execute workflow
-
-Test your configuration by performing a dry-run via
-
-    snakemake --use-conda -n
-
-Execute the workflow locally via
-
-    snakemake --use-conda --cores $N --resources ncbi_api_requests=1
-
-using `$N` cores.
-Non-local execution can be done via Snakemake's extensive cluster and cloud support, see the [Snakemake documentation](https://snakemake.readthedocs.io/en/stable/executable.html).
-
-### Step 5: Investigate results
-
-After successful execution, you can create a self-contained interactive HTML report with all results via:
-
-    snakemake --report report.zip
-
-## Tools, Frameworks and Packages used in UnCoVar
-
-This project wouldn't be possible without several open source libraries:
-
-| Tool           | Link                                              |
-|----------------|---------------------------------------------------|
-| ABySS          | www.doi.org/10.1101/gr.214346.116                 |
-| Altair         | www.doi.org/10.21105/joss.01057                   |
-| BAMClipper     | www.doi.org/10.1038/s41598-017-01703-6            |
-| BCFtools       | www.doi.org/10.1093/gigascience/giab008           |
-| BEDTools       | www.doi.org/10.1093/bioinformatics/btq033         |
-| Biopython      | www.doi.org/10.1093/bioinformatics/btp163         |
-| bwa            | www.doi.org/10.1093/bioinformatics/btp324         |
-| delly          | www.doi.org/10.1093/bioinformatics/bts378         |
-| ensembl-vep    | www.doi.org/10.1186/s13059-016-0974-4             |
-| entrez-direct  | www.ncbi.nlm.nih.gov/books/NBK179288              |
-| fastp          | www.doi.org/10.1093/bioinformatics/bty560         |
-| FastQC         | www.bioinformatics.babraham.ac.uk/projects/fastqc |
-| fgbio          | github.com/fulcrum-genomics/fgbio             |
-| FreeBayes      | www.arxiv.org/abs/1207.3907                       |
-| intervaltree   | github.com/chaimleib/intervaltree             |
-| Jupyter        | www.jupyter.org                                   |
-| kallisto       | www.doi.org/10.1038/nbt.3519                      |
-| Kraken2        | www.doi.org/10.1186/s13059-019-1891-0             |
-| Krona          | www.doi.org/10.1186/1471-2105-12-385              |
-| mason          | www.http://publications.imp.fu-berlin.de/962      |
-| MEGAHIT        | www.doi.org/10.1093/bioinformatics/btv033         |
-| Minimap2       | www.doi.org/10.1093/bioinformatics/bty191         |
-| MultiQC        | www.doi.org/10.1093/bioinformatics/btw354         |
-| pandas         | pandas.pydata.org                                 |
-| Picard         | broadinstitute.github.io/picard                   |
-| PySAM          | www.doi.org/10.11578/dc.20190903.1                |
-| QUAST          | www.doi.org/10.1093/bioinformatics/btt086         |
-| RaGOO          | www.doi.org/10.1186/s13059-019-1829-6             |
-| ruamel.yaml    | www.sourceforge.net/projects/ruamel-yaml          |
-| Rust-Bio-Tools | github.com/rust-bio/rust-bio-tools            |
-| SAMtools       | www.doi.org/10.1093/bioinformatics/btp352         |
-| Snakemake      | www.doi.org/10.12688/f1000research.29032.1        |
-| sourmash       | www.doi.org/10.21105/joss.00027                   |
-| SPAdes         | www.doi.org/10.1089/cmb.2012.0021                 |
-| SVN            | www.doi.org/10.1142/s0219720005001028             |
-| Tabix          | www.doi.org/10.1093/bioinformatics/btq671         |
-| Trinity        | www.doi.org/10.1038/nprot.2013.084                |
-| Varlociraptor  | www.doi.org/10.1186/s13059-020-01993-6            |
-| Vega-Lite      | www.doi.org/10.1109/TVCG.2016.2599030             |
-| Velvet         | www.doi.org/10.1101/gr.074492.107                 |
-| vembrane       | github.com/vembrane/vembrane                  |
diff --git a/config/README.md b/config/README.md
@@ -0,0 +1,87 @@
+# General settings
+To configure this workflow, modify `config/config.yaml` according to your needs, following the explanations provided in the file.
+
+# Sample sheet
+The sample sheet contains all samples to be analyzed by UnCoVar.
+## Auto filling
+
+UnCoVar offers the possibility to automatically append samples to the sample sheet. To load your data into the workflow execute 
+
+    snakemake --cores all --use-conda update_sample
+
+with the root of the UnCoVar as working directory. It is recommended to use the following structure to when adding data automatically:
+
+    ├── archive
+    ├── incoming
+    └── snakemake-workflow-sars-cov2
+        ├── data
+        └── ...
+
+However, this structure is not set in stone and can be adjusted via the `config/config.yaml` file under `data-handling`. Only the following path to the corresponding folders, relative to the directory of UnCoVar are needed:
+
+- **incoming**: path of incoming data, which is moved to the data directory by the preprocessing script. Defaults to `../incoming/`.
+- **data**: path to store data within the workflow. defaults to `data/`.
+- **archive**: path to archive data from the results from the analysis to. Defaults to `../archive/`.
+
+The incoming directory should contain paired end reads in (compressed) FASTQ format. UnCoVar automatically copies your data into the data directory and moves all files from incoming directory to the archive. After the analysis, all results are compressed and saved alongside the reads.
+
+Moreover, the sample sheet is automatically updated with the new files. Please note, that only the part of the filename before the first '_' character is used as the sample name within the workflow.
+
+## Manual filling
+
+Of course, samples to be analyzed can also be added manually to the sample sheet. For each sample, the a new line in `config/pep/samples.csv` with the following content has to be defined:
+
+- **sample_name**: name or identifier of sample
+- **fq1**: path to read 1 in FASTQ format
+- **fq2**: path to read 2 in FASTQ format
+- **date**: sampling date of the sample
+- **is_amplicon_data**: indicates whether the data was generated with a shotgun (0) or amplicon (1) sequencing
+
+# Tools, Frameworks and Packages used
+
+This project wouldn't be possible without several open source libraries:
+
+| Tool           | Link                                              |
+|----------------|---------------------------------------------------|
+| ABySS          | www.doi.org/10.1101/gr.214346.116                 |
+| Altair         | www.doi.org/10.21105/joss.01057                   |
+| BAMClipper     | www.doi.org/10.1038/s41598-017-01703-6            |
+| BCFtools       | www.doi.org/10.1093/gigascience/giab008           |
+| BEDTools       | www.doi.org/10.1093/bioinformatics/btq033         |
+| Biopython      | www.doi.org/10.1093/bioinformatics/btp163         |
+| bwa            | www.doi.org/10.1093/bioinformatics/btp324         |
+| delly          | www.doi.org/10.1093/bioinformatics/bts378         |
+| ensembl-vep    | www.doi.org/10.1186/s13059-016-0974-4             |
+| entrez-direct  | www.ncbi.nlm.nih.gov/books/NBK179288              |
+| fastp          | www.doi.org/10.1093/bioinformatics/bty560         |
+| FastQC         | www.bioinformatics.babraham.ac.uk/projects/fastqc |
+| fgbio          | github.com/fulcrum-genomics/fgbio             |
+| FreeBayes      | www.arxiv.org/abs/1207.3907                       |
+| intervaltree   | github.com/chaimleib/intervaltree             |
+| Jupyter        | www.jupyter.org                                   |
+| kallisto       | www.doi.org/10.1038/nbt.3519                      |
+| Kraken2        | www.doi.org/10.1186/s13059-019-1891-0             |
+| Krona          | www.doi.org/10.1186/1471-2105-12-385              |
+| mason          | www.http://publications.imp.fu-berlin.de/962      |
+| MEGAHIT        | www.doi.org/10.1093/bioinformatics/btv033         |
+| Minimap2       | www.doi.org/10.1093/bioinformatics/bty191         |
+| MultiQC        | www.doi.org/10.1093/bioinformatics/btw354         |
+| pandas         | pandas.pydata.org                                 |
+| Picard         | broadinstitute.github.io/picard                   |
+| PySAM          | www.doi.org/10.11578/dc.20190903.1                |
+| QUAST          | www.doi.org/10.1093/bioinformatics/btt086         |
+| RaGOO          | www.doi.org/10.1186/s13059-019-1829-6             |
+| ruamel.yaml    | www.sourceforge.net/projects/ruamel-yaml          |
+| Rust-Bio-Tools | github.com/rust-bio/rust-bio-tools            |
+| SAMtools       | www.doi.org/10.1093/bioinformatics/btp352         |
+| Snakemake      | www.doi.org/10.12688/f1000research.29032.1        |
+| sourmash       | www.doi.org/10.21105/joss.00027                   |
+| SPAdes         | www.doi.org/10.1089/cmb.2012.0021                 |
+| SVN            | www.doi.org/10.1142/s0219720005001028             |
+| Tabix          | www.doi.org/10.1093/bioinformatics/btq671         |
+| Trinity        | www.doi.org/10.1038/nprot.2013.084                |
+| Varlociraptor  | www.doi.org/10.1186/s13059-020-01993-6            |
+| Vega-Lite      | www.doi.org/10.1109/TVCG.2016.2599030             |
+| Velvet         | www.doi.org/10.1101/gr.074492.107                 |
+| vembrane       | github.com/vembrane/vembrane                  |
+