The workflows developed in the framework of this project are based on `pipeline-v5` of the MGnify resource. This branch is a child of the `pipeline_5.1` branch, which contains all CWL descriptions of the MGnify pipeline version 5.1.
To run metaGOflow, first make sure the following are set up on your computing environment:
- Python [v3.8+]
- Docker [v19+] or Singularity [v3.7+]/Apptainer [v1+]
- cwltool [v3+]
- rocrate [v0.7.0]
- ruamel.yaml [v0.17.32+]
- Node.js [v10.24.0+]
- ~160 GB of available storage for the databases
Disk requirements vary depending on the analysis you are about to run; for an indication, have a look at the metaGOflow publication for the computing resources used in various cases.
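As a quick sanity check, you can print the versions of the main dependencies. This is only a sketch, assuming the tools are on your `PATH` under these common names (e.g. `python3` rather than `python`); adjust to your system:

```bash
# Print the versions of the main dependencies; adjust the command names
# to however these tools are exposed on your system.
python3 --version
docker --version            # or: singularity --version / apptainer --version
cwltool --version
node --version
pip show rocrate ruamel.yaml | grep -E '^(Name|Version)'
```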
```bash
git clone https://github.com/emo-bon/MetaGOflow
cd MetaGOflow
```
You can download the databases for the EOSC-Life GOs workflow by running the `download_dbs.sh` script under the `Installation` folder:

```bash
bash Installation/download_dbs.sh -f [Output Directory e.g. ref-dbs]
```
If you already have one or more of the databases on your system, create a symbolic link pointing at the `ref-dbs` folder or at one of its subfolders/files instead, as in the sketch below.
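For example, a minimal sketch assuming an existing SILVA SSU copy at `/data/dbs/silva_ssu` (a hypothetical path; substitute your own):

```bash
# Link an existing database copy into ref-dbs instead of re-downloading it.
# /data/dbs/silva_ssu is a hypothetical path; replace it with your own.
mkdir -p ref-dbs
ln -s /data/dbs/silva_ssu ref-dbs/silva_ssu
```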
The final structure of the DB directory should look like the following:

```bash
user@server:~/MetaGOflow: ls ref-dbs/
db_kofam/  diamond/  eggnog/  GO-slim/  interproscan-5.57-90.0/  kegg_pathways/  kofam_ko_desc.tsv  Rfam/  silva_lsu/  silva_ssu/
```
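If you want to confirm that everything is in place, here is a short sketch using the entry names from the listing above:

```bash
# Report any expected database entry that is missing from ref-dbs/.
for entry in db_kofam diamond eggnog GO-slim interproscan-5.57-90.0 \
             kegg_pathways kofam_ko_desc.tsv Rfam silva_lsu silva_ssu; do
    [ -e "ref-dbs/$entry" ] || echo "missing: ref-dbs/$entry"
done
```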
We recommend using Conda to create a virtual environment. We provide a Conda environment file with the necessary dependencies; the following commands create and activate a conda environment called `metagoflow`:

```bash
conda env create -f conda_environment.yml
conda activate metagoflow
```
- Edit the `config.yml` file to set the parameter values of your choice. To select all the steps, set the variables in lines [2-6] to `true`, as in the sketch below.
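As an illustration only, a sketch that flips those toggles in one go, assuming the variables on lines 2-6 of `config.yml` are plain `false` values (check your copy of the file before running this):

```bash
# Back up config.yml, then set the step toggles on lines 2-6 to true.
# Adjust the line range if your copy of the file differs.
cp config.yml config.yml.bak
sed -i '2,6s/false/true/' config.yml
```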
To run metaGOflow on the test data:

```bash
./run_wf.sh -s -n osd-short \
            -d short-test-case \
            -f test_input/wgs-paired-SRR1620013_1.fastq.gz \
            -r test_input/wgs-paired-SRR1620013_2.fastq.gz
```
On an HPC system:

- Create a job file (e.g., an SBATCH file)
- Enable Singularity (e.g., `module load Singularity`) and all other dependencies
- Add the run line to the job file (a full job-file sketch follows below):

```bash
./run_wf.sh -n osd-short -d short-test-case \
            -f test_input/wgs-paired-SRR1620013_1.fastq.gz \
            -r test_input/wgs-paired-SRR1620013_2.fastq.gz
```
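A minimal sketch of such a job file, with hypothetical resource values; adapt the partition, memory, time, and module names to your cluster:

```bash
#!/bin/bash
#SBATCH --job-name=metagoflow
#SBATCH --cpus-per-task=8      # hypothetical values; size them to your data
#SBATCH --mem=64G
#SBATCH --time=24:00:00

# Enable Singularity and any other dependencies your site provides as modules.
module load Singularity

# The -s flag selects Singularity; drop it when running with Docker (see HINT below).
./run_wf.sh -s -n osd-short -d short-test-case \
            -f test_input/wgs-paired-SRR1620013_1.fastq.gz \
            -r test_input/wgs-paired-SRR1620013_2.fastq.gz
```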
HINT: If you are using Docker, you may need to run the above command without the `-s` flag.
The samples are available in the `test_input` folder. These are partial samples from the Human Microbiome Project (SRR1620013 and SRR1620014); they are partial in the sense that only a small part of their sequences has been kept, so that the pipeline can be tested quickly.
- In case you are using Docker, it is strongly recommended to avoid installing it through `snap`.
- If you get `RuntimeError: slurm currently does not support shared caching, because it does not support cleaning up a worker after the last job finishes.`, set the `--disableCaching` flag if you want to use this batch system (see the sketch after this list).
- In case you are getting errors like `cwltool.errors.WorkflowException: Singularity is not available for this tool`, you may run the following command:

```bash
singularity pull --force --name debian:stable-slim.sif docker://debian:stable-slim
```
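For the slurm caching error above, the flag belongs to the underlying CWL runner. A sketch only, assuming a Toil-based invocation (`toil-cwl-runner` accepts `--batchSystem` and `--disableCaching`); whether and how `run_wf.sh` forwards such options is not shown here, so check the script:

```bash
# Pass --disableCaching when using the slurm batch system with Toil.
# <workflow>.cwl and <job-parameters>.yml are placeholders.
toil-cwl-runner --batchSystem slurm --disableCaching \
    <workflow>.cwl <job-parameters>.yml
```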
To make contributing to the project a bit easier, all the MGnify conditionals and subworkflows under the `workflows/` directory that are not used in the metaGOflow framework have been removed. However, all the MGnify `tools/` and `utils/` are available in this repo, even if they are not invoked in the current version of metaGOflow. This way, we hope to encourage people to implement their own conditionals and/or subworkflows by exploiting the currently supported tools and utils, as well as by developing new tools and/or utils.