SnakeMake Pharmacogenetics pipeline

Content Guide:

About the Project
Datasets
Built with
Software
Binaries
File Structure
Files
Folders
Usage
Roadmap
Versioning
Authors
Acknowledgements

About the project:

This is the development repository for a pipeline created to perform frequency analysis on African genetic datasets. This work is licensed under a Creative Commons Attribution 4.0 International License.

Datasets:

This pipeline is designed to accept variant call format data in the form of .vcf files. Due to some of the bioinformatics software used internally, these files must be compressed using BG-zip compression and provided with an accompanying Tabix index file. Both of these pieces of software (bgzip and tabix) are provided by the Samtools project as part of HTSlib, a standard bioinformatics toolkit. Together, the two files provide a block-level compressed copy of your data and a block index, allowing the software to decompress portions of your file and access specific entries without having to decompress the entire file.

This is also just good practice and should be a standard for bioinformatics software.

Please be advised that BG-zip compression is not the same as gzip compression, such as that provided by the Linux gzip command. BG-zip output is gzip-compatible and can be decompressed by either program, but only BG-zip writes the independent compressed blocks that Tabix needs, so a file compressed with plain gzip cannot be given a Tabix index.
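
As a rough illustration of the difference, the sketch below checks whether a file begins with a BG-zip (BGZF) block rather than a plain gzip stream. It relies on the BGZF header layout from the SAM/BAM specification: the gzip FEXTRA flag is set and the extra field carries a 'BC' subfield with a 2-byte payload. The function name `is_bgzf` is hypothetical and not part of this pipeline.

```python
import struct


def is_bgzf(path):
    """Return True if the file at `path` starts with a BGZF block header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # gzip magic (1f 8b), DEFLATE method (08) and the FEXTRA flag bit (0x04)
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        # XLEN: little-endian length of the gzip extra field
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)

    # Walk the extra subfields looking for 'B' 'C' with a 2-byte payload
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[offset:offset + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        offset += 4 + slen
    return False
```

A file written by the Linux gzip command (or Python's gzip module) does not carry the 'BC' subfield, so the check returns False for it. Passing this check does not guarantee that a matching Tabix index exists.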

Built with:

This pipeline is written in Snakemake, a Python-based domain-specific language (DSL), and is coded to run in a PBS/Torque environment using the qsub command (this is set by the profile folder). As such, it needs to be run on a server with the appropriate binaries and batch scheduling software.

Software:

Below is a list of software used by this pipeline:

PBS/Torque batch scheduler
Snakemake
PLINK-2.0
VCF-Tools
liftOverPlink (Binaries contained within this repo. Update at own risk!)
liftOver (Required dependency for liftOverPlink)
Ensembl VEP API

Binaries:

Below is a list of binary dependencies used in this pipeline.

Reference Genomes (properly compressed with accompanying index and dictionary files)
GRCh38
Additional genomes as needed based on input data

File Structure:

This pipeline uses the standardised folder structure, where the workflow itself is located under the workflow folder.

.
├── config # All config data (PBS Profile, genes, etc)
├── resources # Commonly used resources (WARNING: TO BE DEPRECATED SOON)
├── results # The output of the pipeline
├── workflow # The entrypoint to the code of the pipeline
└── README.md

This project uses the following naming conventions:

Files:

All user-generated files should be named using underscore naming conventions. Spaces are replaced with an underscore and no capital letters are used.

E.g. this_is_a_test_example.txt

All Snakemake-generated files are labelled according to the <sample_name>.<file-extension> format and stored in a folder named after the process that produced them.

E.g. intermediates/liftover/1000g.vcf
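
The two file conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `is_snake_case_file` and `intermediate_path` are hypothetical, and the regular expression is one reasonable reading of "underscores, no capital letters", not a rule enforced by the pipeline.

```python
import re
from pathlib import PurePosixPath

# Lower-case words joined by underscores, followed by zero or more extensions
SNAKE_CASE_FILE = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*(\.[a-z0-9]+)*")


def is_snake_case_file(name):
    """Check a user-generated file name against the underscore convention."""
    return SNAKE_CASE_FILE.fullmatch(name) is not None


def intermediate_path(rule_name, sample_name, extension):
    """Build intermediates/<rule_name>/<sample_name>.<extension>."""
    return PurePosixPath("intermediates") / rule_name / f"{sample_name}.{extension}"


print(is_snake_case_file("this_is_a_test_example.txt"))
print(intermediate_path("liftover", "1000g", "vcf"))
```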

Folders:

All user-generated folders should use camelCase naming conventions: the first letter of the name is lower-case, spaces are removed, and the initial letter of each following word is capitalised.

E.g. thisIsATestExample
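
A minimal sketch of the folder convention, assuming a hypothetical helper `to_camel_case` (not part of the pipeline) that treats spaces, underscores and hyphens as word separators:

```python
import re


def to_camel_case(name):
    """Convert a multi-word name into camelCase for use as a folder name."""
    # Split on runs of whitespace, underscores or hyphens and drop empty pieces
    words = [word for word in re.split(r"[\s_\-]+", name.strip()) if word]
    if not words:
        return ""
    # First word starts lower-case; each following word starts upper-case
    first = words[0][0].lower() + words[0][1:]
    rest = [word[0].upper() + word[1:] for word in words[1:]]
    return first + "".join(rest)


print(to_camel_case("this is a test example"))
```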

All snakemake generated folders use the following folder structure:

.
└── intermediates
    └── <ruleName>
        ├── <file_name>.<extension>
        ├── <file_name>.<extension>
        └── <file_name>.<extension>

Usage:

Use the cd command to navigate to the root repository directory containing the Snakefile.
To start the pipeline and produce the default list of files, call snakemake on the command line with the appropriate arguments (e.g. the --profile and --cluster-config flags).
To generate a runtime report detailing the figures produced and performance-related numbers, use the --report snakemake flag (this requires the Jinja2 Python package to be installed). The HTML file produced is completely self-contained and can be shared as needed. You can view it using any web browser, such as Firefox or Google Chrome.
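
The steps above can be scripted. The sketch below only assembles the command line from the flags named in this section; the helper `build_snakemake_command` and the example paths are hypothetical, and whether your installed Snakemake version accepts each flag has not been verified here.

```python
def build_snakemake_command(profile, cluster_config=None, report=None):
    """Assemble a snakemake invocation as an argument list (nothing is executed)."""
    command = ["snakemake", "--profile", profile]
    if cluster_config is not None:
        command += ["--cluster-config", cluster_config]
    if report is not None:
        # --report writes a self-contained HTML report to the given path
        command += ["--report", report]
    return command


# Example paths are placeholders, not files shipped with this repository
run_command = build_snakemake_command("config/profile", cluster_config="config/cluster.json")
report_command = build_snakemake_command("config/profile", report="report.html")
print(" ".join(run_command))
print(" ".join(report_command))
```

To execute a command from the repository root, the list can be passed to `subprocess.run(run_command, check=True)`.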

Roadmap:

See our Projects tab and Issues tracker for a list of proposed features (and known issues).

Versioning:

We use the SemVer syntax to manage and maintain version numbers. For the versions available, see the releases on this repository.

Acknowledgements:

Many thanks to the following individuals who have been instrumental to the success of this project:

Author: Graeme Ford (G-Kodes)
Supervisor: Prof. Michael Pepper
Co-Supervisor: Prof. Fourie Joubert (fouriejoubert)
Tester: Fatima Barmania-Faruk (Fatimabp)
Tester: Megan Ryder (Megs47)
