Copyright Government of Canada 2018
Written by: National Microbiology Laboratory, Public Health Agency of Canada
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Gary van Domselaar: gary.vandomselaar@canada.ca
Mykrobe Parser is an R script that parses the Mycobacterium tuberculosis-specific results of Mykrobe. It separates mutations into individual genetic regions and presents them in a tidy data format compatible with reporting.
Mykrobe Parser is compatible with Mykrobe v0.7.0 and the "201901" / "202010" panels.
Notes
- Samples in which Mykrobe detects non-tuberculous mycobacteria are removed for quality control purposes.
- If prediction for one gene fails in Mykrobe, Mykrobe Parser sets all "[antimicrobial]_prediction" columns to "failed" and all genetic regions to "NA".
Dependencies:
- Conda
- Git (optional)
Download this repository or use git to clone this repository.
git clone https://github.com/phac-nml/mykrobe-parser.git
Use Conda to install required R dependencies
The mykrobe_parser_installation.yml file is found in the root directory of this repository.
conda env create -f mykrobe_parser_installation.yml
Mykrobe is an open-source program that can predict antimicrobial resistance from Staphylococcus aureus and Mycobacterium tuberculosis.
Mykrobe arguments
To be compatible with Mykrobe Parser, Mykrobe must be run using the following arguments:
--format json
* Stores results in a .json file instead of a text file. Mykrobe Parser only works with JSON files.
--panel 201901
* Uses the "201901" panel for resistance prediction
or
--panel 202010
* Uses the "202010" panel for resistance prediction
Necessary arguments
-d
(directory where the Mykrobe json files are stored)
or
-f
(a file path, or a comma-separated list of file paths, to Mykrobe json files, e.g. "~mykrobe-parser/data/FILE1.json,~mykrobe-parser/data/FILE2.json")
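When only some files in a directory should be parsed, the comma-separated value for -f can be assembled in the shell. The sketch below is illustrative only: the helper name join_json_paths is hypothetical and is not part of Mykrobe Parser, and it assumes the file paths contain no commas or newlines.

```shell
# Hypothetical helper (not part of Mykrobe Parser): print every *.json file
# found directly inside the directory given as $1 as a single
# comma-separated line, sorted by path.
join_json_paths() {
    find "$1" -maxdepth 1 -type f -name '*.json' | sort | paste -sd, -
}
```

The result could then be passed to the parser, for example: Rscript PATH/TO/R/mykrobe_parser.R -f "$(join_json_paths PATH/TO/Mykrobe_json_files)"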
Optional arguments
Choose a panel with this option. Defaults to the 201901 panel if not supplied.
-p 2019
or
-p 2020
These arguments encode text into the final output
-v CHARACTER , --version
Stores text to the "Mykrobe_Workflow_Version" column. This is meant to store the pipeline or Galaxy Workflow version.
-D INTEGER, --depth
Stores the '--min_depth' argument used to run Mykrobe [default= 5]
-c INTEGER, --conf
Stores the '--min_variant_conf' argument used to run Mykrobe [default= 10]
-n CHARACTER, --name
Stores text to the "Mutation_set_version" column. This is meant to store a simple version number for the Mykrobe_Resistance_probe_set (instead of tb-hunt-probe-set-jan-03-2019.fasta.gz).
-r CHARACTER, --reportfile
Changes the name of the antimicrobial susceptibility report file. [Default = output-[panel&version]-report.csv]
-s CHARACTER, --speciationfile
Changes the name of the speciation data file. [Default = output-[panel&version]-speciation_data.csv]
Running Mykrobe Parser
Run Mykrobe Parser with the following script: mykrobe_parser.R
Please view the RELEASES panel to access the backwards compatible scripts.
E.g., to run the script on Mykrobe v0.7.0 output with the 201901 panel, use the following command:
Rscript PATH/TO/R/mykrobe_parser.R -d PATH/TO/Mykrobe_json_files
E.g., to run the script on Mykrobe v0.7.0 output with the 202010 panel, use the following command:
Rscript PATH/TO/R/mykrobe_parser.R -p 2020 -d PATH/TO/Mykrobe_json_files
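In a pipeline it can help to check the panel value before calling the script and to record the depth and confidence settings explicitly. The following is a minimal sketch, not part of Mykrobe Parser: the function name parser_cmd is hypothetical, the panel check is an extra guard added here, and the function only prints the command rather than running it. The -D 5 and -c 10 values are the documented defaults and should be changed to match how Mykrobe was actually run.

```shell
# Hypothetical wrapper (not part of Mykrobe Parser).
# $1 = panel option (2019 or 2020), $2 = directory of Mykrobe json files.
# Prints the Rscript command line; it does not execute it.
parser_cmd() {
    panel="$1"
    json_dir="$2"
    case "$panel" in
        2019|2020) ;;  # the two values documented for -p
        *)
            echo "unsupported panel: $panel (expected 2019 or 2020)" >&2
            return 1
            ;;
    esac
    printf 'Rscript PATH/TO/R/mykrobe_parser.R -p %s -d %s -D 5 -c 10\n' \
        "$panel" "$json_dir"
}
```

For example, parser_cmd 2020 PATH/TO/Mykrobe_json_files prints a command that can be reviewed and then run.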
Outputs
Mykrobe Parser creates two files called "output-report.csv" and "output-speciation_data.csv" by default.
output-report.csv
- Stores Mykrobe's antimicrobial resistance data.
- The following columns are placeholders for internal data: "Lims_Comment", "Lims_INTComment", "LIMS_file".
output-speciation_data.csv
- Stores Mykrobe's speciation data such as "phylo_group", "species", and "lineage" and their associated depths and percentages of coverage.
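To see which columns a given output file contains, the header row can be listed from the shell. This is a rough sketch under stated assumptions: the helper name csv_columns is hypothetical, and it assumes that column names contain no embedded commas and that the file uses Unix line endings.

```shell
# Hypothetical helper (not part of Mykrobe Parser): print each column name
# from the header row of the CSV file given as $1, one per line, with any
# surrounding double quotes removed.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n' | sed 's/^"//; s/"$//'
}
```

For example, csv_columns output-report.csv lists the report columns, including the placeholder columns noted above.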