Obsoleted and replaced by github.com/zzhang526/MosaicHunter

AugustHuang/MosaicHunter

A script/tool for detecting postzygotic single-nucleotide mutations in human whole-genome sequencing data.

======Preparation

Make sure that the software listed below is installed, then add both the pre-installed software and the directory /your/MosaicHunter/directory/Tools to your PATH.

To generate the essential reference data and compile the C, C++, and Java sources, run this command once:

seqpipe -m /your/MosaicHunter/directory/MosaicHunter.pipe preparation REFERENCE_DIR=/your/MosaicHunter/directory/Reference TOOLS_DIR=/your/MosaicHunter/directory/Tools

Pre-installed software required for the script:

#SeqPipe: 0.4.12

#BEDtools: 2.15.0

#SAMtools: 0.1.18

#FASTX-Toolkit: 0.0.13

#Blat: 34x13

#fastasplitn
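
The PATH setup and the check for pre-installed software can be sketched as below. This is a minimal sketch, not part of MosaicHunter: the variable MOSAICHUNTER_DIR and the function missing_tools are hypothetical names, and the executable names in the example call are assumptions (the exact binaries the pipeline invokes are not listed here, and FASTX-Toolkit ships several separate executables).

```shell
# Hypothetical install location; replace with your actual MosaicHunter directory.
MOSAICHUNTER_DIR="${MOSAICHUNTER_DIR:-/your/MosaicHunter/directory}"

# Append the Tools directory to PATH for the current shell session.
PATH="$PATH:$MOSAICHUNTER_DIR/Tools"
export PATH

# Print every command from the argument list that is not found on PATH.
# Returns 0 if all were found, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (the executable names are assumptions):
# missing_tools seqpipe bedtools samtools blat fastasplitn
```

Any name printed by missing_tools still needs to be installed or added to PATH before running the preparation command.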

Reference data for the script: (Please put them into /your/MosaicHunter/directory/Reference)

#human_g1k_v37.fasta (available at http://soms.nibs.ac.cn:6235/glk/human_g1k_v37.fasta)

#human_g1k_v37.genome (available at http://soms.nibs.ac.cn:6235/glk/human_g1k_v37.genome)

#human_hg19.fasta (available at http://soms.nibs.ac.cn:6235/hg/human_hg19.fasta)

#all_repeats.b37.bed (available at http://soms.nibs.ac.cn:6235/dbsnp/all_repeats.b37.bed)

#PAR.b37.bed (available at http://soms.nibs.ac.cn:6235/dbsnp/PAR.b37.bed)

#dbsnp_137.b37.SNP_AF.tsv (available at http://soms.nibs.ac.cn:6235/dbsnp/dbsnp_137.b37.SNP_AF.tsv)

#observed_in_common.bed (available at http://soms.nibs.ac.cn:6235/dbsnp/observed_in_common.bed)
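
Before running the preparation command, it may help to confirm that every reference file listed above is present. The following is a minimal sketch under that assumption; the function name missing_references is hypothetical, and the file names are taken from the list above.

```shell
# Print each expected reference file that is absent (or empty) in the
# given directory. Returns 0 when nothing is missing, 1 otherwise.
missing_references() {
    ref_dir="$1"
    rc=0
    for f in human_g1k_v37.fasta human_g1k_v37.genome human_hg19.fasta \
             all_repeats.b37.bed PAR.b37.bed dbsnp_137.b37.SNP_AF.tsv \
             observed_in_common.bed; do
        if [ ! -s "$ref_dir/$f" ]; then
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Example call:
# missing_references /your/MosaicHunter/directory/Reference
```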

Tools for the script: (Please put them into /your/MosaicHunter/directory/Tools)

#generate_beta_log10_val_file.r

#count_homopolymer.cpp

#myjoin

#my_join.pl

#PileupFilter.java

#genotyper.pipe

#Yyx_genotype_log10lik_with_precalc_beta.c

#Yyx_real_log10lik_from_baseQ.c

#Yyx_individual_genotyper.c

#LoFreq_call.c

#sam2fa.pl

#blat_best.pipe

#highest-score.pl

#calculate-score-coverage-identity.pl

#intersect_bed12.pipe

#my.grep

#trimBamByBlock.pl

#strand_bias.R

#allele_pos_dist.R

#splitSamByAllele.pl

======Run

To identify pSNM sites from the whole-genome sequencing data, you can run this command:

seqpipe -m /your/MosaicHunter/directory/MosaicHunter.pipe MosaicHunter REFERENCE_DIR=/your/MosaicHunter/directory/Reference TOOLS_DIR=/your/MosaicHunter/directory/Tools TEMP_DIR=/your/temp/directory INPUT_BAM=example.bam [INDEL_CNV_BED=example.bed] PROJECT_NAME=example GENDER=M THREAD_NUM=5

[INPUT_BAM]: the path of your input .bam file; the .bam file must be sorted and indexed

[INDEL_CNV_BED]: the path of a .bed file containing all CNV and indel-flanking (+/-5bp) regions, which will be masked by the pipeline

[PROJECT_NAME]: a string used as the prefix and suffix of the output file names

[GENDER]: the gender of the subject, "F" for female or "M" for male

[THREAD_NUM]: the maximum number of threads for running the script
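
Since INPUT_BAM must be indexed, a quick pre-flight check can catch a missing index before the pipeline starts. This is a minimal sketch; bam_has_index is a hypothetical helper, and it assumes the index sits next to the .bam file under one of the two common naming conventions. It does not verify that the file is sorted.

```shell
# Succeed if an index file sits next to the BAM, named either
# example.bam.bai or example.bai.
bam_has_index() {
    bam="$1"
    [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Example: build the index with SAMtools only when it is missing.
# bam_has_index example.bam || samtools index example.bam
```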

Recommended pre-processing of the .bam file:

1) Removing duplicate, improperly paired, and multi-hit reads

2) Removing reads with more than three mismatches

3) Processing the reads with GATK's indel realignment and base quality score recalibration
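
Parts of steps 1 and 2 can be approximated with SAMtools flag filters and a small awk filter, as sketched below. This is an illustration under stated assumptions, not the authors' procedure: filter_by_nm is a hypothetical helper, it uses the NM tag (edit distance, which also counts inserted and deleted bases) as a stand-in for the mismatch count, and it drops alignments that carry no NM tag. Removing multi-hit reads depends on the aligner and is not covered.

```shell
# Read SAM text on stdin; pass header lines through and keep only
# alignments whose NM tag (edit distance) is at most max_nm (default 3).
# Alignments without an NM tag are dropped.
filter_by_nm() {
    awk -v max_nm="${1:-3}" '
        BEGIN { FS = "\t" }
        /^@/ { print; next }
        {
            # Optional tags start at field 12 in SAM records.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^NM:i:/) {
                    if (substr($i, 6) + 0 <= max_nm) print
                    next
                }
            }
        }'
}

# Possible usage with SAMtools (-f 2 keeps properly paired reads,
# -F 1024 drops reads flagged as duplicates; file names are examples):
# samtools view -h -f 2 -F 1024 in.bam | filter_by_nm 3 | samtools view -bS - > out.bam
```

The filtered .bam file would still need to be sorted and indexed before being passed as INPUT_BAM.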

To change the running order or the parameters of the Bayesian genotyper and the error filters, edit the MosaicHunter script at /MosaicHunter/MosaicHunter.pipe, following the SeqPipe user manual.

======Output

The final list of pSNM candidates can be found at MosaicHunter_[PROJECT_NAME]/[PROJECT_NAME].mosaic.final.tsv

The columns in the final list represent:

1) chromosome

2) position

3) total depth

4) reference nt

5) alternative nt

6) reference depth

7) alternative depth

8) -log10 of posterior probability of ref-hom genotype

9) -log10 of posterior probability of het genotype

10) -log10 of posterior probability of alt-hom genotype

11) -log10 of posterior probability of mosaic genotype

12) population allele fraction in dbSNP 137; -1 for annotated sites without allele fraction information, -2 for unannotated sites

13) sequence of +/-500bp flanking regions
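
The column layout above lends itself to simple awk post-processing. The sketch below assumes the file is tab-separated with the 13 columns in the order listed; novel_candidates is a hypothetical helper name. It lists candidates not annotated in dbSNP 137 (column 12 equal to -2) together with the alternative allele fraction, computed here as alternative depth divided by total depth.

```shell
# Print chromosome, position, and alternative allele fraction
# (column 7 / column 3) for every candidate whose dbSNP column is -2.
novel_candidates() {
    awk 'BEGIN { FS = OFS = "\t" }
         $12 == -2 && $3 > 0 { printf "%s\t%s\t%.3f\n", $1, $2, $7 / $3 }' "$1"
}

# Example call (path follows the output naming described above):
# novel_candidates MosaicHunter_example/example.mosaic.final.tsv
```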
