This is a README for the SNP likelihood calculation for the Haplotype Reference Consortium (Release 1) using the htslib versions of samtools/bcftools.
1. Download and install this tagged (rc8) version of samtools/bcftools/htslib:

   ```
   git clone --recursive git://github.com/mcshane/hrc-release1.git hrc-release1
   cd hrc-release1
   make
   ```
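   As a quick sanity check (an assumption, not part of the original instructions), running each binary without arguments should print a usage message that includes its version:

   ```
   # hedged check: both tools print usage text (including version) when run with no arguments
   samtools/samtools 2>&1 | head -5
   bcftools/bcftools 2>&1 | head -5
   ```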
2. Run samtools mpileup on your cohort (or multi-cohort) BAM files to generate BCF2 files. These should be the BAM files associated with all the samples submitted to the project in the haplotype VCFs. The VCF `wgs.union.sites.breakMultiAC2.vcf.gz` (md5sum: `78952ae50b90cb4707579cee7a3adaa5`) is the union site list, currently only available within the project.
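   One way to verify the download against the md5sum above (assuming GNU coreutils `md5sum`) is:

   ```
   # verify the union site list; the two-space separator is required by md5sum -c
   echo "78952ae50b90cb4707579cee7a3adaa5  wgs.union.sites.breakMultiAC2.vcf.gz" | md5sum -c -
   ```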
   The command can be split into multiple jobs for non-overlapping chunks across the genome. For a particular chunk:

   ```
   samtools/samtools mpileup -IE -C50 -d$MAXDEPTH -t DP,DP4 -l wgs.union.sites.breakMultiAC2.vcf.gz -g -r $CHROM:$FROM-$TO -f $REF -b $BAMLIST > $CHROM:$FROM-$TO.$COHORT.bcf
   bcftools/bcftools index $CHROM:$FROM-$TO.$COHORT.bcf
   ```
   where:

   - `$CHROM` = chromosome
   - `$FROM` = start position
   - `$TO` = end position
   - `$REF` = GRCh37 reference fasta file used to align the BAM files, e.g. hs37d5.fa or human_g1k_v37.fasta
   - `$MAXDEPTH` = maximum depth per BAM file. Adjust this depending on your coverage and on whether you have BAMs merged across multiple samples; generally it should be set to at least 10x the depth of your highest-coverage input BAM.
   - `$BAMLIST` = file listing the locations of the input BAM files, one file per line
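   For example, a purely illustrative invocation for the first 1Mbp chunk of chromosome 20 (the depth cap, BAM list name, and cohort label below are hypothetical):

   ```
   # illustrative values only: -d2000, mycohort.bamlist, and the "mycohort" label are assumptions
   samtools/samtools mpileup -IE -C50 -d2000 -t DP,DP4 \
       -l wgs.union.sites.breakMultiAC2.vcf.gz -g -r 20:1-1000000 \
       -f hs37d5.fa -b mycohort.bamlist > 20:1-1000000.mycohort.bcf
   bcftools/bcftools index 20:1-1000000.mycohort.bcf
   ```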
   Positions are 1-based, so chunking chr20 into 1Mbp chunks would look like `20:1-1000000`, `20:1000001-2000000`, ..., `20:63000001-63025520`. A file `1Mbp-chunks.txt` listing one such region per line could be used to generate the commands:

   ```
   awk '{print "samtools/samtools mpileup -IE -C50 -d$MAXDEPTH -t DP,DP4 -l wgs.union.sites.breakMultiAC2.vcf.gz -g -r "$0" -f $REF -b $BAMLIST > "$0".$COHORT.bcf"}' 1Mbp-chunks.txt
   ```
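   The chunks file itself is not provided; one hypothetical way to generate it for chr20 (length 63,025,520 in GRCh37, matching the example above) is:

   ```
   # assumed helper: emit 1Mbp regions for chromosome 20 into 1Mbp-chunks.txt
   awk 'BEGIN { len = 63025520
                for (i = 1; i <= len; i += 1000000) {
                    end = i + 999999; if (end > len) end = len
                    print "20:" i "-" end
                } }' > 1Mbp-chunks.txt
   ```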
   If you want to generate BCFs one sample at a time, then chunking is not necessary and the command would be:

   ```
   samtools/samtools mpileup -IE -C50 -d$MAXDEPTH -t DP,DP4 -l wgs.union.sites.breakMultiAC2.vcf.gz -g -f $REF $SAMPLE_BAM > $SAMPLE.bcf
   bcftools/bcftools index $SAMPLE.bcf
   ```
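   To run this across a whole cohort, a minimal sketch (assuming a hypothetical `samples.txt` with one `sample_name bam_path` pair per line) might be:

   ```
   # hypothetical driver loop; the samples.txt format is an assumption, not part of the project spec
   while read SAMPLE SAMPLE_BAM; do
       samtools/samtools mpileup -IE -C50 -d$MAXDEPTH -t DP,DP4 \
           -l wgs.union.sites.breakMultiAC2.vcf.gz -g -f $REF "$SAMPLE_BAM" > "$SAMPLE".bcf
       bcftools/bcftools index "$SAMPLE".bcf
   done < samples.txt
   ```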
   If you generate per-sample BCFs, skip step (3).
3. If you chunked in step (2), concatenate your chunks together with `bcftools concat`:

   ```
   bcftools/bcftools concat -Ob -f $CHROM.$COHORT.concat.list > $CHROM.$COHORT.bcf
   bcftools/bcftools index $CHROM.$COHORT.bcf
   ```

   where `$CHROM.$COHORT.concat.list` is a file listing the coordinate-sorted chunked BCF files (one file per line) from step (2) for a particular chromosome and cohort. One can create this list with:

   ```
   ls $CHROM:*.$COHORT.bcf | sort -V > $CHROM.$COHORT.concat.list
   ```
   If your list of files is not coordinate-sorted, then include the option `-a` in the concat command.

4. Compare the sample names with the sample names in the haplotype VCFs that were submitted. The sample names in the BCF header can be queried with:
   ```
   bcftools/bcftools query -l $CHROM.$COHORT.bcf
   ```
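   One possible mismatch check (the haplotype VCF filename here is hypothetical) is to diff the two sorted sample lists:

   ```
   # hypothetical comparison: $COHORT.haplotypes.vcf.gz stands in for your submitted haplotype VCF
   diff <(bcftools/bcftools query -l $CHROM.$COHORT.bcf | sort) \
        <(bcftools/bcftools query -l $COHORT.haplotypes.vcf.gz | sort)
   ```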
   If the sample names are not the same, please fix them here or provide a map of the sample IDs:

   ```
   bcftools/bcftools view -h $CHROM.$COHORT.bcf > old_header.vcf
   ** map sample names to make new header ** > new_header.vcf
   (cat new_header.vcf ; bcftools/bcftools view -H $CHROM.$COHORT.bcf) | bcftools/bcftools view -Ob > $CHROM.$COHORT.new_header.bcf
   ```
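   If your bcftools build includes the `reheader` subcommand, an alternative sketch (the `samples.map` file is a hypothetical `old_name new_name` listing) is:

   ```
   # assumed alternative: requires a bcftools build with the reheader subcommand
   # samples.map: one "old_name new_name" pair per line
   bcftools/bcftools reheader -s samples.map $CHROM.$COHORT.bcf > $CHROM.$COHORT.new_header.bcf
   ```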
5. Submit your cohort likelihood BCFs.
6. ONLY FOR CENTRAL ANALYSIS. Once all likelihood BCFs for all cohorts are in, we will merge with the following:

   ```
   bcftools/bcftools norm -Ou -m+ wgs.union.sites.breakMultiAC2.vcf.gz | bcftools/bcftools query -f '%CHROM\t%POS\t%REF,%ALT\n' | htslib/bgzip -c > wgs.union.sites.AC2.alleles.gz
   htslib/tabix -p vcf wgs.union.sites.AC2.alleles.gz
   bcftools/bcftools merge -Ou -i DP:sum,I16:sum,QS:sum $CHROM.*.bcf | bcftools/bcftools call -Ou -mAC alleles -f GQ,GP -T wgs.union.sites.AC2.alleles.gz | bcftools/bcftools norm -Ob -m- > $CHROM.bcf
   ```
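   If random access to the merged output is needed downstream, it can presumably be indexed as in the earlier steps (this indexing step is an assumption, not part of the original recipe):

   ```
   # assumed follow-up: index the merged per-chromosome BCF
   bcftools/bcftools index $CHROM.bcf
   ```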