-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Adding Class II HLA typing to immuno #57
Conversation
phlat/run.b38.sh
Outdated
|
||
# extract hla regions and unmapped reads | ||
echo "extracting hla region and unmapped reads ..." | ||
$SAMTOOLS view -h -T $REF_FASTA $BAM chr6:29836259-33148325 >$tmpdir/reads.sam |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of the larger pipelines, (as opposed to a one-off bash script it came from) I don't like that this has a hardcoded human build38 path. (if we tried to run this on b37 data (or mouse data), it would happily do the wrong thing!). The ideal thing to do here would probably be to use the annotation build info to look up the HLA locus, I think? That could look something like this:
- grep the GTF file for the proper gene names (
HLA-A, HLA-B, HLA-C,
, etc) to get coordinates - use those coordinates in this command
To be clear, I don't think phlat works on mouse data, so we can ignore that for now, but being build-agnostic for human would be good!
|
/usr/bin/awk '{FS="\t";getline;for(n=2;n<=NF-2;n++){if($n==""){}else{printf "HLA-"$n"\n"}}}' ~{file} > ~{outname} | ||
/usr/bin/awk '{FS="\t";getline;for(n=2;n<=NF-2;n++){if($n==""){}else{printf "HLA-"$n"\n"}}}' ~{file} > ~{temp} | ||
grep "HLA_D" ~{phlat_file} | /usr/bin/awk '{FS="\t";if($2==""){}else{printf $2"\n"};if($3==""){}else{printf $3"\n"}}' >> ~{temp} | ||
/usr/bin/awk -F":" '{print $1 ":" $2}' ~{temp} > ~{outname} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, this can be done without a temporary file by grouping the commands together for the pipe. Something like:
(/usr/bin/awk ... ~{file}; grep ... ~{phlat_file} | /usr/bin/awk ...) | /usr/bin/awk ...
definitions/tools/pvacfuse.wdl
Outdated
@@ -27,7 +27,7 @@ task pvacfuse { | |||
|
|||
Int space_needed_gb = 10 + round(size([input_fusions_zip], "GB") * 3) | |||
runtime { | |||
docker: "griffithlab/pvactools:3.0.0" | |||
docker: "laljorani20/pvactools:latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should switch this back to an official release before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is blocked on @susannasiebert switching back to pvactools effort and creating a release. But yeah we don't want to be pointing to personal docker images like this. I have been meaning to search the whole repo and see if we are doing it anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just released pVACtools 3.1.0 that has the combinatorial class II alleles enabled.
definitions/tools/pvacseq.wdl
Outdated
@@ -50,7 +50,7 @@ task pvacseq { | |||
runtime { | |||
memory: "16GB" | |||
cpu: n_threads | |||
docker: "griffithlab/pvactools:3.0.0" | |||
docker: "laljorani20/pvactools:latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same here.)
cpu: nthreads | ||
docker: "mgibio/phlat:1.1_withindex" | ||
disks: "local-disk ~{space_needed_gb} HDD" | ||
bootDiskSizeGb: 3*space_needed_gb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the boot disk need to scale this severely with the size of the input data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mimicking optitype.wdl iirc when I created phlat.wdl
If there is a better way to do this, I am all for it. Would times 2 be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on a resource usage script that I hope will help us see how much disk space is actually being used so that we can fine tune some of these resource requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was asking more if the boot disk really needs to scale in the size of the inputs at all. The disks
entry should cover space for staging the inputs and working on the outputs. The boot disk needs to hold the docker image and OS files for the VM, but the work ought to get done in the space defined by disks
.
Array[File?] combined = [combined_all_epitopes, combined_aggregated_report, combined_filtered_epitopes, combined_aggregated_metrics_file] | ||
|
||
# Don't think this works the way it is intended. Refer to documentation. | ||
# https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#globs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is intended and what is happening instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the looks of the the command that uses glob, it seemed like it intends to iterate through pvacfuse_predictions/ like this:
| PVACfuse_predictions/
-->| mhc_i
----> grabs all files
-->| mhc_ii
---->grabs all files
-->| combined
---->grabs all files
And output a similar structure to the above. Or dump all contents in one file.
However the real result of that was like this:
| PVACfuse_predictions/
-->| mhc_i
---->grabs all files
Leading to the output:
PVACfuse_predictions/content of mhc_i
And the documentation does state that glob skips folders and their contents. Only thing that is slightly vague to me is the use of the double * and why did it pick mhc_i as opposed to another folder unless that was just purely due to order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by Tom above, I think we want to switch to a docker image for an official pvactools release before we consider this final.
I have tested this PR on real patient data and it seems to be working nicely:
I think other than the issue with using a personal docker image for pvactools this is ready to merge... We should ask @susannasiebert what she thinks about a timeline for a release that would include the pvactools improvements needed here. I believe the main thing was the code that produces classII allele pairings automatically? |
3.1.0 is out now and has the required feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Environment
Docker image
docker(mgibio/phlat:1.1_withindex)
(note: mem must be >= 16G for bowtie2 to work properly. I ran it with 20G)bsub -Is -M 20G -G $GROUP -n 1 -R 'select[mem>20000] rusage[mem=20000] span[hosts=1]' -q oncology-interactive -a 'docker(mgibio/phlat:1.1_withindex)' /bin/bash
Within docker, bash script
run.b38.sh
is working.Create
phlat.wdl
Implement
phlat.wdl
withinimmuno.wdl
Input/Output cleaning
Extract Alleles -- extract class I from OptiType and class II from PHLAT
testing the steps below will require me to switch pVACtools docker image to laljorani20/pvactools:latest (temporary until pvactools is updated)
HLA Consensus -- must output a list of valid HLAs for class I and II
Relevant sources:
Allele X not valid for Method Y griffithlab/pVACtools#375
Feature: Auto-generate MHC class II combinations griffithlab/pVACtools#528
Auto-generate MHC class II combinations griffithlab/pVACtools#813 (HLA class II pairing)
For class II,
HLA-
prefix not needed and in fact would cause issues.pVACseq and pVACfuse working properly with the input provided
grab the outputs into final_output