Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Adding Class II HLA typing to immuno #57

Merged
merged 50 commits into from
Oct 19, 2022
Merged

Conversation

Layth17
Copy link
Member

@Layth17 Layth17 commented Jul 8, 2022

  • Environment

  • Docker image docker(mgibio/phlat:1.1_withindex) (note: mem must be >= 16G for bowtie2 to work properly. I ran it with 20G) bsub -Is -M 20G -G $GROUP -n 1 -R 'select[mem>20000] rusage[mem=20000] span[hosts=1]' -q oncology-interactive -a 'docker(mgibio/phlat:1.1_withindex)' /bin/bash

  • Within docker, bash script run.b38.sh is working.

  • Create phlat.wdl

  • Implement phlat.wdl within immuno.wdl

  • Input/Output cleaning

  • Extract Alleles -- extract class I from OptiType and class II from PHLAT


testing the steps below will require me to switch pVACtools docker image to laljorani20/pvactools:latest (temporary until pvactools is updated)

@Layth17 Layth17 linked an issue Jul 8, 2022 that may be closed by this pull request
@Layth17 Layth17 self-assigned this Jul 8, 2022
@Layth17 Layth17 requested a review from malachig July 8, 2022 20:42
phlat/run.b38.sh Outdated

# extract hla regions and unmapped reads
echo "extracting hla region and unmapped reads ..."
$SAMTOOLS view -h -T $REF_FASTA $BAM chr6:29836259-33148325 >$tmpdir/reads.sam
Copy link
Member

@chrisamiller chrisamiller Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of the larger pipelines, (as opposed to a one-off bash script it came from) I don't like that this has a hardcoded human build38 path. (if we tried to run this on b37 data (or mouse data), it would happily do the wrong thing!). The ideal thing to do here would probably be to use the annotation build info to look up the HLA locus, I think? That could look something like this:

  1. grep the GTF file for the proper gene names (HLA-A, HLA-B, HLA-C,, etc) to get coordinates
  2. use those coordinates in this command

To be clear, I don't think phlat works on mouse data, so we can ignore that for now, but being build-agnostic for human would be good!

@Layth17
Copy link
Member Author

Layth17 commented Jul 11, 2022

phlat folder should be deleted ultimately. it is only in this repo atm for quick testing purposes.
run.b38.sh and phlat-release/ belong within the docker image

/usr/bin/awk '{FS="\t";getline;for(n=2;n<=NF-2;n++){if($n==""){}else{printf "HLA-"$n"\n"}}}' ~{file} > ~{outname}
/usr/bin/awk '{FS="\t";getline;for(n=2;n<=NF-2;n++){if($n==""){}else{printf "HLA-"$n"\n"}}}' ~{file} > ~{temp}
grep "HLA_D" ~{phlat_file} | /usr/bin/awk '{FS="\t";if($2==""){}else{printf $2"\n"};if($3==""){}else{printf $3"\n"}}' >> ~{temp}
/usr/bin/awk -F":" '{print $1 ":" $2}' ~{temp} > ~{outname}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, this can be done without a temporary file by grouping the commands together for the pipe. Something like:

(/usr/bin/awk ... ~{file}; grep ... ~{phlat_file} | /usr/bin/awk ...) | /usr/bin/awk ...

@@ -27,7 +27,7 @@ task pvacfuse {

Int space_needed_gb = 10 + round(size([input_fusions_zip], "GB") * 3)
runtime {
docker: "griffithlab/pvactools:3.0.0"
docker: "laljorani20/pvactools:latest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should switch this back to an official release before merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is blocked on @susannasiebert switching back to pvactools effort and creating a release. But yeah we don't want to be pointing to personal docker images like this. I have been meaning to search the whole repo and see if we are doing it anywhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just released pVACtools 3.1.0 that has the combinatorial class II alleles enabled.

@@ -50,7 +50,7 @@ task pvacseq {
runtime {
memory: "16GB"
cpu: n_threads
docker: "griffithlab/pvactools:3.0.0"
docker: "laljorani20/pvactools:latest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same here.)

cpu: nthreads
docker: "mgibio/phlat:1.1_withindex"
disks: "local-disk ~{space_needed_gb} HDD"
bootDiskSizeGb: 3*space_needed_gb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the boot disk need to scale this severely with the size of the input data?

Copy link
Member Author

@Layth17 Layth17 Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mimicking optitype.wdl iirc when I created phlat.wdl

If there is a better way to do this, I am all for it. Would times 2 be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a resource usage script that I hope will help us see how much disk space is actually being used so that we can fine tune some of these resource requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was asking more if the boot disk really needs to scale in the size of the inputs at all. The disks entry should cover space for staging the inputs and working on the outputs. The boot disk needs to hold the docker image and OS files for the VM, but the work ought to get done in the space defined by disks.

Array[File?] combined = [combined_all_epitopes, combined_aggregated_report, combined_filtered_epitopes, combined_aggregated_metrics_file]

# Don't think this works the way it is intended. Refer to documentation.
# https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#globs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is intended and what is happening instead?

Copy link
Member Author

@Layth17 Layth17 Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the looks of the the command that uses glob, it seemed like it intends to iterate through pvacfuse_predictions/ like this:

| PVACfuse_predictions/
-->| mhc_i
----> grabs all files
-->| mhc_ii
---->grabs all files
-->| combined
---->grabs all files

And output a similar structure to the above. Or dump all contents in one file.

However the real result of that was like this:

| PVACfuse_predictions/
-->| mhc_i
---->grabs all files

Leading to the output:

PVACfuse_predictions/content of mhc_i

And the documentation does state that glob skips folders and their contents. Only thing that is slightly vague to me is the use of the double * and why did it pick mhc_i as opposed to another folder unless that was just purely due to order.

Copy link
Member

@malachig malachig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by Tom above, I think we want to switch to a docker image for an official pvactools release before we consider this final.

@malachig
Copy link
Member

malachig commented Oct 7, 2022

I have tested this PR on real patient data and it seems to be working nicely:

  • I got a successful immuno.wdl run.
  • I manually examined the phlat HLA typing results and compared class I calls to optitype calls (100% agreement)
  • I examined the organization of class I and II HLA typing results into final folders with improved file names. This looks good to me
  • I confirmed that class I pvacseq and pvacfuse results are using the class I HLA types predicted
  • I confirmed that class II pvacseq and pvacfuse results are using the class II HLA types predicted
  • I loaded both the class I and class II neoantigen results in pvacview and manually examined them

I think other than the issue with using a personal docker image for pvactools this is ready to merge... We should ask @susannasiebert what she thinks about a timeline for a release that would include the pvactools improvements needed here. I believe the main thing was the code that produces classII allele pairings automatically?

@susannasiebert
Copy link
Member

3.1.0 is out now and has the required feature.

@Layth17 Layth17 requested a review from malachig October 18, 2022 00:15
Copy link
Member

@malachig malachig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@Layth17 Layth17 merged commit 018f5c0 into wustl-oncology:main Oct 19, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add class II HLA typing to immuno analysis
5 participants