
VCF CDS_position parsing #271

Open
jberg1999 opened this issue Feb 18, 2025 · 0 comments
Labels
bug Something isn't working


@jberg1999

Description of the bug

Hi nf-core team,

I am having issues running my VCF files through the epitope prediction pipeline. Specifically, I am using VEP-annotated VCF files from sarek, and the pipeline errors out whenever the CDS_position value in the CSQ field of my VCF is irregular. This has happened both when CDS_position contains a ? and when it is empty. Based on the error log below, I believe the offending line is line 283 of epaa.py:

tpos = int(cds_pos.split("/")[0].split("-")[0]) - 1

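This fails when int() cannot convert the start position extracted from cds_pos: an empty CDS_position ends up as int(''), which is the error in the traceback below, and a start position of ? ends up as int('?').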
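A minimal sketch of a more defensive parse, for illustration only: `parse_cds_start` is a hypothetical helper (not code from epaa.py), and it assumes the caller can simply skip a consequence whose CDS start is unknown. The example values are assumed shapes of the CDS_position field, not records from my VCF.

```python
from typing import Optional


def original_tpos(cds_pos: str) -> int:
    # The expression from epaa.py line 283, wrapped in a function for comparison.
    return int(cds_pos.split("/")[0].split("-")[0]) - 1


def parse_cds_start(cds_pos: Optional[str]) -> Optional[int]:
    """Return the 0-based CDS start of a CDS_position value, or None if it is unknown.

    "123/456" and "123-125/456" give 122; "" and "?-125/456" give None,
    so the caller can skip that consequence instead of raising ValueError.
    """
    if not cds_pos:
        return None
    # Same slicing as the original: drop "/length", then keep the part before "-".
    start = cds_pos.split("/")[0].split("-")[0]
    try:
        return int(start) - 1
    except ValueError:
        # Empty or "?" start: the CDS start position cannot be determined.
        return None


for value in ["123/456", "123-125/456", "?-125/456", ""]:
    try:
        before = original_tpos(value)
    except ValueError as exc:
        before = f"ValueError: {exc}"
    print(repr(value), before, parse_cds_start(value), sep="  |  ")
```

With these inputs the original expression raises ValueError for "?-125/456" and "", while the helper returns None. Returning None rather than, say, falling back to the end position keeps the sketch from guessing a coordinate that the annotation itself marks as unknown.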

Thanks for the help!

Command used and terminal output

CONFIG=/michorlab/jacobg/nf-core/nextflow.config
SAMPLESHEET=samplesheet_no_bad_samples.csv
OUTDIR=/michorlab/jacobg/multiomics/epitope/results
GENOME=grch38
TOOLS=netmhcpan-4.1
MINPEPTIDELENGTH=8
MAXPEPTIDELENGTH=11
NETMHCPATH=/michorlab/jacobg/multiomics/epitope/netMHCpan-4.1b.Linux.tar.gz
VERSION=2.3.1

nextflow run nf-core/epitopeprediction -profile singularity -c $CONFIG --input $SAMPLESHEET --outdir $OUTDIR --genome_reference $GENOME --tools $TOOLS  --min_peptide_length $MINPEPTIDELENGTH  --max_peptide_length $MAXPEPTIDELENGTH --netmhcpan_path $NETMHCPATH --fasta_output -r $VERSION 



Error executing process > 'NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (26)'

Caused by:
  Process `NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (26)` terminated with an error exit status (1)


Command executed:

  # create folder for MHCflurry downloads to avoid permission problems when running pipeline with docker profile and mhcflurry selected
  mkdir -p mhcflurry-data
  export MHCFLURRY_DATA_DIR=./mhcflurry-data
  # specify MHCflurry release for which to download models, need to be updated here as well when MHCflurry will be updated
  export MHCFLURRY_DOWNLOADS_CURRENT_RELEASE=1.4.0
  # Add non-free software to the PATH
  shopt -s nullglob
  IFS=',' read -r -a netmhc_paths_string <<< "/michorlab/jacobg/multiomics/epitope/work/99/48181a933e1ac9db9b830c9f09023e/netmhcpan"
  for p in "${netmhc_paths_string[@]}"; do
          export PATH="$(realpath -s "$p"):$PATH";
      done
  shopt -u nullglob
  
  epaa.py --identifier DDC_vs_DDWB.mutect2.filtered_VEP.ann.chr1         --alleles 'A*03:01;A*24:02;B*07:02;B*14:02;C*08:02;C*07:02'         --tools 'netmhcpan-4.1'         --max_length 11         --min_length 8         --versions versions.csv         --fasta_output --genome_reference 'https://www.ensembl.org' --somatic_mutation DDC_vs_DDWB.mutect2.filtered_VEP.ann.chr1.vcf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR":
      python: $(python --version 2>&1 | sed 's/Python //g')
      epytope: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
      pyvcf: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('PyVCF3').version)")
      mhcflurry: $(mhcflurry-predict --version 2>&1 | sed 's/^mhcflurry //; s/ .*$//')
      mhcnuggets: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('mhcnuggets').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.

Command error:
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  Traceback (most recent call last):
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1310, in <module>
      __main__()
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1059, in __main__
      variant_list, transcripts, metadata = read_vcf(args.somatic_mutations)
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 283, in read_vcf
      tpos = int(cds_pos.split("/")[0].split("-")[0]) - 1
  ValueError: invalid literal for int() with base 10: ''

Work dir:
  /michorlab/jacobg/multiomics/epitope/work/a5/38e34c091b21a3a07a91b6bd59e7e7

Relevant files

nextflow.log

For data privacy reasons I can't attach the full VCF, so I am sending the header instead.

vcf_header.txt
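Since the full VCF can't be shared, a small stand-alone scan could list which records carry these values without exposing any genotypes. This is a sketch, not part of the pipeline: the function names are hypothetical, it assumes a VEP-style ##INFO=<ID=CSQ,...> header whose Description ends in "Format: ...|CDS_position|...", and it checks every CSQ entry, which may be more than the subset epaa.py actually parses.

```python
import gzip
import sys
from typing import Iterator, List, Tuple


def csq_fields_from_header(line: str) -> List[str]:
    """Return the CSQ sub-field names from a '##INFO=<ID=CSQ,...>' header line.

    Assumes the VEP convention of a Description ending in 'Format: A|B|C">'.
    """
    fmt = line.split("Format: ", 1)[1].rstrip().rstrip(">").rstrip('"')
    return fmt.split("|")


def irregular_cds_positions(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (CHROM, POS, CDS_position) for CSQ entries whose CDS start is not a plain integer."""
    opener = gzip.open if path.endswith(".gz") else open
    cds_idx = None
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##INFO=<ID=CSQ"):
                # Raises ValueError if the annotation has no CDS_position sub-field.
                cds_idx = csq_fields_from_header(line).index("CDS_position")
                continue
            if line.startswith("#"):
                continue
            if cds_idx is None:
                raise ValueError("no CSQ header line found before the first record")
            cols = line.rstrip("\n").split("\t")
            for entry in cols[7].split(";"):  # INFO column
                if not entry.startswith("CSQ="):
                    continue
                # One comma-separated CSQ entry per transcript/consequence.
                for csq in entry[len("CSQ="):].split(","):
                    values = csq.split("|")
                    cds_pos = values[cds_idx] if cds_idx < len(values) else ""
                    # Same slicing as epaa.py; "" and "?" are the cases reported to crash.
                    start = cds_pos.split("/")[0].split("-")[0]
                    if not start.isdigit():
                        yield cols[0], cols[1], cds_pos


if __name__ == "__main__" and len(sys.argv) > 1:
    for chrom, pos, cds_pos in irregular_cds_positions(sys.argv[1]):
        print(chrom, pos, repr(cds_pos), sep="\t")
```

Saved as a script and given the VCF path as its first argument, it prints one tab-separated line per flagged CSQ entry; plain and .gz files are both read as text.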

System information

Nextflow version: 24.04.4
HPC
slurm executor
singularity
OS: CentOS Linux
nf-core/epitopeprediction 2.3.1

@jberg1999 jberg1999 added the bug Something isn't working label Feb 18, 2025