epitopeprediction using mhcflurry vs running directly mhcflurry #263

Open
gianfilippo opened this issue Jan 28, 2025 · 2 comments

Comments

@gianfilippo

Hi,

I ran your pipeline with MHCflurry as the predictor, and I also ran MHCflurry directly, on the same data.
For the direct run I used mhcflurry-predict. As input I had alleles from the hlatyping Nextflow pipeline (RNA) and a FASTA file from PrecisionProDB (generated from my VCFs).

How do you generate peptides from variants (or proteins)?

I am not very familiar with this kind of analysis, and I am trying to understand all the steps involved.

Thanks

@gianfilippo
Author

UPDATE:
I reran the Nextflow pipeline, this time using the .pergeno.protein_changed.fa files from PrecisionProDB as input instead of the VCF files. The FASTA files contain about 200 sequences each on average.
This way the input data is the same for the Nextflow pipeline and MHCflurry, and the tool-specific threshold is also the same (500).
I see about one order of magnitude more predictions with epitopeprediction than with MHCflurry in 6 of my 8 samples.
For the other two samples there are no predictions.
From what I can see, starting with protein FASTA files as input results in an extra folder, generated_peptides, containing the generated peptides. The number of peptides there is very large, which is consistent with the number of final predictions.

I am a bit puzzled at this point: if I start with VCFs, I end up with 2000-3000 predictions per sample, while if I start with about 200 changed proteins, I end up with about 40000-50000 predictions.

Could you please help me understand what I am clearly missing?
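
One possible source of the gap, sketched here under assumptions that are not confirmed for this pipeline: if peptides from a protein FASTA are produced by sliding a window over every full sequence, the number of candidates grows with protein length, whereas peptides derived from a variant only need to cover the mutated position. The function names and the 8-11 length range below are illustrative choices, not taken from epitopeprediction or MHCflurry.

```python
# Hypothetical illustration; lengths 8-11 and all names are assumptions.

def sliding_window_peptides(protein, lengths=(8, 9, 10, 11)):
    """Return the set of all k-mers of the given lengths in a protein sequence."""
    peptides = set()
    for k in lengths:
        for start in range(len(protein) - k + 1):
            peptides.add(protein[start:start + k])
    return peptides


def full_protein_window_count(protein_length, lengths=(8, 9, 10, 11)):
    """Number of windows when the whole protein is scanned."""
    return sum(max(protein_length - k + 1, 0) for k in lengths)


def variant_window_count(lengths=(8, 9, 10, 11)):
    """Number of windows that overlap one substituted residue.

    Assumes a single amino-acid substitution far from both termini:
    exactly k windows of length k contain a given position.
    """
    return sum(lengths)


toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up 22-residue sequence
print(len(sliding_window_peptides(toy_protein)))     # 54
print(full_protein_window_count(len(toy_protein)))   # 54
print(variant_window_count())                        # 38
```

Under these assumptions a whole-protein scan produces far more candidates per sequence than a variant-centred scan once proteins are a few hundred residues long, so comparing the two runs on peptide lists rather than on prediction counts may be more informative.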

Thanks

@jonasscheid
Contributor

Hi!
Apologies for the late response. I'm not very familiar with PrecisionProDB, but in the epitopeprediction pipeline you can also write out the in silico mutated proteins derived from the VCF files by adding the flag --fasta_output. I think that is the easiest way to compare against the output of PrecisionProDB. Let me know if I can assist further.
