Is there a way to deactivate the overlap detection so bakta does not filter my input proteins? #295

Closed
Daniel-Tichy opened this issue Jun 19, 2024 · 5 comments

@Daniel-Tichy

Daniel-Tichy commented Jun 19, 2024

  • This issue concerns the user-provided proteins feature.

  • I am trying to use Bakta to annotate a phage genome using a file of proteins predicted with Phanotate. I expected an annotation for every protein in my input file, but it seems that overlapping proteins are being filtered out by Bakta.

  • I would like to deactivate the overlap detection so that Bakta does not filter out the previously predicted proteins that I am using as input.

Example: this is my input gbk for bakta.

 CDS             3417..3809
                 /ID="WARQSXNU_CDS_9"
                 /phrog="786"
                 /top_hit="p65745 VI_07030"
                 /locus_tag="WARQSXNU_9"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-22.41946661155013"
                 /phase="0"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
 CDS             3806..4147
                 /ID="WARQSXNU_CDS_10"
                 /phrog="797"
                 /top_hit="p299466 VI_10274"
                 /locus_tag="WARQSXNU_10"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-111.69024253224252"
                 /phase="0"
                 /translation="MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDV
                 STDAKGNEIVQKNTFWTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRS
                 EPPDFALVT"
 CDS             4147..4545
                 /ID="WARQSXNU_CDS_11"
                 /phrog="No_PHROG"
                 /top_hit="No_PHROG"
                 /locus_tag="WARQSXNU_11"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-52.42604964159676"
                 /phase="0"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"

I parse it and pass the proteins to Bakta in the following format:

>WARQSXNU_9 ~~~hypothetical protein~~~
MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPNDECRQNAIMLWASIL
ISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAGCTGDLVEDPDGSGKPW
FAVVRGSKCK
>WARQSXNU_10 ~~~hypothetical protein~~~
MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDVSTDAKGNEIVQKNTF
WTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRSEPPDFALVT
>WARQSXNU_11 ~~~hypothetical protein~~~
MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATLTPIDTSTLINSQFDT
VEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSPGGEPQFLTKAAQRTKD
LVDGVIKKEMKL
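
For readers who want to reproduce the conversion step, here is a minimal sketch of how the GenBank qualifiers above could be turned into this FASTA layout. It is not the script used in this thread: the function name `to_user_protein_fasta` and the tuple layout are hypothetical, the `>` prefix is the standard FASTA header marker, and the header simply mirrors the `id ~~~product~~~` pattern shown above.

```python
def to_user_protein_fasta(records, width=60):
    """Format (locus_tag, product, translation) tuples as FASTA text.

    Hypothetical helper: the header mirrors the pattern used in this thread,
    "<id> ~~~<product>~~~", with the fields around the product left empty.
    """
    lines = []
    for locus_tag, product, translation in records:
        lines.append(f">{locus_tag} ~~~{product}~~~")
        # GenBank wraps /translation over several lines; strip all whitespace.
        sequence = "".join(translation.split())
        # Re-wrap the sequence at a fixed width (60 columns, as above).
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# One record copied from the GenBank excerpt above, including its line breaks.
example = [(
    "WARQSXNU_10",
    "hypothetical protein",
    """MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDV
    STDAKGNEIVQKNTFWTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRS
    EPPDFALVT""",
)]
print(to_user_protein_fasta(example), end="")
```

Extracting the three fields from a real GenBank file is left out here; a GenBank parser would supply them.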

But I get this output. The protein for WARQSXNU_10 is missing, probably because of the overlap in the genome.

 gene            complement(40007..40405)
                 /locus_tag="MKOBIG_00315"
 CDS             complement(40007..40405)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_W7P0V4"
                 /db_xref="UniRef:UniRef90_A0A1B1W263"
                 /db_xref="UserProtein:WARQSXNU_11"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00315"
                 /protein_id="gnl|Bakta|MKOBIG_00315"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W263"
 gene            complement(40743..41135)
                 /locus_tag="MKOBIG_00320"
 CDS             complement(40743..41135)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_A0A173GBZ4"
                 /db_xref="UniRef:UniRef90_A0A1B1W265"
                 /db_xref="UserProtein:WARQSXNU_9"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00320"
                 /protein_id="gnl|Bakta|MKOBIG_00320"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W265"
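
The suspected overlap can be checked from the coordinates in the input GenBank excerpt. The sketch below only computes how many bases neighbouring CDS features share. It is not Bakta's filter logic, which is not described in this thread, and the helper name `overlap_bp` is hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two 1-based, inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# CDS coordinates copied from the input GenBank excerpt.
cds = {
    "WARQSXNU_9": (3417, 3809),
    "WARQSXNU_10": (3806, 4147),
    "WARQSXNU_11": (4147, 4545),
}

names = list(cds)
overlaps = {
    (x, y): overlap_bp(cds[x], cds[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for (x, y), shared in overlaps.items():
    print(f"{x} / {y}: {shared} bp shared")
```

With these coordinates, WARQSXNU_10 shares 4 bp with WARQSXNU_9 and 1 bp with WARQSXNU_11, while WARQSXNU_9 and WARQSXNU_11 do not overlap each other. This is consistent with WARQSXNU_10 being the one that is dropped.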

I am currently running Bakta inside a Docker container with this command:
bakta --db $bakta_db/ --protein $faa_input_bakta --skip-trna --skip-tmrna --skip-rrna --skip-ncrna --skip-ncrna-region --skip-crispr --skip-pseudo --skip-gap --skip-ori --skip-plot --output ${assembly_input_bakta.simpleName}_bakta/ --threads ${params.threads} $assembly_input_bakta

Daniel-Tichy added the enhancement label Jun 19, 2024
@oschwengers
Owner

Hi, thanks for reaching out. To make sure that I correctly understand what you're ultimately trying to achieve: you would like to annotate a phage genome sequence with Bakta using a user-provided proteins file with functional annotations from Phanotate? Is this correct?

@Daniel-Tichy
Author

Hi! Yes, I want to run the Bakta annotation with a user-provided proteins file that contains functional annotations from Phanotate.

@oschwengers
Owner

Hmm, in principle, you can do this. However, Bakta was designed to annotate bacterial genomes, hence the overlap filters. I could add an option to deactivate all overlap filters in the next release, but I cannot promise when that will be. Meanwhile, you could try Pharokka?

oschwengers self-assigned this Sep 24, 2024
oschwengers added this to the v1.10.0 milestone Oct 10, 2024
oschwengers added a commit that referenced this issue Oct 10, 2024
@oschwengers
Owner

Hey @Daniel-Tichy, I just added a new --skip-filter option to Bakta. It is now available in the main branch and will be released with the upcoming v1.10.0 soon.

I hope this fits your needs in this case. I'll close this for now. If there are any further comments, ideas, or suggestions, please do not hesitate to re-open this issue (or open a new one). Thanks again and best regards!
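
As a rough sketch of how the new option might be wired into a scripted run, the snippet below assembles a Bakta command line with `--skip-filter` appended. The function name `build_bakta_command` and all file paths are hypothetical placeholders. `--proteins` is the full spelling of the `--protein` flag used in the original command, and `--skip-filter` assumes a Bakta build that already includes the option (main branch or v1.10.0, per the comment above). The `--skip-*` flags from the original command are omitted for brevity.

```python
import shlex


def build_bakta_command(db, proteins, genome, output, threads=4, skip_filter=True):
    """Assemble a Bakta argument list; nothing is executed here."""
    cmd = [
        "bakta",
        "--db", db,
        "--proteins", proteins,  # the thread's command abbreviates this as --protein
        "--output", output,
        "--threads", str(threads),
    ]
    if skip_filter:
        # Option announced in this thread for v1.10.0; older releases lack it.
        cmd.append("--skip-filter")
    cmd.append(genome)  # the genome FASTA is the positional argument
    return cmd


# Placeholder paths, for illustration only.
cmd = build_bakta_command("bakta_db/", "phanotate_proteins.faa", "phage.fasta", "phage_bakta/")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.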

oschwengers added the feature label and removed the enhancement label Oct 10, 2024
@Daniel-Tichy
Author

Thank you!
