[WIP] add rules for protein mapping #159

Open, wants to merge 5 commits into base: latest

@bluegenes (Member) commented Feb 10, 2022

This PR introduces rules to allow mapping nucleotide reads to protein references using Paladin.

Not functional yet.

Main questions at this point:

  • Do we want to do this only when the user selects protein sourmash? Or do we want to enable running both protein and nucleotide sourmash within the same grist output folder?
    • Mostly what I'm getting at here is whether we want to include the moltype in the gather output filename, since folks might want to run both moltypes. I know I want to run both, but I'm not sure this is a general use case.
  • Do we want to store proteomes in the same folder as genbank genomes? Or in a separate folder, e.g. proteomes?
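
To make the two layout questions concrete, here is a minimal sketch of what each option would mean for paths. Everything in it is hypothetical: the helper names, the `gather/`, `genbank_genomes/` and `proteomes/` directory names, and the `.gather.csv` suffix are illustrative assumptions, not grist's actual output structure.

```python
from pathlib import Path

# Hypothetical moltype labels, chosen only for this illustration.
MOLTYPES = ("DNA", "protein")


def gather_output(outdir, sample, moltype):
    """Build a gather output path that carries the moltype in the filename.

    With the moltype in the name, DNA and protein runs can share one
    output folder without overwriting each other.
    """
    if moltype not in MOLTYPES:
        raise ValueError(f"unknown moltype: {moltype!r}")
    return Path(outdir) / "gather" / f"{sample}.{moltype}.gather.csv"


def reference_dir(outdir, moltype, separate_proteomes=True):
    """Pick the folder that holds reference sequences for a moltype.

    separate_proteomes=True models the "separate proteomes folder" option;
    False models storing proteomes next to the genbank genomes.
    """
    if moltype == "protein" and separate_proteomes:
        return Path(outdir) / "proteomes"
    return Path(outdir) / "genbank_genomes"


if __name__ == "__main__":
    for moltype in MOLTYPES:
        print(gather_output("outputs", "sample1", moltype))
        print(reference_dir("outputs", moltype))
```

If only one moltype is ever run per output folder, the moltype suffix is unnecessary, which is the trade-off the first question is asking about.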

To do:

  • get the checkpoints --> download proteomes step working
  • add a new checkpoint that predicts the proteome with Prodigal if it is not downloadable
  • try BBMerge for read merging; fall back to PEAR if we don't like the results
  • add tests
  • add reporting and visualization
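
The download-or-Prodigal fallback in the first two items could look roughly like the sketch below. This is only an outline of the decision logic under stated assumptions: the function names, the `downloadable` set, and the placeholder accessions are made up for illustration, and the real version would live in a Snakemake checkpoint rather than plain Python.

```python
def proteome_source(accession, downloadable):
    """Decide how to obtain a proteome for one matched genome.

    Returns "download" when a proteome is known to be available for the
    accession, otherwise "prodigal" to signal that proteins should be
    predicted from the genome instead.
    """
    return "download" if accession in downloadable else "prodigal"


def plan_proteomes(accessions, downloadable):
    """Map every accession to the step that should produce its proteome."""
    return {acc: proteome_source(acc, downloadable) for acc in accessions}


if __name__ == "__main__":
    # Placeholder identifiers, not real GenBank accessions.
    matched = ["ACC_A", "ACC_B", "ACC_C"]
    has_proteome = {"ACC_A", "ACC_C"}
    for acc, step in plan_proteomes(matched, has_proteome).items():
        print(acc, step)
```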

@taylorreiter (Member) commented

hot takes

  • Do we want to just do this when the user selects protein sourmash? Or do we want to run both protein and nucleotide sourmash at the same time (e.g. have different gather checkpoints)?

I think when a user selects protein sourmash would be a good default, and perhaps a good-enough-for-now. It could be cool to have a --protein flag, so that when a user uses nucleotide sourmash, they can still map in protein space. But I'm not sure how much snakemake work this is vs. the amount of gain for enabled use cases.

  • Do we want to store proteomes in the same folder as genbank genomes? Or in a separate folder, e.g. proteomes?

Personally, I think I would prefer a proteomes directory. I'm willing to be persuaded otherwise though :)
