Stochastic build failure due to lack of gene coverage #75

Open
joverlee521 opened this issue Dec 31, 2024 · 3 comments

Comments

@joverlee521
Contributor

Error from iqtree during augur tree:

[batch] [2024-12-31T19:12:26+00:00]   ERROR: Some sequences (see above) are problematic, please check your alignment again

Log file results/a/F/all-time/F_aligned-delim.iqtree.log contains warning:

WARNING: Sequence KX510248 contains only gaps or missing data
@joverlee521
Contributor Author

Checked KX510248 in the input alignment file results/a/F/all-time/F_aligned.fasta and it's only Ns.
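For anyone who wants to repeat this check across a whole alignment, here is a minimal sketch in plain Python. The helper names `read_fasta` and `only_missing` are hypothetical and not part of the workflow, and treating `N`, `-`, and `?` as missing data is an assumption about what iqtree considers "gaps or missing data".

```python
from typing import Dict, Iterable, List

# Characters treated as "no data" (assumption: N, gap, and ? in either case).
MISSING = set("Nn-?")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence name: sequence}, joining wrapped lines."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def only_missing(records: Dict[str, str]) -> List[str]:
    """Return names of sequences made up entirely of missing-data characters.

    An empty sequence is also reported, since it carries no data either.
    """
    return [n for n, seq in records.items() if all(c in MISSING for c in seq)]
```

Usage would look something like `only_missing(read_fasta(open(path)))`, with `path` pointing at the aligned FASTA.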

@joverlee521
Contributor Author

Looking at the GenBank record for KX510248, it has a gap 4946..7456, so it's missing the entire F gene (5697-7421). I would think the sequence would get filtered out by the 0.3 min_coverage filter. However, looking at the metadata.tsv, the F_coverage for KX510248 is 1.0, so we must not be taking Ns into account when calculating coverage.

@joverlee521
Contributor Author

Ah, the gene coverage columns are added during ingest by extend_metadata.py, which compares the alignment start and end to the gene coordinates but does not account for gaps/Ns. I remember now that this is why dengue took the different route of calculating gene coverage from Nextclade's translation output.

I'm going to leave this as-is for now, since it's a stochastic failure that only happens when certain sequences are included in the build. In the long term, though, we should revisit having Nextclade calculate the CDS coverage.
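To illustrate the difference, here is a rough sketch contrasting a coordinate-only calculation with one that counts called bases. This is not the code in extend_metadata.py or what Nextclade does. Both function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and counting only A/C/G/T as covered is an assumption.

```python
def coordinate_coverage(aln_start: int, aln_end: int,
                        gene_start: int, gene_end: int) -> float:
    """Fraction of the gene spanned by the alignment range, ignoring Ns and gaps."""
    overlap = min(aln_end, gene_end) - max(aln_start, gene_start) + 1
    return max(overlap, 0) / (gene_end - gene_start + 1)


def n_aware_coverage(aligned_seq: str, gene_start: int, gene_end: int) -> float:
    """Fraction of gene positions with a called base (A, C, G, or T).

    Positions beyond the end of the sequence count as uncovered because the
    denominator is always the full gene length.
    """
    gene = aligned_seq[gene_start - 1:gene_end].upper()
    called = sum(1 for base in gene if base in "ACGT")
    return called / (gene_end - gene_start + 1)


# Toy example (made-up sequence, not real RSV data): the "gene" spans
# positions 5..12, and the sample has Ns across that entire region.
toy = "ACGT" + "N" * 8 + "ACGTACGT"
print(coordinate_coverage(1, len(toy), 5, 12))  # spans the gene by coordinates
print(n_aware_coverage(toy, 5, 12))             # no called bases in the gene
```

In the toy case the coordinate-based value is 1.0 while the N-aware value is 0.0, which matches the situation described for KX510248, where F_coverage is 1.0 even though the F gene region is all Ns.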

@joverlee521 changed the title from "Automated rebuild RSV analysis failure" to "Stochastic build failure due to lack of gene coverage" on Dec 31, 2024