ProteinLanguage error handling #139

Closed
jannisborn opened this issue Jul 5, 2021 · 0 comments · Fixed by #146
Labels
invalid This doesn't seem right

Comments

@jannisborn
Member

Currently, ProteinLanguage raises a TypeError if it encounters an unknown token at runtime:

torch.tensor(token_indexes, dtype=self.dtype, device=self.device)
TypeError: an integer is required (got type str)
  • The error message is cryptic and does not name the offending token
  • Unknown tokens should instead be replaced with the unknown token, and a warning should be raised
  • If iterate_dataset is True, the problem should be detected at object construction (this is not currently the case)
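
For illustration, the requested behaviour could look roughly like the sketch below. It is not the code from the fix: `UNK_TOKEN`, `tokens_to_indexes`, `find_unknown_tokens` and the toy vocabulary are hypothetical names, and plain lists are used instead of tensors to keep the example self-contained.

```python
import warnings

# Hypothetical placeholder; the real unknown-token string may differ.
UNK_TOKEN = "<UNK>"


def tokens_to_indexes(tokens, token_to_index, unk_token=UNK_TOKEN):
    """Map tokens to integer indexes, replacing unknown tokens with unk_token."""
    unk_index = token_to_index[unk_token]
    unknown = sorted({t for t in tokens if t not in token_to_index})
    if unknown:
        # Name the offending tokens instead of failing later with a TypeError.
        warnings.warn(
            f"Replacing unknown tokens {unknown} with {unk_token!r}", UserWarning
        )
    return [token_to_index.get(t, unk_index) for t in tokens]


def find_unknown_tokens(sequences, token_to_index):
    """Scan all sequences up front, e.g. at construction time."""
    unknown = set()
    for sequence in sequences:
        unknown.update(t for t in sequence if t not in token_to_index)
    return unknown


# Toy vocabulary, assumed for this example only.
vocab = {UNK_TOKEN: 0, "A": 1, "C": 2, "D": 3}
```

With this sketch, `tokens_to_indexes(list("ACXD"), vocab)` returns `[1, 2, 0, 3]` and emits a warning naming `X`, so the resulting list contains only integers and can be converted to a tensor. A dataset that receives `iterate_dataset=True` could call something like `find_unknown_tokens` once over all sequences during construction and report the result there.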
@jannisborn jannisborn added the invalid This doesn't seem right label Jul 5, 2021
jannisborn added a commit that referenced this issue Jan 3, 2022
* feat: crawlers now convert unicode to ascii [skip ci]

* fix: protein-language unknown token handling (fixes #139)

* feat: protein_sequence dataset detects unknown tokens at construction time if iterate_dataset is passed

* fix: codiga style

* refactor: selfies>=2 version bump