ProteinLanguage error handling #139

jannisborn · 2021-07-05T14:29:32Z

Currently, ProteinLanguage raises an error when it encounters an unknown token at runtime:

torch.tensor(token_indexes, dtype=self.dtype, device=self.device)
TypeError: an integer is required (got type str)

* The error message is cryptic.
* Unknown tokens should instead be replaced with the unknown token, and a warning should be raised.
* If iterate_dataset is True, the problem should be detected at object construction (this is currently not the case).
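A minimal sketch of the requested runtime behaviour follows. It assumes a character-level vocabulary over the 20 standard amino acids plus a reserved unknown token. ToyProteinLanguage, UNKNOWN_TOKEN and sequence_to_token_indexes are hypothetical names chosen for illustration; they are not the library's actual API, and the real fix is the one merged in #146. The key point is that every position maps to an integer index, so the later torch.tensor(...) call never receives a str.

```python
import warnings
from typing import Dict, List, Set

# Hypothetical name for the reserved unknown token (assumption, not the library's constant).
UNKNOWN_TOKEN = "<UNK>"


class ToyProteinLanguage:
    """Hypothetical, simplified stand-in for a protein tokenizer."""

    def __init__(self, alphabet: str = "ACDEFGHIKLMNPQRSTVWY") -> None:
        # Index 0 is reserved for the unknown token; amino acids follow.
        tokens = [UNKNOWN_TOKEN] + list(alphabet)
        self.token_to_index: Dict[str, int] = {t: i for i, t in enumerate(tokens)}
        self.unknown_index = self.token_to_index[UNKNOWN_TOKEN]

    def sequence_to_token_indexes(self, sequence: str) -> List[int]:
        """Map each character to an index, substituting the unknown index when needed."""
        indexes: List[int] = []
        unknown: Set[str] = set()
        for token in sequence:
            index = self.token_to_index.get(token)
            if index is None:
                # Instead of letting the raw str through (which later makes
                # torch.tensor fail), fall back to the unknown token's index.
                unknown.add(token)
                index = self.unknown_index
            indexes.append(index)
        if unknown:
            # One readable warning per sequence instead of a cryptic TypeError.
            warnings.warn(
                f"Replaced unknown tokens {sorted(unknown)} in {sequence!r} "
                f"with {UNKNOWN_TOKEN}",
                UserWarning,
            )
        return indexes
```

For example, ToyProteinLanguage().sequence_to_token_indexes("MKX") returns [11, 9, 0] and emits a single UserWarning naming "X". Warning once per sequence, rather than once per token, keeps the log readable when a sequence contains many non-standard residues.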


* feat: crawlers now convert unicode to ascii [skip ci]
* fix: protein-language unknown token handling (fixes ##139)
* feat: protein_sequence dataset detects unknown tokens at construction time if iterate_dataset is passed
* fix: codiga style
* refactor: selfies>=2 version bump
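The commit message above states that the protein_sequence dataset detects unknown tokens at construction time when iterate_dataset is passed. A minimal sketch of that idea, assuming the vocabulary is available as a set of single-character tokens: ToyProteinSequenceDataset and its parameters are hypothetical and do not reproduce the library's real dataset class. Whether to warn or raise is a design choice; warning matches the issue's request to fall back to the unknown token.

```python
import warnings
from typing import Iterable, List, Set


class ToyProteinSequenceDataset:
    """Hypothetical dataset that can check its sequences for unknown tokens up front."""

    def __init__(
        self,
        sequences: Iterable[str],
        vocabulary: Set[str],
        iterate_dataset: bool = False,
    ) -> None:
        self.sequences: List[str] = list(sequences)
        self.vocabulary = set(vocabulary)
        self.unknown_tokens: Set[str] = set()
        if iterate_dataset:
            # Single pass over all sequences at construction time, so problems
            # surface immediately instead of in the middle of training.
            for sequence in self.sequences:
                self.unknown_tokens.update(
                    token for token in sequence if token not in self.vocabulary
                )
            if self.unknown_tokens:
                warnings.warn(
                    f"Dataset contains unknown tokens {sorted(self.unknown_tokens)}; "
                    "they will be mapped to the unknown token.",
                    UserWarning,
                )

    def __len__(self) -> int:
        return len(self.sequences)

    def __getitem__(self, index: int) -> str:
        return self.sequences[index]
```

With iterate_dataset=False the constructor stays cheap and no scan happens, which mirrors the issue's point that construction-time detection is only expected when the dataset is iterated anyway.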

jannisborn added the invalid label Jul 5, 2021

jannisborn added a commit that referenced this issue Jan 3, 2022

fix: protein-language unknown token handling (fixes ##139)

e387bac

jannisborn mentioned this issue Jan 3, 2022

Fix proteinlanguage handling #146

Merged

jannisborn closed this as completed in #146 Jan 3, 2022
