Creating a TextDataset with None items results in TypeError during SetFit training #73

Closed
chschroeder opened this issue Jan 21, 2025 · 0 comments
Labels
feature request New feature request

Comments

@chschroeder
Contributor

Feature description

Creating a TextDataset that contains None items should be prevented.

Motivation

Creating such a dataset does not make sense, but it is currently possible. If you do, SetFit training later fails with an unhelpful error from the tokenizer:

<...>
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

This is not really a bug, but we can check for None items and raise a more helpful error.
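A minimal sketch of such a check, assuming the texts are available as a plain sequence before the dataset is built. The helper name `validate_texts` and the error message are hypothetical and not part of the library's API. Where this check would be called in the dataset construction is left open.

```python
from typing import Optional, Sequence


def validate_texts(texts: Sequence[Optional[str]]) -> None:
    """Raise a ValueError if any entry in `texts` is None.

    Hypothetical helper for illustration only; not the library's API.
    """
    # Collect the positions of all None entries so the message can point at them.
    none_indices = [i for i, text in enumerate(texts) if text is None]
    if none_indices:
        # Show at most the first five indices to keep the message short.
        shown = ", ".join(str(i) for i in none_indices[:5])
        suffix = ", ..." if len(none_indices) > 5 else ""
        raise ValueError(
            f"Found {len(none_indices)} None item(s) in the text data "
            f"(indices: {shown}{suffix}). Every item must be a string."
        )
```

Failing early with the offending indices points the user at the bad input, rather than surfacing a tokenizer TypeError much later during training.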

Additional comments

Reported by @eisioriginal

@chschroeder chschroeder added the feature request New feature request label Jan 21, 2025
chschroeder added a commit that referenced this issue Jan 21, 2025
Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>