Creating a TextDataset with None items results in TypeError during SetFit training #73

chschroeder · 2025-01-21T20:07:03Z

Feature description

Creating a TextDataset that contains None items should be prevented.

Motivation

If you create such a dataset (which does not make sense but is currently possible), SetFit training later fails with an obscure error:

<...>
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

This is not really a bug, but we can check for None items when the dataset is created and raise a more helpful error.
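A minimal sketch of what such a check could look like, assuming the texts are passed in as a plain sequence. The helper name `check_texts_not_none` and the error message are hypothetical and not taken from the library or from the referenced commit:

```python
from typing import Optional, Sequence


def check_texts_not_none(texts: Sequence[Optional[str]]) -> None:
    """Raise a ValueError if any item in texts is None.

    Hypothetical helper for illustration only; not part of the library's API.
    """
    # Collect the positions of all None items so the error can point at them.
    none_indices = [i for i, text in enumerate(texts) if text is None]
    if none_indices:
        # Show at most the first five indices to keep the message short.
        shown = ', '.join(str(i) for i in none_indices[:5])
        suffix = ', ...' if len(none_indices) > 5 else ''
        raise ValueError(
            f'Dataset must not contain None items '
            f'(found {len(none_indices)} at indices: {shown}{suffix}).'
        )
```

Calling a check like this when the dataset is constructed would surface the problem immediately, with the offending indices, instead of as a tokenizer TypeError in the middle of training.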

Additional comments

Reported by @eisioriginal

Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>

chschroeder added the "feature request" label Jan 21, 2025

chschroeder added a commit that referenced this issue Jan 21, 2025

Prevent TextDataset objects from containing None (#73)

1d38af8

chschroeder closed this as completed Jan 21, 2025
