Dataset cloning wraps the label #35

Closed
chschroeder opened this issue Jun 20, 2023 · 0 comments
Labels
bug Something isn't working

chschroeder commented Jun 20, 2023

Bug description

Selecting a subset of a dataset and then cloning it wraps each label in a superfluous zero-dimensional numpy array (repr shows array(0) instead of 0). This affects PytorchTextClassificationDataset and TransformersDataset.

Edit:
I noticed this because clf.predict() on the cloned dataset raised TypeError: len() of unsized object.
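
The error message itself can be reproduced with numpy alone. The following is a minimal sketch, independent of small-text, showing that a label wrapped in a zero-dimensional array has no length; the variable names are illustrative and not taken from the library.

```python
import numpy as np

# A scalar label and the same value wrapped in a zero-dimensional array.
label = np.int64(0)
wrapped = np.array(label)

# The wrapped label has no axes, so it is "unsized".
print(wrapped.ndim, wrapped.shape)

# len() on a zero-dimensional array raises the TypeError quoted above.
try:
    len(wrapped)
    error_message = None
except TypeError as e:
    error_message = str(e)

print(error_message)
```

Any code that calls len() on such a value fails this way, which is consistent with the error seen in clf.predict().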

Steps to reproduce

Example for TransformersDataset:

import unittest

from small_text.integrations.transformers.datasets import TransformersDataset
from tests.utils.datasets import random_transformer_dataset


class CloneBugTest(unittest.TestCase):

    def test_asd(self):
        dataset = random_transformer_dataset(num_samples=20,
                                             multi_label=False,
                                             num_classes=3)
        indices = [0, 1]

        # Select a subset first, then clone it.
        dataset_cloned = dataset[indices].clone()

        # Compare the first label before and after cloning.
        first_label = dataset.data[0][TransformersDataset.INDEX_LABEL]
        first_label_cloned = dataset_cloned.data[0][TransformersDataset.INDEX_LABEL]

        # str() hides the difference; repr() reveals the array wrapper.
        print(first_label, str(first_label), repr(first_label))
        print(first_label_cloned, str(first_label_cloned), repr(first_label_cloned))

Output:

0 0 0
0 0 array(0)

Expected behavior

Expected Output:

0 0 0
0 0 0
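
Until a fix is available, a wrapped label can be unwrapped by hand. The sketch below is a workaround under stated assumptions, not the fix applied in small-text: unwrap_label is a hypothetical helper, and it only touches zero-dimensional arrays so that multi-label arrays pass through unchanged.

```python
import numpy as np


def unwrap_label(label):
    """Hypothetical helper: turn a zero-dimensional array back into a scalar."""
    if isinstance(label, np.ndarray) and label.ndim == 0:
        # Indexing with an empty tuple extracts the numpy scalar.
        return label[()]
    # Scalars and arrays with one or more axes are returned as they are.
    return label


single = unwrap_label(np.array(0))
multi = unwrap_label(np.array([1, 0, 1]))
plain = unwrap_label(2)
```

Here single is a numpy scalar again, while multi keeps its shape of (3,).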

Environment:

Python version: 3.8
small-text version: 1.3.0
small-text integrations (e.g., transformers): transformers
PyTorch version (if applicable): -

Installation (pip, conda, or from source): pip
CUDA version (if applicable): -

Additional information

--

@chschroeder chschroeder added the bug Something isn't working label Jun 20, 2023
@chschroeder chschroeder added this to the small-text-2.0.0 milestone Jun 20, 2023
@chschroeder chschroeder changed the title from "Dataset cloning after subselection wraps the label" to "Dataset cloning wraps the label" Jun 24, 2023
chschroeder added a commit that referenced this issue Jun 24, 2023
Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>
chschroeder added a commit that referenced this issue Jun 25, 2023
Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>