Dataset cloning wraps the label #35

chschroeder · 2023-06-20T20:45:54Z

Bug description

Selecting a subset of a dataset and then cloning it wraps each label in a superfluous 0-d ndarray (its repr is "array(0)" instead of "0"). This affects PytorchTextClassificationDataset and TransformersDataset.

Edit:
I noticed this because clf.predict() on the cloned dataset raised TypeError: len() of unsized object.
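For reference, "TypeError: len() of unsized object" is the error NumPy raises when len() is called on a 0-d array, i.e. a scalar wrapped in an ndarray. The minimal sketch below is independent of small-text and only reproduces that NumPy behaviour:

```python
import numpy as np

label = 0                   # a plain Python int, as stored in the original dataset
wrapped = np.array(label)   # a 0-d array, matching the "array(0)" seen after cloning

print(wrapped.shape, wrapped.ndim)  # () 0

try:
    len(wrapped)
except TypeError as e:
    print(e)  # len() of unsized object
```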

Steps to reproduce

Example for TransformersDataset:

import unittest

from small_text.integrations.transformers.datasets import TransformersDataset
from tests.utils.datasets import random_transformer_dataset


class CloneBugTest(unittest.TestCase):

    def test_clone_does_not_wrap_label(self):
        dataset = random_transformer_dataset(num_samples=20,
                                             multi_label=False,
                                             num_classes=3)
        indices = [0, 1]

        # select a subset first, then clone it
        dataset_cloned = dataset[indices].clone()

        first_label = dataset.data[0][TransformersDataset.INDEX_LABEL]
        first_label_cloned = dataset_cloned.data[0][TransformersDataset.INDEX_LABEL]

        # str() hides a 0-d ndarray wrapper; only repr() reveals it
        print(first_label, str(first_label), repr(first_label))
        print(first_label_cloned, str(first_label_cloned), repr(first_label_cloned))


if __name__ == '__main__':
    unittest.main()

Output:

0 0 0
0 0 array(0)
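The first two columns look identical because a 0-d array prints as its bare value; only repr() shows the wrapper. The wrapped label also compares equal to the plain int, so an equality-based assertion would not catch this. One way such a wrapper can appear is np.copy() applied to a scalar. Whether this is what clone() did in small-text 1.3.0 is an assumption, not something stated in this issue:

```python
import numpy as np

label = 0
copied = np.copy(label)  # assumption: a copy routine applied to each label

print(str(label), str(copied))    # 0 0
print(repr(label), repr(copied))  # 0 array(0)
print(copied == label)            # True, so equality checks miss the wrapper
print(isinstance(copied, np.ndarray), copied.item())  # True 0
```

Calling .item() (or int()) on the 0-d array recovers the plain Python scalar.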

Expected behavior

Expected Output:

0 0 0
0 0 0
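A regression check for this expected output has to compare repr() or the type, not the value. The sketch below models data as a list of (text, mask, label) tuples, with NumPy arrays standing in for the token tensors. INDEX_LABEL = 2, clone_rows and clone_rows_buggy are hypothetical stand-ins, not small-text's actual clone() implementation. The "fixed" variant uses copy.deepcopy, which leaves scalar labels as scalars:

```python
import copy

import numpy as np

INDEX_LABEL = 2  # hypothetical position of the label within each row


def clone_rows(rows):
    """Hypothetical clone: deep-copies each row, keeping scalar labels as scalars."""
    return [copy.deepcopy(row) for row in rows]


def clone_rows_buggy(rows):
    """Hypothetical buggy clone: passes every field through np.copy()."""
    return [tuple(np.copy(field) for field in row) for row in rows]


# toy rows: (token ids, attention mask, label)
rows = [(np.array([1, 2, 3]), np.array([1, 1, 1]), 0),
        (np.array([4, 5, 6]), np.array([1, 1, 1]), 2)]

for clone in (clone_rows, clone_rows_buggy):
    label = clone(rows)[0][INDEX_LABEL]
    print(clone.__name__, repr(label))
# clone_rows 0
# clone_rows_buggy array(0)
```

Asserting on repr() or on not isinstance(label, np.ndarray) mirrors the expected output above and fails for the buggy variant.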

Environment:

Python version: 3.8
small-text version: 1.3.0
small-text integrations (e.g., transformers): transformers
PyTorch version (if applicable): -

Installation (pip, conda, or from source): pip
CUDA version (if applicable): -

Additional information

--


chschroeder added the bug Something isn't working label Jun 20, 2023

chschroeder added this to the small-text-2.0.0 milestone Jun 20, 2023

chschroeder changed the title from "Dataset cloning after subselection wraps the label" to "Dataset cloning wraps the label" Jun 24, 2023

chschroeder added a commit that referenced this issue Jun 24, 2023

Fix clunnecessary label wrapping when cloning (#35)

77d290e

Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>

chschroeder closed this as completed Jun 24, 2023

chschroeder added a commit that referenced this issue Jun 25, 2023

Fix unnecessary label wrapping when cloning (#35)

92c0240

Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>
