
Fixes TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType') Error #729

Open
Rajathbharadwaj wants to merge 1 commit into main
Conversation

Rajathbharadwaj commented Jul 2, 2023

Fixes the `TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')` error.

When using Transformers4Rec, creating the `tabular_inputs` via `tr.TabularSequenceFeatures.from_schema` throws a TypeError. After a bit of inspection, the following changes solved the issue.

Fixes #728 ([QST] error related to schema)


rapids-bot bot commented Jul 2, 2023

Pull requests from external contributors require approval from an NVIDIA-Merlin organization member with write permissions or greater before CI can begin.

rnyak (Contributor) commented Jul 3, 2023

@Rajathbharadwaj hello, and thanks for the PR. Could you please first provide a reproducible example of your error with a toy dataset?

Rajathbharadwaj (Author) commented
Hey @rnyak, definitely.

Following the Advanced NVTabular Workflow example:

import os
from merlin.datasets.entertainment import get_movielens

input_path = os.environ.get("INPUT_DATA_DIR", os.path.expanduser("~/merlin-framework/movielens/"))
get_movielens(variant="ml-1m", path=input_path); #noqa


from merlin.core.dispatch import get_lib

data = get_lib().read_parquet(f'{input_path}ml-1m/train.parquet').sample(frac=1)

train = data.iloc[:600_000]
valid = data.iloc[600_000:]

movies = get_lib().read_parquet(f'{input_path}ml-1m/movies_converted.parquet')



import nvtabular as nvt
from merlin.schema.tags import Tags

train_ds = nvt.Dataset(train, npartitions=2)
valid_ds = nvt.Dataset(valid)

# shuffle_by_keys returns a new Dataset, so reassign the result
train_ds = train_ds.shuffle_by_keys('userId')
valid_ds = valid_ds.shuffle_by_keys('userId')

# join movie genres onto each interaction by movieId, then encode them
# (values below freq_threshold are mapped to a shared index)
genres = ['movieId'] >> nvt.ops.JoinExternal(movies, on='movieId', columns_ext=['movieId', 'genres'])
genres = genres >> nvt.ops.Categorify(freq_threshold=10)

# binarize the rating: ratings above 3 become positive targets
def rating_to_binary(col):
    return col > 3

binary_rating = ['rating'] >> nvt.ops.LambdaOp(rating_to_binary) >> nvt.ops.Rename(name='binary_rating')

# tag the columns so downstream models can identify user, item, and target
userId = ['userId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.USER_ID, Tags.CATEGORICAL, Tags.USER])
movieId = ['movieId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.ITEM_ID, Tags.CATEGORICAL, Tags.ITEM])
binary_rating = binary_rating >> nvt.ops.AddTags(tags=[Tags.TARGET, Tags.BINARY_CLASSIFICATION])


workflow = nvt.Workflow(userId + movieId + genres + binary_rating)

train_transformed = workflow.fit_transform(train_ds)
valid_transformed = workflow.transform(valid_ds)
valid_transformed.compute().head()
train_transformed.schema

# The issue occurs when running the following code

from transformers4rec.torch import TabularSequenceFeatures
tabular_inputs = TabularSequenceFeatures.from_schema(
        train_transformed.schema,
        embedding_dim_default=128,
        max_sequence_length=20,
        d_output=100,
        aggregation="concat",
        masking="clm"
    )

It throws the following error:

TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')

After a bit of inspection, I found that the `max_sequence_length` parameter isn't passed through to `tabular.py`, so it stays `None`; `torch.Size()` then receives that `None` as its second item and raises the TypeError above.
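For illustration, here is a minimal sketch of the failure mode, plus a hypothetical guard. The `build_output_size` helper below is illustrative only (it is not the actual Transformers4Rec code); it just shows how a `None` max_sequence_length turns into exactly this TypeError, and how failing early gives a clearer message.

import torch

# torch.Size() only accepts ints, so a None dimension reproduces
# exactly the reported error (None is item 1 of the tuple).
max_sequence_length = None
try:
    torch.Size((64, max_sequence_length, 128))
except TypeError as e:
    print(e)  # torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')

# Hypothetical guard (not the actual patch): raise a descriptive error
# early instead of letting torch.Size() fail later.
def build_output_size(batch_size, max_sequence_length, dim):
    if max_sequence_length is None:
        raise ValueError("max_sequence_length is None; pass max_sequence_length "
                         "to TabularSequenceFeatures.from_schema()")
    return torch.Size((batch_size, max_sequence_length, dim))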
