
Fixes TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType') Error #729

Open
Rajathbharadwaj wants to merge 1 commit into main
Conversation

Rajathbharadwaj commented Jul 2, 2023

Fixes the `TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')` error.

When using Transformers4Rec, creating the `tabular_inputs` via `tr.TabularSequenceFeatures.from_schema` throws a TypeError. After a bit of inspection, the following changes solved the issue.

Fixes #728 ([QST] error related to schema)


rapids-bot bot commented Jul 2, 2023

Pull requests from external contributors require approval from an NVIDIA-Merlin organization member with write permissions or greater before CI can begin.

rnyak (Contributor) commented Jul 3, 2023

@Rajathbharadwaj hello, and thanks for the PR. Could you please first provide a reproducible example of your error with a toy dataset?

Rajathbharadwaj (Author) commented
Hey @rnyak, definitely.

Following the Advanced NVTabular Workflow example:

import os
from merlin.datasets.entertainment import get_movielens

input_path = os.environ.get("INPUT_DATA_DIR", os.path.expanduser("~/merlin-framework/movielens/"))
get_movielens(variant="ml-1m", path=input_path); #noqa


from merlin.core.dispatch import get_lib

data = get_lib().read_parquet(f'{input_path}ml-1m/train.parquet').sample(frac=1)

train = data.iloc[:600_000]
valid = data.iloc[600_000:]

movies = get_lib().read_parquet(f'{input_path}ml-1m/movies_converted.parquet')



import nvtabular as nvt
from merlin.schema.tags import Tags

train_ds = nvt.Dataset(train, npartitions=2)
valid_ds = nvt.Dataset(valid)

# shuffle_by_keys returns a new Dataset, so reassign the result
train_ds = train_ds.shuffle_by_keys('userId')
valid_ds = valid_ds.shuffle_by_keys('userId')

# join movie genres onto each interaction by movieId, then encode them
# (values below freq_threshold are mapped to a shared index)
genres = ['movieId'] >> nvt.ops.JoinExternal(movies, on='movieId', columns_ext=['movieId', 'genres'])
genres = genres >> nvt.ops.Categorify(freq_threshold=10)

# binarize the rating: ratings above 3 become positive targets
def rating_to_binary(col):
    return col > 3

binary_rating = ['rating'] >> nvt.ops.LambdaOp(rating_to_binary) >> nvt.ops.Rename(name='binary_rating')

# tag the columns so downstream models can identify user, item, and target
userId = ['userId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.USER_ID, Tags.CATEGORICAL, Tags.USER])
movieId = ['movieId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.ITEM_ID, Tags.CATEGORICAL, Tags.ITEM])
binary_rating = binary_rating >> nvt.ops.AddTags(tags=[Tags.TARGET, Tags.BINARY_CLASSIFICATION])


workflow = nvt.Workflow(userId + movieId + genres + binary_rating)

train_transformed = workflow.fit_transform(train_ds)
valid_transformed = workflow.transform(valid_ds)
valid_transformed.compute().head()
train_transformed.schema

# The issue occurs when running the following code

from transformers4rec.torch import TabularSequenceFeatures
tabular_inputs = TabularSequenceFeatures.from_schema(
        train_transformed.schema,
        embedding_dim_default=128,
        max_sequence_length=20,
        d_output=100,
        aggregation="concat",
        masking="clm"
    )

It throws the following error:

TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')

After a bit of inspection, I found that the `max_sequence_length` parameter isn't passed through to `tabular.py`, so it stays `None`; `torch.Size()` then receives that `None` as its second item and raises the TypeError above.
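For illustration, here is a minimal sketch of the failure mode, plus a hypothetical guard. The `build_output_size` helper below is illustrative only (it is not the actual Transformers4Rec code); it just shows how a `None` max_sequence_length turns into exactly this TypeError, and how failing early gives a clearer message.

import torch

# torch.Size() only accepts ints, so a None dimension reproduces
# exactly the reported error (None is item 1 of the tuple).
max_sequence_length = None
try:
    torch.Size((64, max_sequence_length, 128))
except TypeError as e:
    print(e)  # torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')

# Hypothetical guard (not the actual patch): raise a descriptive error
# early instead of letting torch.Size() fail later.
def build_output_size(batch_size, max_sequence_length, dim):
    if max_sequence_length is None:
        raise ValueError("max_sequence_length is None; pass max_sequence_length "
                         "to TabularSequenceFeatures.from_schema()")
    return torch.Size((batch_size, max_sequence_length, dim))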
