Remove sparse tensor output type for list features #103
Motivation
The value count attributes of columns in dataset.schema currently control the output type of list columns. With the addition of shape in the schema (NVIDIA-Merlin/Merlin#813), we're going to start seeing value counts specified more often. This will result in unexpected output types for columns where a value count was not previously specified.
We also currently allow a sparse tensor as a possible output type, which doesn't have a clear use case and appears to be an implementation detail of padding ragged columns to dense.
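To make "padding ragged columns to dense" concrete, here is a minimal sketch in NumPy. It assumes a ragged list column is represented as a flat array of values plus row offsets; that representation, the function name `pad_ragged_to_dense`, and its parameters are illustrative assumptions and not the dataloader's actual code.

```python
import numpy as np


def pad_ragged_to_dense(values, offsets, max_len, pad_value=0):
    """Pad a ragged list column to a dense 2-D array (hypothetical helper).

    values:  flat array holding all list elements, row after row
    offsets: row boundaries, so row i is values[offsets[i]:offsets[i + 1]]
    max_len: width of the dense output; longer rows are truncated
    """
    values = np.asarray(values)
    offsets = np.asarray(offsets)
    num_rows = len(offsets) - 1
    dense = np.full((num_rows, max_len), pad_value, dtype=values.dtype)
    for row in range(num_rows):
        start, end = offsets[row], offsets[row + 1]
        length = min(end - start, max_len)  # truncate rows longer than max_len
        dense[row, :length] = values[start:start + length]
    return dense


# Three rows of lengths 2, 1 and 3, padded to width 3.
dense = pad_ragged_to_dense([1, 2, 3, 4, 5, 6], [0, 2, 3, 6], max_len=3)
print(dense.tolist())  # [[1, 2, 0], [3, 0, 0], [4, 5, 6]]
```

The dense result carries the same information as the ragged input plus padding, which is why an intermediate sparse representation is not needed as a user-facing output type.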
Current
value_count.max specified and is_ragged=False
value_count.max specified and is_ragged=True
value_count.max not specified and is_ragged=True
After - With this PR
All list columns always return the same output type. The value count does not influence the output type.
Planning a follow-up / independent change in #97 to return a dense tensor if the schema specifies that a column is of fixed size (is_ragged=False).