Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Adjust tag propagation in EmbeddingOperator #146

Merged

Conversation

karlhigley
Copy link
Contributor

This limits the propagation of tags from the input id column to the output embedding column. Depends on NVIDIA-Merlin/core#316.

This limits the propagation of tags from the input id column to the output embedding column.
@karlhigley karlhigley added the bug Something isn't working label May 10, 2023
@karlhigley karlhigley added this to the Merlin 23.05 milestone May 10, 2023
@karlhigley karlhigley self-assigned this May 10, 2023
Tags.CONTEXT,
Tags.SESSION,
]
for tag in propagated_tags:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that this is going to be updated with an exclusion of tags based on the TAG_COLLISIONS[Tags.EMBEDDING] in place of the allow-list propgation here. That way it will automaticaly propage any valid tags? including custom tags added by a user to their dataset (since tags can also be strings)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first we thought so, but we then realized that it wasn't safe to assume that any tag should automatically be propagated unless we knew specifically that the embedding op wouldn't change the applicability of the tag, so we switched tactics from a deny-list to an allow-list. The deny-list changes in Merlin Core are still important though, even though they're not really used here, because you can manually apply the EMBEDDING tag, which ought to remove any other tags that are not compatible with EMBEDDING.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realized that it wasn't safe to assume that any tag should automatically be propagated unless we knew specifically that the embedding op wouldn't change the applicability of the tag

For the set of tags we provide in the Tags Enum, is there an example you have in mind where the deny list in TAG_COLLISIONS wouldn't work? (propagate a tag that we wouldn't want to)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several that are questionable, like TEXT for example. You can definitely have a text embedding, but the embedding itself isn't text. Either way could work (propagating the TEXT tag or not), but the semantics of the tag aren't entirely clear, so we weren't sure which side of the line that would fall on.

We've made requested changes re: tag propagation in the past (e.g. "Categorify should propagate tags from the source column") and then gotten bugs filed against the result because it propagated "the wrong tags," so this was a "let's try to be cautious about what we include" decision. That said, I don't have a strong opinion here; I'm just trying to adjust the operator to meet the specific cases that were called out, and either way is fine with me as long as it meets the need.

@karlhigley karlhigley merged commit dbe0ade into NVIDIA-Merlin:main May 11, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants