-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Adjust tag propagation in EmbeddingOperator
#146
Adjust tag propagation in EmbeddingOperator
#146
Conversation
This limits the propagation of tags from the input id column to the output embedding column.
Tags.CONTEXT, | ||
Tags.SESSION, | ||
] | ||
for tag in propagated_tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that this is going to be updated with an exclusion of tags based on the TAG_COLLISIONS[Tags.EMBEDDING]
in place of the allow-list propgation here. That way it will automaticaly propage any valid tags? including custom tags added by a user to their dataset (since tags can also be strings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first we thought so, but we then realized that it wasn't safe to assume that any tag should automatically be propagated unless we knew specifically that the embedding op wouldn't change the applicability of the tag, so we switched tactics from a deny-list to an allow-list. The deny-list changes in Merlin Core are still important though, even though they're not really used here, because you can manually apply the EMBEDDING tag, which ought to remove any other tags that are not compatible with EMBEDDING.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realized that it wasn't safe to assume that any tag should automatically be propagated unless we knew specifically that the embedding op wouldn't change the applicability of the tag
For the set of tags we provide in the Tags Enum, is there an example you have in mind where the deny list in TAG_COLLISIONS
wouldn't work? (propagate a tag that we wouldn't want to)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several that are questionable, like TEXT
for example. You can definitely have a text embedding, but the embedding itself isn't text. Either way could work (propagating the TEXT
tag or not), but the semantics of the tag aren't entirely clear, so we weren't sure which side of the line that would fall on.
We've made requested changes re: tag propagation in the past (e.g. "Categorify should propagate tags from the source column") and then gotten bugs filed against the result because it propagated "the wrong tags," so this was a "let's try to be cautious about what we include" decision. That said, I don't have a strong opinion here; I'm just trying to adjust the operator to meet the specific cases that were called out, and either way is fine with me as long as it meets the need.
This limits the propagation of tags from the input id column to the output embedding column. Depends on NVIDIA-Merlin/core#316.