Use Dictionary lookup for supplied IDs to Embedding Operator #148

Merged


oliverholworthy
Member

  • Use a dictionary lookup for the IDs supplied to the Embedding Operator.
    • This improves the speed of index lookups for larger sets of embeddings.
  • Adds an unknown_value option that sets the default value of the embedding returned for IDs that are not found in the set of pre-trained embeddings.
  • Changes the use of the mmap parameter to make it optional when passing a file. Currently, passing a file without mmap=True raises an unrelated error.
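
For illustration only, here is a minimal pure-Python sketch of the idea behind the first two points: build an ID-to-row dictionary once, then resolve each requested ID with a constant-time lookup, falling back to a default for unknown IDs. The helper names `build_id_index` and `lookup_embeddings` are hypothetical and are not taken from the operator's code; only the `unknown_value` name comes from this PR, and its exact semantics in the operator may differ.

```python
def build_id_index(ids):
    """Map each embedding ID to its row position. Built once, up front."""
    return {id_: row for row, id_ in enumerate(ids)}


def lookup_embeddings(keys, id_index, embeddings, unknown_value=0.0):
    """Return one embedding per key.

    Keys missing from id_index get a vector filled with unknown_value
    (an assumed interpretation of the option described above).
    """
    dim = len(embeddings[0])
    result = []
    for key in keys:
        row = id_index.get(key)  # average O(1), independent of the number of IDs
        if row is None:
            result.append([unknown_value] * dim)
        else:
            result.append(list(embeddings[row]))
    return result


# Toy data: three pre-trained embeddings with IDs 10, 20 and 30.
ids = [10, 20, 30]
embeddings = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]

id_index = build_id_index(ids)
result = lookup_embeddings([30, 99, 10], id_index, embeddings, unknown_value=0.0)
```

Because the dictionary is built once, the cost of each transform depends on the number of IDs requested rather than on the total number of pre-trained embeddings.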

Example

With 10 million IDs, the operator transform currently runs in roughly 500-600 milliseconds. This scales proportionally with the number of IDs.

After this change the operator transform runs in 50-60 microseconds with 10 million IDs.

@oliverholworthy oliverholworthy added the enhancement New feature or request label May 12, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.05 milestone May 12, 2023
@oliverholworthy oliverholworthy self-assigned this May 12, 2023
@oliverholworthy oliverholworthy merged commit ec9bedf into NVIDIA-Merlin:main May 12, 2023
@oliverholworthy oliverholworthy deleted the embeddings-faster-lookup branch May 12, 2023 20:06