Add OpenAI Embeddings Primitive #251
base: main
Adds a primitive for natural language logical types that uses the OpenAI Embeddings API to calculate embeddings features. The model to use is configurable, but text-embedding-ada-002 is used by default.
def can_fit_in_batch(tokens) -> bool:
    return (
        len(elements_in_batch) < 2048
Is 2048 the maximum number of elements per batch?
yeah, here's the limit in the openai client: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py#L43
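For readers following the thread, here is a minimal sketch of what a capacity check like this could look like. Only the 2048-element cap comes from the discussion above; the token budget `MAX_TOKENS_PER_BATCH`, its value, and the explicit `elements_in_batch` / `tokens_in_batch` parameters are assumptions made for illustration and are not taken from the PR.

```python
from typing import Sequence

MAX_ELEMENTS_PER_BATCH = 2048  # element cap discussed in this thread
MAX_TOKENS_PER_BATCH = 10_000  # hypothetical token budget, not from the PR


def can_fit_in_batch(
    tokens: Sequence[int],
    elements_in_batch: Sequence[Sequence[int]],
    tokens_in_batch: int,
) -> bool:
    """Return True if adding `tokens` keeps the batch within both limits."""
    return (
        len(elements_in_batch) < MAX_ELEMENTS_PER_BATCH
        and tokens_in_batch + len(tokens) <= MAX_TOKENS_PER_BATCH
    )
```

Passing the batch state in explicitly, rather than closing over it, makes the check easy to unit test in isolation.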
# can this element fit in the batch?
if can_fit_in_batch(next_tokens):
    # can't fit -- construct a request with existing elements
Not sure if I am misunderstanding this, but does this block cover the case where it can fit?
Good call. Fixed. I need to add tests for all of this to catch stuff like this 😅
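As a rough illustration of the intended control flow (flush the current batch only when the next element does not fit), here is a small sketch. The function name `batch_elements`, its parameters, and the default token budget are hypothetical and may differ from what the PR actually does.

```python
from typing import Iterator, List, Sequence


def batch_elements(
    token_lists: Sequence[Sequence[int]],
    max_elements: int = 2048,
    max_tokens: int = 10_000,  # hypothetical budget, not from the PR
) -> Iterator[List[Sequence[int]]]:
    """Group tokenized elements into batches that respect both limits."""
    batch: List[Sequence[int]] = []
    tokens_in_batch = 0
    for next_tokens in token_lists:
        # can this element fit in the batch?
        fits = (
            len(batch) < max_elements
            and tokens_in_batch + len(next_tokens) <= max_tokens
        )
        if not fits and batch:
            # can't fit -- emit the existing elements as one request
            yield batch
            batch, tokens_in_batch = [], 0
        # An element that exceeds max_tokens on its own still gets a batch.
        batch.append(next_tokens)
        tokens_in_batch += len(next_tokens)
    if batch:
        # emit whatever is left after the last element
        yield batch
```

A test along these lines would catch the inverted condition: feeding three 4-token elements with `max_tokens=8` should produce one batch of two elements followed by one batch of one.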
Co-authored-by: Shripad Badithe <60528327+sbadithe@users.noreply.github.com>