Not possible to create a post request with more than 268 strings for create_embeddings()? #48

Closed
atantos opened this issue Aug 18, 2023 · 2 comments

Comments


atantos commented Aug 18, 2023

Hi there.

Is there a time or size limit on POST requests? In R I can embed 1000 strings from the overview column in a single request, but with this package I currently cannot send more than 268. Is this an issue with the package, or am I missing something?

Thanks!

using CSV, DataFrames, Downloads, OpenAI
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

r = create_embeddings(
        ENV["OPENAI_API_KEY"],
        horror_movies.overview[1:268],
        "text-embedding-ada-002"
    )

Here is the error message I get:

{
  "error": {
    "message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

I get this even though horror_movies.overview is a string vector.

UPDATE I: I tried different vector sizes, and there seems to be no hard upper bound on the size of the string vector. I just managed to get embeddings for 700 strings with horror_movies.overview[1:700]. Is there something that we as users should know, or is it simply luck related to the traffic limits their server imposes?
UPDATE II: However, in R the following code, written by Julia Silge, works every single time for all 1000 overview texts:

library(tidyverse)

set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)

library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)

resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)

embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")
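One visible difference between the two snippets is that the R version drops rows with an NA overview (`filter(!is.na(overview), ...)`) before sending the request, while the Julia call passes `horror_movies.overview[1:268]` unfiltered. A possible explanation, not confirmed in this thread, is that a `missing` entry in the slice ends up as `null` in the request body and the API rejects `$.input`. The sketch below shows one way to check for and remove such entries before calling `create_embeddings`. The helper name `clean_inputs` and the sample vector are hypothetical and not part of OpenAI.jl.

```julia
# Hypothetical helper (not part of OpenAI.jl): keep only entries that are
# neither `missing` nor blank, and return them as a plain Vector{String}.
function clean_inputs(texts)
    return String[t for t in skipmissing(texts) if !isempty(strip(t))]
end

# Stand-in for horror_movies.overview; the values here are made up.
overview = Union{Missing,String}["A haunted house.", missing, "", "Zombies attack."]

n_missing = count(ismissing, overview)   # how many entries are `missing`
inputs = clean_inputs(overview)          # safe to serialize as a JSON string array

println("missing entries: ", n_missing)
println("inputs kept: ", length(inputs))

# With the real data, the call from the original post would then become
# (commented out because it needs an API key and network access):
# r = create_embeddings(ENV["OPENAI_API_KEY"], inputs, "text-embedding-ada-002")
```

If `count(ismissing, horror_movies.overview[1:268])` turns out to be zero, this explanation does not apply and the cause lies elsewhere.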
@atantos atantos closed this as completed Aug 18, 2023
@atantos atantos reopened this Aug 18, 2023
Contributor

algunion commented Aug 19, 2023

Please check out the answer here.

Collaborator

cpfiffer commented Dec 1, 2023

Closing as this is resolved in the discourse post linked above.

@cpfiffer cpfiffer closed this as completed Dec 1, 2023