CLIENTS: Send Audio Transcription #381

Catalin-Andronie
Contributor

Closes #212

@hassanhabib
Owner

@Catalin-Andronie post a screenshot of this client running please

@Catalin-Andronie Catalin-Andronie force-pushed the users/catalin-andronie/clients_post_audio_transcription branch 2 times, most recently from ba5d3cb to 45f4bec Compare May 13, 2023 07:57
@Catalin-Andronie
Contributor Author

Catalin-Andronie commented May 13, 2023

> @Catalin-Andronie post a screenshot of this client running please

@hassanhabib Unfortunately, this will not work as expected because the following RESTFulSense features are missing:

  1. MEDIUM FIX: Enable creation of multipart-form with non-string types RESTFulSense#132
  2. FOUNDATIONS: Allow nullable value properties to be skipped or be part on the multipart-form RESTFulSense#130

  1. Until MEDIUM FIX: Enable creation of multipart-form with non-string types RESTFulSense#132 is implemented, we are forced to declare all AudioTranscriptionRequest properties as string. E.g. Temperature becomes a string instead of the double it should be.
public class AudioTranscriptionRequest
{
-    public double Temperature { get; set; } = 0.2;
+    public string Temperature { get; set; } = "0.2";
}
  2. Until FOUNDATIONS: Allow nullable value properties to be skipped or be part on the multipart-form RESTFulSense#130 is implemented, we are forced to give every AudioTranscriptionRequest property a value, even the optional ones. To create a transcription, OpenAI requires only the file and model in the request body; all other options are optional. If we send only those two, we get an exception for the remaining properties because they are null or empty.
var inputAudioTranscription = new AudioTranscription
{
    Request = new AudioTranscriptionRequest
    {
        Content = fileContent,
        FileName = fileName,
        Model = "whisper-1",
-       Prompt = null, // This is an optional value and I should not be forced to set its value.
+       Prompt = "Some prompt...",
-       Language = "", // This will throw an exception since RESTFulSense doesn't accept empty values.
+       Language = "en"
    }
};

AudioTranscription responseAudioTranscription =
    await this.openAIClient.AudioTranscriptions.SendAudioTranscriptionAsync(
        inputAudioTranscription);
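
For illustration, the behavior the two missing features would provide can be sketched with plain System.Net.Http rather than RESTFulSense. This is a minimal sketch under assumptions: `TranscriptionOptions` and `FormBuilder` are hypothetical names that appear neither in this PR nor in RESTFulSense, and the form field names follow the OpenAI request parameters mentioned in this thread. Non-string values are formatted at the point the form is built, and null or empty optional values are left out of the form.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Net.Http;

// Hypothetical request shape for illustration; not the PR's actual model.
public class TranscriptionOptions
{
    public string Model { get; set; } = "whisper-1";
    public string? Prompt { get; set; }
    public string? Language { get; set; }
    public double? Temperature { get; set; }
}

public static class FormBuilder
{
    // Adds a text part only when a value is present,
    // so optional properties can stay null or empty.
    private static void AddIfPresent(
        MultipartFormDataContent form, string name, string? value)
    {
        if (!string.IsNullOrEmpty(value))
        {
            form.Add(new StringContent(value), name);
        }
    }

    public static MultipartFormDataContent Build(TranscriptionOptions options)
    {
        var form = new MultipartFormDataContent();

        AddIfPresent(form, "model", options.Model);
        AddIfPresent(form, "prompt", options.Prompt);
        AddIfPresent(form, "language", options.Language);

        // The property stays a double; it is converted only here.
        // Invariant culture keeps "." as the decimal separator on any locale.
        AddIfPresent(
            form,
            "temperature",
            options.Temperature?.ToString(CultureInfo.InvariantCulture));

        return form;
    }
}
```

With this shape, `new TranscriptionOptions { Prompt = null, Language = "" }` yields a form containing only the `model` part, and the file part would be added separately.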

@hassanhabib what's your suggestion?

@BrianLParker
Collaborator

BrianLParker commented May 19, 2023

Reading the API, the file in the request is a string. The file should be uploaded beforehand, and the request then uses its name (a string). RESTFulSense already supports this through the PostContentAsync method. If you look at the example on the API page, they only post the name of the file, "german.m4a". Sorry for the late response; I've been quite ill for the last 4 days.

@Catalin-Andronie
Contributor Author

Catalin-Andronie commented May 20, 2023

> Reading the API, the file in the request is a string. The file should be uploaded beforehand, and the request then uses its name (a string). RESTFulSense already supports this through the PostContentAsync method. If you look at the example on the API page, they only post the name of the file, "german.m4a". Sorry for the late response; I've been quite ill for the last 4 days.

The API only accepts the multipart/form-data content type, which means we cannot use PostContentAsync, since that sends a JSON content type (see the example request):

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

Actually, the API specifies that the file itself is posted as a stream of bytes; the example only illustrates what should go there. Besides file and model, the API also accepts other properties in the request body, such as temperature, language, and prompt (these three are optional).
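
A rough C# equivalent of the curl request, using only System.Net.Http, might look like the sketch below. It is not how RESTFulSense's PostFormAsync is implemented; `TranscriptionRequestFactory` is a hypothetical name used only for illustration. The audio travels as raw bytes in its own form part, next to the required `model` part.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper for illustration; not part of this PR or of RESTFulSense.
public static class TranscriptionRequestFactory
{
    // Builds (but does not send) a multipart/form-data request
    // mirroring the curl example above.
    public static HttpRequestMessage Create(
        Stream audio, string fileName, string apiKey)
    {
        var form = new MultipartFormDataContent();

        // Same role as: -F file="@/path/to/file/audio.mp3"
        form.Add(new StreamContent(audio), "file", fileName);

        // Same role as: -F model="whisper-1"
        form.Add(new StringContent("whisper-1"), "model");

        var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://api.openai.com/v1/audio/transcriptions")
        {
            Content = form
        };

        // Same role as: -H "Authorization: Bearer $OPENAI_API_KEY"
        request.Headers.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);

        return request;
    }
}
```

The returned message could then be passed to `HttpClient.SendAsync`. The multipart content sets its own Content-Type header, including the boundary, so it does not need to be added by hand.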

Currently, we are using PostFormAsync to send the file stream to the server and we are obtaining its transcription, which works like a charm. Even so, we have two issues/features which need to be implemented before moving forward and completing the Audio Transcription. Please read this comment to understand the RESTFulSense features.

@Catalin-Andronie Catalin-Andronie force-pushed the users/catalin-andronie/clients_post_audio_transcription branch from cdca630 to d4a31a3 Compare May 22, 2023 16:12
Successfully merging this pull request may close these issues.

CLIENTS: Transcript Audio File
3 participants