OpenAI Vision and TTS #1253

mudler · 2023-11-06T18:36:46Z

https://openai.com/blog/new-models-and-developer-products-announced-at-devday

OpenAI recently announced Turbo vision and text to speech API. This card is to create an API equivalent for the new functionalities

New GPT-4 Turbo Vision API -> LLaVA https://platform.openai.com/docs/guides/vision
JSON output in the object type
call multiple functions in one message #1275
New text-to-speech API #1276

For Turbo vision we need first #1054

localai-bot · 2023-11-07T12:03:50Z

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

LocalAI already supports text generation with GPTs using llama.cpp and other backends (such as rwkv.cpp). For the new GPT-4 Turbo Vision API, we can add a new endpoint to LocalAI that uses the LLaVA library. We will need to install the LLaVA library and set up the backend in the codebase.
For the text-to-speech API, we can use the OpenAI functions provided by LocalAI to generate audio from text. We can add a new endpoint to LocalAI that takes text input and returns an audio file of the generated speech. We will need to set up the backend in the codebase and train the model on a dataset of text-to-speech pairs.
To get started, we can create a feature request for each of these features on the LocalAI GitHub issue tracker and assign them to the relevant team members.

Sources:

mudler · 2023-11-10T17:38:26Z

GPT Vision support API is introduced as part of #1254, along with JSON mode (https://platform.openai.com/docs/guides/text-generation/json-mode)

mudler · 2023-11-10T18:33:11Z

On TTS: By luck I think we are already compatible, but I've didn't tried myself the client

mudler · 2023-11-11T13:18:19Z

Closing as the big portion of this is done, created follow-ups for what's left

mudler added the enhancement New feature or request label Nov 6, 2023

mudler self-assigned this Nov 6, 2023

mudler added the roadmap label Nov 6, 2023

mudler closed this as completed Nov 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenAI Vision and TTS #1253

OpenAI Vision and TTS #1253

mudler commented Nov 6, 2023 •

edited

Loading

localai-bot commented Nov 7, 2023

mudler commented Nov 10, 2023

mudler commented Nov 10, 2023

mudler commented Nov 11, 2023

OpenAI Vision and TTS #1253

OpenAI Vision and TTS #1253

Comments

mudler commented Nov 6, 2023 • edited Loading

localai-bot commented Nov 7, 2023

⚠️⚠️⚠️⚠️⚠️

⚠️⚠️⚠️⚠️⚠️

mudler commented Nov 10, 2023

mudler commented Nov 10, 2023

mudler commented Nov 11, 2023

mudler commented Nov 6, 2023 •

edited

Loading