Skip to content

OpenAI Vision and TTS #1253

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
4 tasks done
mudler opened this issue Nov 6, 2023 · 4 comments
Closed
4 tasks done

OpenAI Vision and TTS #1253

mudler opened this issue Nov 6, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request roadmap

Comments

@mudler
Copy link
Owner

mudler commented Nov 6, 2023

https://openai.com/blog/new-models-and-developer-products-announced-at-devday

OpenAI recently announced Turbo vision and text to speech API. This card is to create an API equivalent for the new functionalities

For Turbo vision we need first #1054

@mudler mudler added the enhancement New feature or request label Nov 6, 2023
@mudler mudler self-assigned this Nov 6, 2023
@mudler mudler added the roadmap label Nov 6, 2023
@localai-bot
Copy link
Contributor

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

LocalAI already supports text generation with GPTs using llama.cpp and other backends (such as rwkv.cpp). For the new GPT-4 Turbo Vision API, we can add a new endpoint to LocalAI that uses the LLaVA library. We will need to install the LLaVA library and set up the backend in the codebase.
For the text-to-speech API, we can use the OpenAI functions provided by LocalAI to generate audio from text. We can add a new endpoint to LocalAI that takes text input and returns an audio file of the generated speech. We will need to set up the backend in the codebase and train the model on a dataset of text-to-speech pairs.
To get started, we can create a feature request for each of these features on the LocalAI GitHub issue tracker and assign them to the relevant team members.

Sources:

@mudler
Copy link
Owner Author

mudler commented Nov 10, 2023

GPT Vision support API is introduced as part of #1254, along with JSON mode (https://platform.openai.com/docs/guides/text-generation/json-mode)

@mudler
Copy link
Owner Author

mudler commented Nov 10, 2023

On TTS: By luck I think we are already compatible, but I've didn't tried myself the client

@mudler
Copy link
Owner Author

mudler commented Nov 11, 2023

Closing as the big portion of this is done, created follow-ups for what's left

@mudler mudler closed this as completed Nov 11, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request roadmap
Projects
None yet
Development

No branches or pull requests

2 participants