Skip to content

Commit

Permalink
20230418 add readme
Browse files Browse the repository at this point in the history
  • Loading branch information
der3318 committed Apr 17, 2023
1 parent 15f377b commit 76834c6
Show file tree
Hide file tree
Showing 4 changed files with 35 additions and 0 deletions.
Binary file added Images/Demo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Images/InteractiveMode.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Images/TranslationMode.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
35 changes: 35 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@

## 💬 Audio Powered GPT

![version](https://img.shields.io/badge/version-2.0.0-blue.svg)
![dotnetf](https://img.shields.io/badge/.net-6.0-green.svg)
[![openai](https://img.shields.io/badge/Azure.AI.OpenAI%20%28nuget%29-1.0.0%20beta.5-yellow.svg)](https://www.nuget.org/packages/Azure.AI.OpenAI)
[![speech](https://img.shields.io/badge/Microsoft.CognitiveServices.Speech%20%28nuget%29-1.27.0-pink.svg)](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech)
![portable](https://img.shields.io/badge/portable-windows%20x64%20%2819041+%29-blueviolet.svg)

A tiny WPF interface that integrates [Azure cognitive service](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices) with [GPT endpoint](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource). This requires Azure subscription resources of both speech service and OpenAI.

![Demo.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/Demo.png)


### Interactive Mode

Simply type or speak (via microphone) to ask GTP questions in this mode. Press the "start button" to trigger a speech QA session, and click the "start/stop button" again to pause.

![InteractiveMode.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/InteractiveMode.png)


### Translation Mode

This is the real time translation (into Chinese) functionality. Result texts will also be displayed as a 3-second toast in the bottom corner, so the app can be run completely in the background.

![TranslationMode.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/TranslationMode.png)

An audio redirection (from speacker to input) interface is a prerequisite to use the feature. Windows stereo mix or [VB-Cable](https://vb-audio.com/Cable/) is probably a good choice.


### References

* Icon: https://arstechnica.com/information-technology/2023/01/openai-and-microsoft-reaffirm-shared-quest-for-powerful-ai-with-new-investment/
* Azure Speech to Text: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-recognize-speech
* Azure OpenAI Studio: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quickstart

0 comments on commit 76834c6

Please # to comment.