In most popular instant messaging apps, voice notes are second-class citizens. koe gives them equal footing by integrating them more seamlessly into the message flow.
A voice note by itself communicates no context beyond its length. That interrupts the flow of communication: you are forced to switch between reading and listening, with no way to stick to just one or the other. koe gives voice notes room to breathe and lets users read them as well as listen.
Furthermore, sentiment analysis is used to colorize the message bubbles, communicating the emotion of the message. Much more thought needs to go into how to better visualize the essence, emotion, and context of a voice note, but this is a proof-of-concept start.
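The bubble colorization could be as simple as mapping the top-scoring go_emotions label to a tint. A minimal sketch, where the label names come from SamLowe/roberta-base-go_emotions but the specific colors and the fallback behavior are illustrative assumptions, not what koe actually ships:

```javascript
// Hypothetical mapping from go_emotions labels to bubble tints.
// The colors and the neutral fallback are illustrative choices.
const EMOTION_TINTS = {
  joy: "#ffd166",
  anger: "#ef476f",
  sadness: "#6c8ebf",
  fear: "#9b5de5",
  surprise: "#06d6a0",
  neutral: "#e0e0e0",
};

// Pick the tint for the highest-scoring label; fall back to neutral
// for labels without a dedicated color (go_emotions has 28 of them).
function bubbleTint(scores) {
  const top = Object.entries(scores).sort((a, b) => b[1] - a[1])[0];
  if (!top) return EMOTION_TINTS.neutral;
  return EMOTION_TINTS[top[0]] ?? EMOTION_TINTS.neutral;
}
```

Collapsing 28 fine-grained labels down to a handful of tints keeps the palette readable; a fuller design might blend the top two or three labels instead.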
- SvelteKit
- Matrix protocol - matrix-js-sdk
- OpenAI gpt-4-1106-preview API (summarization)
- OpenAI's Whisper API (speech-to-text)
- SamLowe/roberta-base-go_emotions model (sentiment analysis)
- MediaStream Recording API (voice recording)
- wavesurfer.xyz (audio visualization library)
- PostgreSQL (db)
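For the recording piece of the stack, a browser-only sketch using the MediaStream Recording API; the mime type, the 60-second cap, and the duration formatting are illustrative assumptions, not koe's actual recorder:

```javascript
// Browser-only sketch: capture a voice note with MediaRecorder.
// The "audio/webm" mime type and the 60 s safety cap are assumptions.
async function recordVoiceNote() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  return new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    // The UI would normally call recorder.stop(); cap at 60 s regardless.
    setTimeout(() => recorder.state === "recording" && recorder.stop(), 60_000);
  });
}

// Pure helper: format a note's length (ms) as m:ss for the bubble label.
function formatDuration(ms) {
  const totalSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}
```

The resulting Blob can be uploaded to the Matrix homeserver as an `m.audio` event and handed to wavesurfer for visualization.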
- Voice notes are summarized: speech-to-text first, then the transcript is run through gpt-4-turbo, and the summary is shown right below the voice note within the same text bubble;
- To show the full transcript, click on the summarized text, which will enlarge the text bubble;
- Improved playback of voice messages: instead of blindly sliding the duration dial on the note, the slider automatically snaps to the beginnings of sentences;
- Open message threads in a separate tab/window, similar to Discord;
- Down the road it would be cool to also add a text-to-speech option, so that all text messages get converted to speech, with the generated voice being that of the message sender;
- This can already be done with a very small amount of sample data (e.g. ElevenLabs), but the privacy concerns are too involved to pursue the idea so quickly. It would be cool to figure out a way that, as long as all parties in the chat consent to it, their voices are used for training and then for their text messages.
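The sentence-snapping seek from the list above can be sketched as a pure function. It assumes the transcript is available as Whisper-style segments with `start` times in seconds (as in the `verbose_json` response format); the function name and data shape are illustrative:

```javascript
// Snap a requested seek position to the start of the sentence (segment)
// at or before it. If the seek lands before the first segment, snap
// forward to the first sentence start instead.
function snapSeek(seconds, segments) {
  const starts = segments.map((s) => s.start).sort((a, b) => a - b);
  let snapped = starts.length ? starts[0] : 0;
  for (const t of starts) {
    if (t <= seconds) snapped = t;
    else break;
  }
  return snapped;
}
```

Wired to the wavesurfer slider, this means dragging anywhere inside a sentence replays it from its beginning, so the listener never lands mid-word.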
I'm using the Matrix protocol because it is an off-the-shelf, reliable, fairly mature E2EE messaging protocol that I can build a frontend around, instead of rolling my own.
This is a PoC I'm working on to satisfy my curiosity about whether voice notes can be better integrated into the messenger app communications flow, and is not intended to be a final product of any kind.
- Set up a basic local Matrix environment (using the Synapse homeserver)
- Integrate the Matrix client SDK w/ user authentication and session persistence
- Create a minimal chat interface
- Implement voice recording functionality
- Integrate wavesurfer waveform visualization for voice recordings
- Integrate speech-to-text API for transcription
- Implement sentiment analysis
- Develop color-coded emotion visualization
- Automatically cut out long pauses and filler between words (uhms, ahms, thinking breaks); make it optional for the user in the settings
- Create interactive playback dashboard
- Implement threaded conversations feature
- Prototype testing
- Prepare a demo setup
- Let people shit on the idea
- Try to refine the prototype based on feedback
- Slowly lose hope
- Hope lost, cue despair
- Move on to the next idea
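The pause-trimming item in the roadmap could start as a pure pass over the transcript timestamps. A sketch assuming Whisper-style segments with `start`/`end` seconds; the threshold and data shape are assumptions:

```javascript
// Compute the time ranges worth keeping, merging segments separated
// by short gaps and dropping silent gaps longer than maxPauseSec.
// The 1.0 s default threshold is an illustrative guess, and should be
// the user-tunable setting mentioned in the roadmap.
function keepRanges(segments, maxPauseSec = 1.0) {
  const ranges = [];
  for (const seg of segments) {
    const last = ranges[ranges.length - 1];
    if (last && seg.start - last.end <= maxPauseSec) {
      // Short gap: extend the previous range over it.
      last.end = Math.max(last.end, seg.end);
    } else {
      ranges.push({ start: seg.start, end: seg.end });
    }
  }
  return ranges;
}
```

The player (or an offline re-encode step) can then skip everything outside the returned ranges, which also shortens the snap points computed for sentence-based seeking.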