This project focuses on building a phoneme-driven Text-To-Speech (TTS) model. The model converts words to phonetic representations, a crucial step in speech synthesis: lexical orthographic symbols (words) are mapped to phoneme sequences.
The project is organized into three main components:
1. Word-to-Phoneme Model: utilizes the CMU Pronouncing Dictionary for word-to-phoneme conversion, and tokenizes words and phonemes for training.
2. Text-To-Speech Model: uses the Mozilla Common Voice dataset for audio data, generates phoneme sequences for each sentence with the previously trained word-to-phoneme model, and combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks.
3. Phoneme-to-Word Model: converts phoneme sequences back to word pronunciations, employing LSTM networks for this task.
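As a minimal sketch of the data side of this pipeline, the snippet below shows word-to-phoneme lookup, phoneme tokenization, sentence-level phoneme sequence generation, and reverse phoneme-to-word mapping. A hypothetical three-entry dictionary stands in for the full CMU Pronouncing Dictionary, and the reverse mapping is an exact-match lookup rather than the trained LSTM model:

```python
# Tiny stand-in for the CMU Pronouncing Dictionary (hypothetical sample;
# real entries come from the cmudict data, e.g. via nltk or the cmudict package).
CMU_SAMPLE = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
    "speech": ["S", "P", "IY1", "CH"],
}

def word_to_phonemes(word):
    """Look up a word's phoneme sequence; None if out of vocabulary."""
    return CMU_SAMPLE.get(word.lower())

def sentence_to_phonemes(sentence):
    """Generate the phoneme sequence for a sentence, skipping OOV words."""
    phonemes = []
    for word in sentence.split():
        seq = word_to_phonemes(word)
        if seq:
            phonemes.extend(seq)
    return phonemes

# Phoneme tokenization for training: map each phoneme symbol to an integer id.
PHONEME_VOCAB = sorted({p for seq in CMU_SAMPLE.values() for p in seq})
PHONEME_TO_ID = {p: i for i, p in enumerate(PHONEME_VOCAB)}

def tokenize_phonemes(phonemes):
    return [PHONEME_TO_ID[p] for p in phonemes]

# Reverse lookup: phoneme sequence back to a word pronunciation
# (the LSTM model generalizes this beyond exact dictionary matches).
PHONEMES_TO_WORD = {tuple(seq): w for w, seq in CMU_SAMPLE.items()}

def phonemes_to_word(phonemes):
    return PHONEMES_TO_WORD.get(tuple(phonemes))

print(sentence_to_phonemes("hello world"))
# ['HH', 'AH0', 'L', 'OW1', 'W', 'ER1', 'L', 'D']
```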
This project is a demonstration and may require additional tuning for optimal performance. Feel free to experiment and adapt the models based on your specific requirements.
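As one starting point for experimentation, a CNN+LSTM acoustic model of the kind described above could be sketched as follows. PyTorch is an assumed framework choice, and all layer sizes and the mel-spectrogram output target are illustrative, not the project's actual configuration:

```python
import torch
import torch.nn as nn

class PhonemeToSpectrogram(nn.Module):
    """Hypothetical CNN+LSTM acoustic model: phoneme ids -> spectrogram frames.
    Layer sizes are illustrative placeholders."""
    def __init__(self, n_phonemes=50, emb_dim=64, hidden=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, emb_dim)
        # 1-D convolution captures local phonetic context
        self.conv = nn.Sequential(
            nn.Conv1d(emb_dim, emb_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM models longer-range temporal structure
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_mels)

    def forward(self, phoneme_ids):
        x = self.embed(phoneme_ids)                       # (B, T, emb_dim)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)  # convolve over time
        x, _ = self.lstm(x)                               # (B, T, 2 * hidden)
        return self.out(x)                                # (B, T, n_mels)

model = PhonemeToSpectrogram()
dummy = torch.randint(0, 50, (2, 17))  # batch of 2 phoneme-id sequences
print(model(dummy).shape)              # torch.Size([2, 17, 80])
```

A real model would also need an alignment or length-regulation mechanism, since phoneme sequences and audio frames differ in length; this sketch only maps each phoneme position to one output frame.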