GnuspeechSA is a command-line articulatory synthesizer that converts text to speech.
GnuspeechSA is a C++ port of the TTS_Server in the original Gnuspeech system developed for NeXTSTEP, provided by David R. Hill, Leonard Manzara, Craig Schock and contributors. It is based on the code in Gnuspeech's Subversion repository, revision 672, downloaded on 2014-08-02. The source code was obtained from the directories:
nextstep/trunk/ObjectiveC/Monet.realtime
nextstep/trunk/src/SpeechObject/postMonet/server.monet
This software is written in multi-platform C++.
Gnuspeech is an articulatory speech synthesizer. The project implemented the first articulatory text-to-speech (TTS) software (as far as I know). It was developed in the 1990s, around 30 years ago (as of 2024). The synthesizer was previously closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. It has a simple vocal tract model, because the NeXT was a very slow computer (the CPUs of the 1990s operated at a clock frequency of tens of MHz). The relatively low complexity of the model allows low-latency synthesis on modern personal computers.
The original TTS system had two implementations of the vocal tract model (tube model), one that executed on a 56k DSP, written in assembly, and another that executed on the CPU, written in C. The DSP tube model generates better speech, with more balanced fricatives/plosives. This repository uses the C tube model.
The sounds below were synthesized from the text of The Chaos (short version) by Gerard Nolst Trenité.
maintenance
Only English is supported.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the COPYING.txt file for more details.
This software includes code from RapidXml. See the file src/rapidxml/license.txt for details.
gnuspeech_sa converts the input text to speech.
./gnuspeech_sa [-v] -c config_dir -p trm_param_file.txt -o output_file.wav \
"Hello world."
Synthesizes text from the command line.
-v : verbose
config_dir is the directory that stores the configuration data,
e.g. data/en.
trm_param_file.txt will be generated, containing the tube model
parameters.
output_file.wav will be generated, containing the synthesized speech.
./gnuspeech_sa [-v] -c config_dir -i input_text.txt -p trm_param_file.txt \
-o output_file.wav
Synthesizes text from a file.
-v : verbose
config_dir is the directory that stores the configuration data,
e.g. data/en.
input_text.txt contains the input text.
trm_param_file.txt will be generated, containing the tube model
parameters.
output_file.wav will be generated, containing the synthesized speech.
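The two invocations above can also be scripted. The following is a minimal Python sketch that only assembles the argument list from the options documented above (-v, -c, -i, -p, -o). The helper names build_gnuspeech_sa_command and synthesize are hypothetical, and the executable is assumed to be in the current directory.

```python
import subprocess


def build_gnuspeech_sa_command(config_dir, trm_param_file, output_file,
                               text=None, input_file=None, verbose=False,
                               executable="./gnuspeech_sa"):
    """Assemble the argument list for gnuspeech_sa (hypothetical helper).

    Exactly one of `text` (command-line mode) or `input_file` (file mode)
    must be given.
    """
    if (text is None) == (input_file is None):
        raise ValueError("provide exactly one of text or input_file")
    cmd = [executable]
    if verbose:
        cmd.append("-v")
    cmd += ["-c", config_dir]
    if input_file is not None:
        cmd += ["-i", input_file]
    cmd += ["-p", trm_param_file, "-o", output_file]
    if text is not None:
        cmd.append(text)  # the text is the last positional argument
    return cmd


def synthesize(**kwargs):
    """Run gnuspeech_sa; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gnuspeech_sa_command(**kwargs), check=True)
```

For example, synthesize(config_dir="data/en", trm_param_file="trm_param_file.txt", output_file="output_file.wav", text="Hello world.") corresponds to the first command line. Passing the arguments as a list avoids shell quoting problems with the input text.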
gnuspeech_sa_trm executes only the tube model.
./gnuspeech_sa_trm [-v] trm_param_file.txt output_file.wav
-v : verbose
trm_param_file.txt is the file generated by gnuspeech_sa, containing the
tube model parameters.
output_file.wav will be generated, containing the synthesized speech.
Contains the articulatory database.
Controls the intonation.
If random_intonation = 0 in trm_control_model.txt, only the first line in each tone group will be used. If random_intonation = 1, the line will be randomly selected.
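The selection rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the actual implementation: the function name select_intonation_line and the representation of a tone group as a list of lines are assumptions.

```python
import random


def select_intonation_line(tone_group_lines, random_intonation, rng=random):
    """Pick one line of a tone group (hypothetical helper).

    random_intonation == 0: always the first line.
    random_intonation == 1: a randomly selected line.
    """
    if not tone_group_lines:
        raise ValueError("tone group has no lines")
    if random_intonation == 1:
        return rng.choice(tone_group_lines)
    return tone_group_lines[0]
```

Passing a seeded random.Random instance as rng makes the random case reproducible.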
Contains the main dictionary, which relates words to postures.
Contains the parameters for the tube model.
Interesting parameters are:
vocal_tract_length_offset
This value is added to the vocal tract length.
loss_factor
Defines the acoustic loss inside the vocal tract.
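As an illustration of how vocal_tract_length_offset combines with the vocal_tract_length of the selected voice, here is a small Python sketch. It assumes that the parameter files consist of "name = value" lines (as suggested by random_intonation = 0), which may not match the real file format exactly. The function names are hypothetical.

```python
def parse_parameters(text):
    """Parse 'name = value' lines into a dict of strings (assumed format)."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            params[name.strip()] = value.strip()
    return params


def effective_vocal_tract_length(voice_params, trm_params):
    """Add vocal_tract_length_offset to the voice's vocal_tract_length."""
    return (float(voice_params["vocal_tract_length"])
            + float(trm_params["vocal_tract_length_offset"]))
```

With this sketch, a voice whose vocal_tract_length is 17.5 combined with an offset of 1.5 gives an effective length of 19.0 (both numbers are made up for the example).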
Contains the parameters for the tube model controller.
Interesting parameters are:
voice_name
Defines the voice used in the synthesis.
It selects which of the voice_*.txt files will be
loaded.
tempo
Values greater than 1 will speed up the speech.
pitch_offset
Modifies the voice pitch.
drift_deviation
drift_lowpass_cutoff
Control the random perturbations in the intonation
(requires intonation_drift = 1).
dictionary_1_file
dictionary_2_file
dictionary_3_file
Specify the dictionary files (the dictionaries are
searched in the order 1, 2, 3).
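The search order of the dictionaries can be pictured with the following Python sketch. It models each dictionary as a plain dict and returns the first entry found. The function name look_up_word and the None result for unknown words are assumptions, since the fallback for words missing from all dictionaries is not described here.

```python
def look_up_word(word, dictionaries):
    """Return the first entry found, searching the dictionaries in order.

    `dictionaries` is a list ordered as dictionary_1_file,
    dictionary_2_file, dictionary_3_file (modeled here as dicts).
    """
    for dictionary in dictionaries:
        entry = dictionary.get(word)
        if entry is not None:
            return entry
    return None  # not found in any dictionary (assumed behavior)
```

An entry in dictionary 1 therefore takes precedence over entries for the same word in dictionaries 2 and 3.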
Note:
The following parameters are not being used at the moment:
- notional_pitch
- pretonic_range
- pretonic_lift
- tonic_range
- tonic_movement
Contain the voice parameters.
Interesting parameters are:
vocal_tract_length
glottal_pulse_tp
Rise time, in % of the period.
glottal_pulse_tn_min
Fall time, in % of the period - for the highest pulse
amplitude.
glottal_pulse_tn_max
Fall time, in % of the period - for the lowest pulse
amplitude.
These parameters modify the glottal pulse shape.
reference_glottal_pitch
Modifies the voice pitch.
breathiness
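To visualize how glottal_pulse_tp, glottal_pulse_tn_min and glottal_pulse_tn_max shape the glottal pulse, the Python sketch below builds one period of a pulse with a rise of tp % and a fall of tn % of the period, followed by a closed (zero) phase. The polynomial shapes (3x^2 - 2x^3 for the rise, 1 - x^2 for the fall), the linear interpolation of the fall time, and the amplitude range 0 to 1 are assumptions for illustration and may differ from the actual tube model code.

```python
def fall_time_percent(amplitude, tn_min, tn_max):
    """Fall time for a pulse amplitude in [0, 1].

    tn_min applies to the highest amplitude, tn_max to the lowest;
    the linear interpolation in between is an assumption.
    """
    amplitude = min(max(amplitude, 0.0), 1.0)
    return tn_max - amplitude * (tn_max - tn_min)


def glottal_pulse(table_size, tp, tn):
    """One period: rise over tp %, fall over tn %, zero for the rest."""
    if tp < 0.0 or tn < 0.0 or tp + tn > 100.0:
        raise ValueError("tp and tn must be >= 0 and sum to at most 100")
    rise_end = round(table_size * tp / 100.0)
    fall_end = round(table_size * (tp + tn) / 100.0)
    table = [0.0] * table_size
    for i in range(rise_end):
        x = i / rise_end
        table[i] = 3.0 * x * x - 2.0 * x ** 3  # smooth rise from 0 towards 1
    for i in range(rise_end, fall_end):
        x = (i - rise_end) / (fall_end - rise_end)
        table[i] = 1.0 - x * x  # quadratic fall from 1 towards 0
    return table
```

A shorter fall time produces a sharper closing of the pulse, which is why the fall time is tied to the pulse amplitude.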
Controls vowel transitions.
Alternative version of vowelTransitions.txt. It is not being used.