mym-br/gnuspeech_sa

GnuspeechSA (Stand-Alone)

GnuspeechSA is a command-line articulatory synthesizer that converts text to speech.

GnuspeechSA is a C++ port of the TTS_Server in the original Gnuspeech system developed for NeXTSTEP, provided by David R. Hill, Leonard Manzara, Craig Schock and contributors. The base was the code in Gnuspeech's Subversion repository, revision 672, downloaded on 2014-08-02. The source code was obtained from the directories:

nextstep/trunk/ObjectiveC/Monet.realtime
nextstep/trunk/src/SpeechObject/postMonet/server.monet

This software is written in multi-platform C++.

Gnuspeech

Gnuspeech is an articulatory speech synthesizer. The project implemented the first articulatory text-to-speech (TTS) software (as far as I know). It was developed in the 1990s, around 30 years ago (as of 2024). The synthesizer was originally closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. It has a simple vocal tract model because the NeXT was a very slow computer (CPUs of the 1990s ran at clock frequencies of tens of MHz). The relatively low complexity of the model allows low-latency synthesis on modern personal computers.

The original TTS system had two implementations of the vocal tract model (tube model), one that executed on a 56k DSP, written in assembly, and another that executed on the CPU, written in C. The DSP tube model generates better speech, with more balanced fricatives/plosives. This repository uses the C tube model.

Synthesis examples

The sounds below were synthesized from the text of The Chaos (short version) by Gerard Nolst Trenité.

Original code (for NeXT - not in this repository) using the DSP vocal tract model

GnuspeechSA 0.1.8

Status

maintenance

Only English is supported.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the COPYING.txt file for more details.

External code

This software includes code from RapidXml. See the file src/rapidxml/license.txt for details.

Usage of gnuspeech_sa

gnuspeech_sa converts the input text to speech.

./gnuspeech_sa [-v] -c config_dir -p trm_param_file.txt -o output_file.wav \
        "Hello world."
    Synthesizes text from the command line.
    -v : verbose

    config_dir is the directory that stores the configuration data,
        e.g. data/en.
    trm_param_file.txt will be generated, containing the tube model
        parameters.
    output_file.wav will be generated, containing the synthesized speech.

./gnuspeech_sa [-v] -c config_dir -i input_text.txt -p trm_param_file.txt \
        -o output_file.wav
    Synthesizes text from a file.
    -v : verbose

    config_dir is the directory that stores the configuration data,
        e.g. data/en.
    input_text.txt contains the input text.
    trm_param_file.txt will be generated, containing the tube model
        parameters.
    output_file.wav will be generated, containing the synthesized speech.
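
For scripted or batch use, the two invocation forms above can be assembled programmatically. The following is a minimal Python sketch, not part of GnuspeechSA: the function name `build_gnuspeech_cmd` and its arguments are hypothetical, and only the flags shown in the usage text (`-v`, `-c`, `-i`, `-p`, `-o`) are used.

```python
def build_gnuspeech_cmd(config_dir, param_file, output_file,
                        text=None, input_file=None, verbose=False,
                        executable="./gnuspeech_sa"):
    """Build the argument list for one gnuspeech_sa run (hypothetical helper)."""
    # Exactly one input source: text on the command line, or a text file.
    if (text is None) == (input_file is None):
        raise ValueError("provide exactly one of text or input_file")
    cmd = [executable]
    if verbose:
        cmd.append("-v")
    cmd += ["-c", config_dir]
    if input_file is not None:
        cmd += ["-i", input_file]
    cmd += ["-p", param_file, "-o", output_file]
    if text is not None:
        cmd.append(text)  # the text is passed as the last argument
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the input text.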

Usage of gnuspeech_sa_trm

gnuspeech_sa_trm executes only the tube model.

./gnuspeech_sa_trm [-v] trm_param_file.txt output_file.wav
    -v : verbose

    trm_param_file.txt is the file generated by gnuspeech_sa, containing the
        tube model parameters.
    output_file.wav will be generated, containing the synthesized speech.

Contents of data/en

monet.xml

Contains the articulatory database.

intonation.txt

Controls the intonation.

If random_intonation = 0 in trm_control_model.txt, only the first line in each tone group will be used. If random_intonation = 1, a line will be randomly selected from the tone group.
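
The selection rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the GnuspeechSA code: the function name `select_intonation_line` is hypothetical, and a tone group is assumed to be available as a list of lines already read from intonation.txt.

```python
import random


def select_intonation_line(group_lines, random_intonation, rng=random):
    """Pick the line of a tone group to use (hypothetical helper).

    group_lines: the lines belonging to one tone group, in file order.
    random_intonation: 0 or 1, as set in trm_control_model.txt.
    """
    if not group_lines:
        raise ValueError("tone group has no lines")
    if random_intonation == 0:
        return group_lines[0]       # always the first line of the group
    return rng.choice(group_lines)  # any line of the group, chosen at random
```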

MainDictionary.txt

Contains the main dictionary, which relates words to postures.

trm.txt

Contains the parameters for the tube model.

Interesting parameters are:

    vocal_tract_length_offset
        This value is added to the vocal tract length.
    loss_factor
        Defines the acoustic loss inside the vocal tract.
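
As an illustration of how vocal_tract_length_offset combines with the voice's vocal_tract_length, here is a small Python sketch. It assumes the configuration files consist of `name = value` lines (as the `random_intonation = 0` notation in this document suggests) and that lines starting with `#` are comments; both are assumptions about the file format. The function names and the numeric values in the usage note are hypothetical.

```python
def parse_params(text):
    """Parse 'name = value' lines into a dict of strings (assumed format)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without '='
        params[name.strip()] = value.strip()
    return params


def effective_vocal_tract_length(voice_params, trm_params):
    """Voice length plus the offset from trm.txt, as described above."""
    return (float(voice_params["vocal_tract_length"])
            + float(trm_params["vocal_tract_length_offset"]))
```

For example, with a hypothetical voice length of 17.5 and an offset of 1.5, the resulting length is 19.0.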

trm_control_model.txt

Contains the parameters for the tube model controller.

Interesting parameters are:

    voice_name
        Defines the voice used in the synthesis.
        It selects which of the voice_*.txt files will be
        loaded.
    tempo
        Values greater than 1 will speed up the speech.
    pitch_offset
        Modifies the voice pitch.

    drift_deviation
    drift_lowpass_cutoff
        Control the random perturbations in the intonation
        (requires intonation_drift = 1).

    dictionary_1_file
    dictionary_2_file
    dictionary_3_file
        Indicate the dictionaries (the dictionaries will be
        searched in the order 1, 2, 3).
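
The search order can be pictured with the following Python sketch. It is only an illustration: the function name `find_pronunciation` is hypothetical, each dictionary is modeled as a plain mapping from word to entry, and what GnuspeechSA does when a word is in none of the dictionaries is outside this sketch.

```python
def find_pronunciation(word, dictionaries):
    """Return the entry from the first dictionary that contains the word.

    dictionaries: mappings in search order, i.e. the contents of
    dictionary_1_file, dictionary_2_file and dictionary_3_file.
    """
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word]  # earlier dictionaries take precedence
    return None  # not found in any dictionary
```

One consequence of this order is that an entry in dictionary 1 overrides an entry for the same word in dictionaries 2 and 3.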

Note:

The following parameters are not being used at the moment:

  • notional_pitch
  • pretonic_range
  • pretonic_lift
  • tonic_range
  • tonic_movement

voice_baby.txt

voice_female.txt

voice_large_child.txt

voice_male.txt

voice_small_child.txt

Contain the voice parameters.

Interesting parameters are:

    vocal_tract_length

    glottal_pulse_tp
        Rise time, in % of the period.
    glottal_pulse_tn_min
        Fall time, in % of the period - for the highest pulse
        amplitude.
    glottal_pulse_tn_max
        Fall time, in % of the period - for the lowest pulse
        amplitude.

        These parameters modify the glottal pulse shape.

    reference_glottal_pitch
        Modifies the voice pitch.

    breathiness
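
To give a feel for how glottal_pulse_tp, glottal_pulse_tn_min and glottal_pulse_tn_max shape the pulse, here is a rough Python sketch. It rests on assumptions that this document does not confirm: the fall time is interpolated linearly between tn_max (lowest amplitude) and tn_min (highest amplitude), and the rise and fall segments use simple polynomials. The function names and the values in the usage note are hypothetical, and the actual tube model code may differ.

```python
def fall_time_percent(amplitude, tn_min, tn_max):
    """Fall time in % of the period for an amplitude in [0, 1] (assumed linear)."""
    amplitude = min(max(amplitude, 0.0), 1.0)
    return tn_max - amplitude * (tn_max - tn_min)


def glottal_pulse(num_samples, tp, tn):
    """One period of a pulse: rise for tp %, fall for tn %, then closed."""
    rise_len = round(num_samples * tp / 100.0)
    fall_len = round(num_samples * tn / 100.0)
    pulse = []
    for i in range(num_samples):
        if i < rise_len:
            x = i / rise_len
            pulse.append(3.0 * x * x - 2.0 * x ** 3)  # smooth rise from 0 to 1
        elif i < rise_len + fall_len:
            x = (i - rise_len) / fall_len
            pulse.append(1.0 - x * x)                 # fall from 1 to 0
        else:
            pulse.append(0.0)                         # closed phase
    return pulse
```

With the hypothetical values tp = 40, tn_min = 16 and tn_max = 32, an amplitude of 0.5 gives a fall time of 24 % of the period, and `glottal_pulse(100, 40.0, 24.0)` rises over the first 40 samples, falls over the next 24 and stays at zero for the rest. A shorter fall time at high amplitude makes the pulse close more abruptly.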

vowelTransitions.txt

Controls vowel transitions.

vowelTransitions_2.txt

Alternative version of vowelTransitions.txt.

It is not being used.