
# GnuspeechSA (Stand-Alone)

GnuspeechSA is a command-line articulatory synthesizer that converts text
to speech.

GnuspeechSA is a C++ port of the TTS_Server in the original [Gnuspeech][]
system developed for NeXTSTEP, provided by David R. Hill, Leonard Manzara,
Craig Schock and contributors.
The port is based on revision 672 of Gnuspeech's Subversion repository,
downloaded on 2014-08-02. The source code was obtained from the directories:

    nextstep/trunk/ObjectiveC/Monet.realtime
    nextstep/trunk/src/SpeechObject/postMonet/server.monet

This software is written in cross-platform C++.

[Gnuspeech]: https://www.gnu.org/software/gnuspeech/

## Gnuspeech

Gnuspeech is an articulatory speech synthesizer. As far as I know, the project
implemented the first articulatory text-to-speech (TTS) software.
It was developed in the 1990s, around 30 years ago (as of 2024).
The synthesizer was originally closed-source commercial software, available
only for NeXT computers. After the demise of NeXT, the software was donated to
the [GNU][] project. It has a simple vocal tract model, because the NeXT was a
very slow computer by current standards (the CPUs of the 1990s operated at
clock frequencies of tens of MHz). The relatively low complexity of the model
allows low-latency synthesis on modern personal computers.

[GNU]: https://www.gnu.org

The original TTS system had two implementations of the vocal tract model
(tube model): one written in assembly, which ran on a 56k DSP, and another
written in C, which ran on the CPU. The DSP tube model generates better
speech, with more balanced fricatives/plosives. This repository uses the C
tube model.

## Synthesis examples

The sounds below were synthesized from the text of
[The Chaos (short version)](the_chaos.txt) by Gerard Nolst Trenité.

### Original code (for NeXT - not in this repository) using the DSP vocal tract model

- [English - Male](sound/trillium_tts-the_chaos.mp3)

### GnuspeechSA 0.1.8

- [English - Male](sound/gnuspeech_sa-0.1.8-english_male-the_chaos.mp3)
- [English - Female](sound/gnuspeech_sa-0.1.8-english_female-the_chaos.mp3)
- [English - Large child](sound/gnuspeech_sa-0.1.8-english_large_child-the_chaos.mp3)
- [English - Small child](sound/gnuspeech_sa-0.1.8-english_small_child-the_chaos.mp3)
- [English - Baby](sound/gnuspeech_sa-0.1.8-english_baby-the_chaos.mp3)

## Status

**maintenance**

Only English is supported.

## License

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
COPYING.txt file for more details.

## External code

This software includes code from [RapidXml][].
See the file src/rapidxml/license.txt for details.

[RapidXml]: https://rapidxml.sourceforge.net/

## Usage of `gnuspeech_sa`

`gnuspeech_sa` converts the input text to speech.

    ./gnuspeech_sa [-v] -c config_dir -p trm_param_file.txt -o output_file.wav \
            "Hello world."
        Synthesizes text from the command line.
        -v : verbose

        config_dir is the directory that stores the configuration data,
            e.g. data/en.
        trm_param_file.txt will be generated, containing the tube model
            parameters.
        output_file.wav will be generated, containing the synthesized speech.

    ./gnuspeech_sa [-v] -c config_dir -i input_text.txt -p trm_param_file.txt \
            -o output_file.wav
        Synthesizes text from a file.
        -v : verbose

        config_dir is the directory that stores the configuration data,
            e.g. data/en.
        input_text.txt contains the input text.
        trm_param_file.txt will be generated, containing the tube model
            parameters.
        output_file.wav will be generated, containing the synthesized speech.

## Usage of `gnuspeech_sa_trm`

`gnuspeech_sa_trm` executes only the tube model.

    ./gnuspeech_sa_trm [-v] trm_param_file.txt output_file.wav
        -v : verbose

        trm_param_file.txt is the file generated by gnuspeech_sa, containing the
            tube model parameters.
        output_file.wav will be generated, containing the synthesized speech.
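
The invocations of `gnuspeech_sa` and `gnuspeech_sa_trm` shown above can also be
assembled from a script. The following is a minimal sketch in Python that only
builds the argument lists, in the same option order as the usage text. The
function names and the default executable paths are assumptions made for this
example; they are not part of GnuspeechSA.

```python
from typing import List, Optional


def gnuspeech_sa_cmd(config_dir: str, trm_param_file: str, output_wav: str,
                     text: Optional[str] = None,
                     input_file: Optional[str] = None,
                     verbose: bool = False,
                     exe: str = "./gnuspeech_sa") -> List[str]:
    """Build the argument list for gnuspeech_sa (hypothetical helper).

    Exactly one of `text` (command-line mode) or `input_file` (-i mode)
    must be given.
    """
    if (text is None) == (input_file is None):
        raise ValueError("pass exactly one of text or input_file")
    cmd = [exe]
    if verbose:
        cmd.append("-v")
    cmd += ["-c", config_dir]
    if input_file is not None:
        cmd += ["-i", input_file]
    cmd += ["-p", trm_param_file, "-o", output_wav]
    if text is not None:
        cmd.append(text)
    return cmd


def gnuspeech_sa_trm_cmd(trm_param_file: str, output_wav: str,
                         verbose: bool = False,
                         exe: str = "./gnuspeech_sa_trm") -> List[str]:
    """Build the argument list for gnuspeech_sa_trm (hypothetical helper)."""
    cmd = [exe]
    if verbose:
        cmd.append("-v")
    cmd += [trm_param_file, output_wav]
    return cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Running
`gnuspeech_sa` first and `gnuspeech_sa_trm` afterwards on the same
`trm_param_file.txt` executes the tube model again without repeating the text
processing.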

## Contents of `data/en`

### `monet.xml`

Contains the articulatory database.

### `intonation.txt`

Controls the intonation.

If `random_intonation = 0` in `trm_control_model.txt`, only the first
line in each tone group will be used. If `random_intonation = 1`, a
line will be selected at random from each tone group.
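
The selection rule can be illustrated with a short Python sketch. It does not
parse `intonation.txt` (the file format is not described here); it only assumes
that the lines of one tone group are already available as a list. The function
name is hypothetical.

```python
import random
from typing import Optional, Sequence


def select_intonation_line(group_lines: Sequence[str],
                           random_intonation: int,
                           rng: Optional[random.Random] = None) -> str:
    """Pick one line from a tone group (illustrative, not the project's code).

    random_intonation == 0: always the first line of the group.
    random_intonation == 1: a line chosen at random from the group.
    """
    if not group_lines:
        raise ValueError("tone group has no lines")
    if random_intonation == 0:
        return group_lines[0]
    rng = rng or random.Random()
    return rng.choice(group_lines)
```

Passing a seeded `random.Random` instance makes the random choice reproducible
in this sketch.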

### `MainDictionary.txt`

Contains the main dictionary, which relates words to postures.

### `trm.txt`

Contains the parameters for the tube model.

Interesting parameters are:

        vocal_tract_length_offset
            This value is added to the vocal tract length.
        loss_factor
            Defines the acoustic loss inside the vocal tract.
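
To try different values, a parameter can be changed with a text editor or from
a script. The sketch below is in Python and assumes that the file consists of
`name = value` lines, as the `random_intonation = 0` notation in this document
suggests; check your copy of `trm.txt` before relying on it. The helper name and
the sample values are hypothetical.

```python
import re


def set_param(config_text: str, name: str, value) -> str:
    """Return config_text with the first `name = ...` line set to `value`.

    Assumes one `name = value` pair per line (an assumption, see above).
    Raises KeyError if the parameter is not present.
    """
    # [ \t]* instead of \s* so that the match never crosses a line break.
    pattern = re.compile(
        r"^([ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*).*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value), config_text, count=1
    )
    if count == 0:
        raise KeyError(name)
    return new_text


# Hypothetical file content, for illustration only.
sample = "vocal_tract_length_offset = 0.0\nloss_factor = 0.8\n"
updated = set_param(sample, "loss_factor", 0.5)
```

Working on a copy of the configuration directory and passing it with `-c` keeps
the original data intact.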

### `trm_control_model.txt`

Contains the parameters for the tube model controller.

Interesting parameters are:

        voice_name
            Defines the voice used in the synthesis.
            It selects which of the voice_*.txt files will be
            loaded.
        tempo
            Values greater than 1 will speed up the speech.
        pitch_offset
            Modifies the voice pitch.

        drift_deviation
        drift_lowpass_cutoff
            Control the random perturbations in the intonation
            (requires intonation_drift = 1).

        dictionary_1_file
        dictionary_2_file
        dictionary_3_file
            Select the dictionary files. The dictionaries are
            searched in the order 1, 2, 3.
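
The dictionary search order can be sketched in Python as follows. In-memory
mappings stand in for the dictionary files, and the sketch assumes that the
first dictionary containing the word wins. The function name is hypothetical,
and what happens when no dictionary contains the word is not covered here.

```python
from typing import Mapping, Optional, Sequence


def lookup_word(word: str,
                dictionaries: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Search the dictionaries in the given order (1, 2, 3).

    Returns the entry from the first dictionary that contains the word,
    or None if no dictionary has it.
    """
    for dictionary in dictionaries:
        entry = dictionary.get(word)
        if entry is not None:
            return entry
    return None
```

Under this assumption, an entry in dictionary 1 takes precedence over an entry
for the same word in dictionary 2 or 3.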

Note: the following parameters are currently not used:

- notional_pitch
- pretonic_range
- pretonic_lift
- tonic_range
- tonic_movement

### `voice_baby.txt`

### `voice_female.txt`

### `voice_large_child.txt`

### `voice_male.txt`

### `voice_small_child.txt`

Each of the `voice_*.txt` files listed above contains the parameters for one
voice.

Interesting parameters are:

        vocal_tract_length

        glottal_pulse_tp
            Rise time, in % of the period.
        glottal_pulse_tn_min
            Fall time, in % of the period - for the highest pulse
            amplitude.
        glottal_pulse_tn_max
            Fall time, in % of the period - for the lowest pulse
            amplitude.

            These parameters modify the glottal pulse shape.

        reference_glottal_pitch
            Modifies the voice pitch.

        breathiness
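
To give an idea of how the rise and fall times shape the glottal pulse, the
Python sketch below builds one period of a pulse from these percentages. It
assumes a Rosenberg-style polynomial pulse (smooth rise, parabolic fall, then a
closed phase) and a linear interpolation of the fall time between
`glottal_pulse_tn_max` (lowest amplitude) and `glottal_pulse_tn_min` (highest
amplitude). Both are assumptions made for this illustration, and the waveform
used by the tube model may differ. The function names and the numeric values
are hypothetical.

```python
from typing import List


def fall_time_percent(amplitude: float, tn_min: float, tn_max: float) -> float:
    """Fall time in % of the period for an amplitude in [0, 1].

    Assumes linear interpolation: tn_max at amplitude 0, tn_min at amplitude 1.
    """
    a = min(max(amplitude, 0.0), 1.0)
    return tn_max - a * (tn_max - tn_min)


def glottal_pulse(tp_percent: float, tn_percent: float,
                  table_size: int = 512) -> List[float]:
    """One period of an assumed polynomial glottal pulse.

    tp_percent: rise time, in % of the period.
    tn_percent: fall time, in % of the period.
    The remainder of the period is the closed phase (zeros).
    """
    if tp_percent <= 0 or tn_percent <= 0 or tp_percent + tn_percent > 100:
        raise ValueError("rise and fall times must be positive and sum to <= 100")
    rise_end = round(table_size * tp_percent / 100.0)
    fall_end = round(table_size * (tp_percent + tn_percent) / 100.0)
    table = [0.0] * table_size
    for i in range(rise_end):
        x = i / rise_end
        table[i] = 3.0 * x * x - 2.0 * x ** 3  # smooth rise from 0 towards 1
    fall_len = fall_end - rise_end
    for i in range(rise_end, fall_end):
        x = (i - rise_end) / fall_len
        table[i] = 1.0 - x * x  # parabolic fall from 1 towards 0
    return table


# Hypothetical values: 40 % rise, fall between 16 % and 32 % of the period.
tn = fall_time_percent(1.0, tn_min=16.0, tn_max=32.0)
pulse = glottal_pulse(40.0, tn, table_size=100)
```

In this sketch a shorter fall time gives a more abrupt closure, which makes the
pulse richer in high-frequency content.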

### `vowelTransitions.txt`

Controls vowel transitions.

### `vowelTransitions_2.txt`

Alternative version of `vowelTransitions.txt`.

It is currently not used.
