End-to-end text to speech using IPA and onnx models

Larynx

End-to-end text to speech system using gruut and onnx.

Larynx's goals are:

  • "Good enough" synthesis to avoid using a cloud service
  • Faster than realtime performance on a Raspberry Pi 4
  • Broad language support
  • Voices trained purely from public datasets

Samples

Listen to voice samples from all of the pre-trained models.

Installation

$ pip install larynx

For Raspberry Pi (ARM), you will first need to manually install phonetisaurus.

Language Download

Larynx uses gruut to transform text into phonemes. You must install the appropriate gruut language before using Larynx. U.S. English is included with gruut; for any other language, download it first:

$ python3 -m gruut <LANGUAGE> download

Voice/Vocoder Download

Voices and vocoders are available to download from the release page. They can be extracted anywhere; you only need to pass the extracted directory on the command line (e.g., --glow-tts /path/to/voice).

Example

The command below synthesizes multiple sentences and saves them to a directory. The --csv command-line flag indicates that each input line has the form id|text, where id becomes the name of the WAV file.

$ cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
  larynx \
    --debug \
    --csv \
    --glow-tts local/en-us/harvard-glow_tts \
    --hifi-gan local/hifi_gan/universal_large \
    --output-dir wavs \
    --language en-us \
    --denoiser-strength 0.001
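
If the sentences already live in a plain text file, one per line, the id|text lines expected by --csv can be generated instead of typed by hand. The sketch below shows one way to do that. The make_csv function name, the sNN id scheme, and the sentences.txt file name are illustrative assumptions and are not part of Larynx.

```shell
# Hypothetical helper (not part of Larynx): number every non-empty line of a
# text file and print it as sNN|text, the shape that --csv expects.
make_csv() {
  awk 'NF { printf "s%02d|%s\n", ++n, $0 }' "$1"
}

# Assumed usage, reusing the flags from the example above:
#   make_csv sentences.txt | larynx --csv --glow-tts ... --hifi-gan ... --output-dir wavs
```

The helper skips blank lines and does not escape any | characters that appear inside a sentence, so check such lines by hand.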

You can use the --interactive flag instead of --output-dir to type sentences and have the audio played immediately using sox.

GlowTTS Settings

The GlowTTS voices support two additional parameters:

  • --noise-scale - determines the speaker volatility during synthesis (0-1, default is 0.333)
  • --length-scale - scales phoneme durations, making the voice speak slower (> 1) or faster (< 1)

Vocoder Settings

  • --denoiser-strength - runs the denoiser if > 0; a small value like 0.005 is recommended.
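
The GlowTTS and vocoder settings are easiest to judge by ear, so it can help to render the same input at several values. The sketch below is a dry run that only prints the commands. The length_scale_sweep function name, the scale values, and the wavs_<scale> directory names are assumptions for illustration; the model paths and flags are the ones already shown in the example above.

```shell
# Dry run: print one larynx command per --length-scale value. Nothing is
# synthesized until the printed lines are run with id|text input on stdin.
# Scale values and output directory names are illustrative assumptions.
length_scale_sweep() {
  for scale in "$@"; do
    echo "larynx --csv" \
      "--glow-tts local/en-us/harvard-glow_tts" \
      "--hifi-gan local/hifi_gan/universal_large" \
      "--language en-us" \
      "--noise-scale 0.333" \
      "--length-scale $scale" \
      "--denoiser-strength 0.005" \
      "--output-dir wavs_$scale"
  done
}

length_scale_sweep 0.8 1.0 1.2
```

Writing each run to its own directory keeps the WAV files from overwriting one another, since the file names come from the ids in the input.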

Text to Speech Models

  • GlowTTS (35 voices)
    • English (en-us, 20 voices)
      • blizzard_fls (F, accent)
      • cmu_aew (M)
      • cmu_ahw (M)
      • cmu_aup (M, accent)
      • cmu_bdl (M)
      • cmu_clb (F)
      • cmu_eey (F)
      • cmu_fem (M)
      • cmu_jmk (M)
      • cmu_ksp (M, accent)
      • cmu_ljm (F)
      • cmu_lnh (F)
      • cmu_rms (M)
      • cmu_rxr (M)
      • cmu_slp (F, accent)
      • cmu_slt (F)
      • ek (F, accent)
      • harvard (F, accent)
      • kathleen (F)
      • ljspeech (F)
    • German (de-de, 1 voice)
      • thorsten (M)
    • French (fr-fr, 3 voices)
      • gilles_le_blanc (M)
      • siwis (F)
      • tom (M)
    • Spanish (es-es, 2 voices)
      • carlfm (M)
      • karen_savage (F)
    • Dutch (nl, 3 voices)
      • bart_de_leeuw (M)
      • flemishguy (M)
      • rdh (M)
    • Italian (it-it, 2 voices)
      • lisa (F)
      • riccardo_fasol (M)
    • Swedish (sv-se, 1 voice)
      • talesyntese (M)
    • Russian (ru-ru, 3 voices)
      • hajdurova (F)
      • nikolaev (M)
      • minaev (M)
  • Tacotron2
    • Coming soon

Vocoders

  • Hi-Fi GAN
    • Universal large
    • VCTK medium
    • VCTK small
  • WaveGlow
    • 256 channel trained on LJ Speech
