Rhasspy ASR Kaldi
Automated speech recognition in Rhasspy voice assistant with Kaldi.
Requirements
- Python 3.7
- Kaldi
  - Expects $KALDI_DIR in environment
- Opengrm
  - Expects ngram* in $PATH
- Phonetisaurus
  - Expects phonetisaurus-apply in $PATH
See pre-built apps for pre-compiled binaries.
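The requirements above can be checked before running anything else. The following is a minimal sketch, not part of rhasspy-asr-kaldi: the function name missing_requirements is hypothetical, and ngramcount is assumed here as one representative of the Opengrm ngram* programs.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_requirements(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a description of each requirement that could not be found."""
    missing = []
    # Kaldi is located through the KALDI_DIR environment variable.
    if not env.get("KALDI_DIR"):
        missing.append("KALDI_DIR environment variable")
    # ngramcount stands in for the Opengrm ngram* tools (assumed name).
    for program in ("ngramcount", "phonetisaurus-apply"):
        if which(program) is None:
            missing.append(f"{program} in PATH")
    return missing


if __name__ == "__main__":
    for item in missing_requirements():
        print("Missing:", item)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.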
Installation
$ git clone https://github.com/rhasspy/rhasspy-asr-kaldi
$ cd rhasspy-asr-kaldi
$ ./configure
$ make
$ make install
Transcribing
Use python3 -m rhasspyasr_kaldi transcribe <ARGS>:
usage: rhasspy-asr-kaldi transcribe [-h] --model-dir MODEL_DIR
                                    [--graph-dir GRAPH_DIR]
                                    [--model-type MODEL_TYPE]
                                    [--frames-in-chunk FRAMES_IN_CHUNK]
                                    [wav_file [wav_file ...]]

positional arguments:
  wav_file              WAV file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --model-type MODEL_TYPE
                        Either nnet3 or gmm (default: nnet3)
  --frames-in-chunk FRAMES_IN_CHUNK
                        Number of frames to process at a time
For nnet3 models, the online2-tcp-nnet3-decode-faster program is used to handle streaming audio. For gmm models, audio is buffered and packaged as a WAV file before being transcribed.
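To call the transcribe command from another Python program, the argument list can be assembled first and then passed to subprocess. This is a sketch under assumptions: transcribe_command is a hypothetical helper, it uses only the flags shown in the usage text above, and it says nothing about the format of the command's output.

```python
from typing import List, Optional, Sequence


def transcribe_command(
    model_dir: str,
    wav_files: Sequence[str],
    graph_dir: Optional[str] = None,
    model_type: str = "nnet3",
) -> List[str]:
    """Build the argument list for the transcribe command."""
    if model_type not in ("nnet3", "gmm"):
        raise ValueError("model_type must be 'nnet3' or 'gmm'")
    command = [
        "python3", "-m", "rhasspyasr_kaldi", "transcribe",
        "--model-dir", model_dir,
        "--model-type", model_type,
    ]
    # --graph-dir is optional in the usage text, so add it only when given.
    if graph_dir is not None:
        command += ["--graph-dir", graph_dir]
    # WAV files are positional arguments and go last.
    command += list(wav_files)
    return command
```

The returned list can be run with subprocess.run(command, check=True); passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.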
Training
Use python3 -m rhasspyasr_kaldi train <ARGS>:
usage: rhasspy-asr-kaldi train [-h] --model-dir MODEL_DIR
                               [--graph-dir GRAPH_DIR]
                               [--intent-graph INTENT_GRAPH]
                               [--dictionary DICTIONARY]
                               [--dictionary-casing {upper,lower,ignore}]
                               [--language-model LANGUAGE_MODEL]
                               --base-dictionary BASE_DICTIONARY
                               [--g2p-model G2P_MODEL]
                               [--g2p-casing {upper,lower,ignore}]

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --intent-graph INTENT_GRAPH
                        Path to intent graph JSON file (default: stdin)
  --dictionary DICTIONARY
                        Path to write custom pronunciation dictionary
  --dictionary-casing {upper,lower,ignore}
                        Case transformation for dictionary words (training,
                        default: ignore)
  --language-model LANGUAGE_MODEL
                        Path to write custom language model
  --base-dictionary BASE_DICTIONARY
                        Paths to pronunciation dictionaries
  --g2p-model G2P_MODEL
                        Path to Phonetisaurus grapheme-to-phoneme FST model
  --g2p-casing {upper,lower,ignore}
                        Case transformation for g2p words (training, default:
                        ignore)
This will generate a custom HCLG.fst from an intent graph created using rhasspy-nlu. Your Kaldi model directory should be laid out like this:
- my_model/ (--model-dir)
  - conf/
    - mfcc_hires.conf
  - data/
    - local/
      - dict/
        - lexicon.txt (copied from --dictionary)
      - lang/
        - lm.arpa.gz (copied from --language-model)
  - graph/ (--graph-dir)
    - HCLG.fst (generated)
  - model/
    - final.mdl
  - phones/
    - extra_questions.txt
    - nonsilence_phones.txt
    - optional_silence.txt
    - silence_phones.txt
  - online/ (nnet3 only)
  - extractor/ (nnet3 only)
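A model directory can be compared against this layout before training. The sketch below is illustrative only: missing_model_files is a hypothetical helper, and it assumes that every file in the layout not marked as copied or generated must already exist.

```python
from pathlib import Path
from typing import List, Union

# Files from the layout that are neither copied nor generated during training.
REQUIRED_FILES = (
    "conf/mfcc_hires.conf",
    "model/final.mdl",
    "phones/extra_questions.txt",
    "phones/nonsilence_phones.txt",
    "phones/optional_silence.txt",
    "phones/silence_phones.txt",
)

# Directories the layout marks as "nnet3 only".
NNET3_DIRS = ("online", "extractor")


def missing_model_files(
    model_dir: Union[str, Path], model_type: str = "nnet3"
) -> List[str]:
    """Return layout entries that are absent under model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if model_type == "nnet3":
        missing += [name + "/" for name in NNET3_DIRS if not (root / name).is_dir()]
    return missing
```

An empty result means every expected entry was found; it does not validate the contents of the files.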
When using the train command, you will need to specify the following arguments:
- --intent-graph - path to graph json file generated using rhasspy-nlu
- --model-type - either nnet3 or gmm
- --model-dir - path to top-level model directory (my_model in example above)
- --graph-dir - path to directory where HCLG.fst should be written (my_model/graph in example above)
- --base-dictionary - pronunciation dictionary with all words from intent graph (can be used multiple times)
- --dictionary - path to write custom pronunciation dictionary (optional)
- --language-model - path to write custom ARPA language model (optional)
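The arguments above can be assembled programmatically, which makes it easier to repeat --base-dictionary for several dictionaries. This is a sketch with a hypothetical helper name, train_command; it leaves out --model-type because the train usage text does not list that flag.

```python
from typing import List, Optional, Sequence


def train_command(
    model_dir: str,
    graph_dir: str,
    base_dictionaries: Sequence[str],
    intent_graph: Optional[str] = None,
    dictionary: Optional[str] = None,
    language_model: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the train command."""
    if not base_dictionaries:
        raise ValueError("at least one base dictionary is required")
    command = [
        "python3", "-m", "rhasspyasr_kaldi", "train",
        "--model-dir", model_dir,
        "--graph-dir", graph_dir,
    ]
    # --base-dictionary may be given more than once.
    for path in base_dictionaries:
        command += ["--base-dictionary", path]
    # Optional arguments; the intent graph is read from stdin when omitted.
    optional = (
        ("--intent-graph", intent_graph),
        ("--dictionary", dictionary),
        ("--language-model", language_model),
    )
    for flag, value in optional:
        if value is not None:
            command += [flag, value]
    return command
```

When intent_graph is left as None, the JSON graph has to be supplied on standard input, for example through the input argument of subprocess.run.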
Building From Source
rhasspy-asr-kaldi depends on the following programs that must be compiled:
- Kaldi
  - Speech to text engine
- Opengrm
  - Create ARPA language models
- Phonetisaurus
  - Guesses pronunciations for unknown words
Kaldi
Make sure you have the necessary dependencies installed:
sudo apt-get install \
build-essential \
libatlas-base-dev libatlas3-base gfortran \
automake autoconf unzip sox libtool subversion \
python3 python \
git zlib1g-dev
Download Kaldi and extract it:
wget -O kaldi-master.tar.gz \
'https://github.com/kaldi-asr/kaldi/archive/master.tar.gz'
tar -xvf kaldi-master.tar.gz
First, build Kaldi's tools:
cd kaldi-master/tools
make
Use make -j 4 if you have multiple CPU cores. This will take a long time.
Next, build Kaldi itself:
cd kaldi-master/src
./configure --shared --mathlib=ATLAS
make depend
make
Use make depend -j 4 and make -j 4 if you have multiple CPU cores. This will take a long time.
There is no installation step. The kaldi-master directory contains all the libraries and programs that Rhasspy will need to access.
See docker-kaldi for a Docker build script.
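Because Kaldi has no installation step, the build directory has to be made known through $KALDI_DIR. The sketch below assumes that $KALDI_DIR should point at the kaldi-master directory itself, which the text above does not state explicitly; kaldi_environment is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Union


def kaldi_environment(
    kaldi_root: Union[str, Path],
    base_env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return a copy of base_env with KALDI_DIR set to kaldi_root."""
    env = dict(base_env)
    # Assumption: KALDI_DIR is the top-level Kaldi build directory.
    env["KALDI_DIR"] = str(Path(kaldi_root))
    return env
```

The result can be passed as the env argument of subprocess.run when launching the transcribe or train commands, leaving the parent process environment unchanged.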
Phonetisaurus
Make sure you have the necessary dependencies installed:
sudo apt-get install build-essential
First, download and build OpenFST 1.6.2:
wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.2.tar.gz
tar -xvf openfst-1.6.2.tar.gz
cd openfst-1.6.2
./configure \
"--prefix=$(pwd)/build" \
--enable-static --enable-shared \
--enable-far --enable-ngram-fsts
make
make install
Use make -j 4 if you have multiple CPU cores. This will take a long time.
Next, download and extract Phonetisaurus:
wget -O phonetisaurus-master.tar.gz \
'https://github.com/AdolfVonKleist/Phonetisaurus/archive/master.tar.gz'
tar -xvf phonetisaurus-master.tar.gz
Finally, build Phonetisaurus (where /path/to/openfst is the openfst-1.6.2 directory from above):
cd Phonetisaurus-master
./configure \
--with-openfst-includes=/path/to/openfst/build/include \
--with-openfst-libs=/path/to/openfst/build/lib
make
make install
Use make -j 4 if you have multiple CPU cores. This will take a long time.
You should now be able to run the phonetisaurus-align program.
See docker-phonetisaurus for a Docker build script.