Rhasspy Raven Wakeword System
Wakeword detector based on the Snips Personal Wake Word Detector.
The underlying implementation of Raven heavily borrows from node-personal-wakeword by mathquis.
Dependencies
- Python 3.7
- python-speech-features for MFCC computation
- rhasspy-silence for silence detection
- Scientific libraries
  - sudo apt-get install liblapack3 libatlas-base-dev
Installation
$ git clone https://github.com/rhasspy/rhasspy-wake-raven.git
$ cd rhasspy-wake-raven
$ ./configure
$ make
$ make install
Recording Templates
Record at least 3 WAV templates with your wake word:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --record keyword-dir/
Follow the prompts and speak your wake word. When you've recorded at least 3 examples, hit CTRL+C to exit. Your WAV templates will have silence automatically trimmed, and will be saved in the directory keyword-dir/. Add a format string after the directory name to control the file names:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --record keyword-dir/ 'keyword-{n:02d}.wav'
The format string will receive the 0-based index n for each example.
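The format string looks like Python str.format syntax, so the resulting file names can be previewed before recording. The sketch below assumes that syntax; the helper name template_names is hypothetical and not part of Raven.

```python
def template_names(name_format: str, count: int) -> list:
    """Preview the file names produced for the first `count` recordings."""
    # n is the 0-based example index that the format string receives
    return [name_format.format(n=n) for n in range(count)]


print(template_names("keyword-{n:02d}.wav", 3))
```

With {n:02d} the index is zero-padded to two digits, which keeps the templates sorted in recording order.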
If you want to record WAV templates manually, trim silence off the front and back and make sure to export them as 16-bit, 16 kHz mono WAV files.
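Manually recorded templates can be checked against the required format with Python's standard wave module. This is a minimal sketch; the function name check_template is hypothetical, and it only inspects the WAV header (it does not detect leading or trailing silence).

```python
import wave


def check_template(path: str) -> list:
    """Return a list of format problems for a WAV template (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()  # bytes per sample
        channels = wav_file.getnchannels()

    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000 Hz")
    if width != 2:
        problems.append(f"sample width is {8 * width} bits, expected 16 bits")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

Calling check_template on each file in the keyword directory before starting Raven gives an early warning about files exported with the wrong settings.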
Running
After recording your WAV templates in a directory, run:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --keyword <WAV_DIR> ...
where <WAV_DIR> contains the WAV templates. You may add as many keywords as you'd like, though this will use additional CPU. It's recommended you use --average-templates to keep CPU usage down.
Some settings can be specified per-keyword:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven \
--keyword keyword1/ name=my-keyword1 probability-threshold=0.45 minimum-matches=2 \
--keyword keyword2/ name=my-keyword2 probability-threshold=0.55 average-templates=true
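Any per-keyword setting that is not given (probability-threshold, minimum-matches, etc.) falls back on the value supplied to the corresponding global option (--probability-threshold, --minimum-matches, etc.).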
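The fallback behaviour can be pictured as a dictionary of global values that per-keyword name=value pairs override. The sketch below is illustrative only: parse_keyword_args is a hypothetical helper, not Raven's own parser, and it keeps all values as strings.

```python
def parse_keyword_args(args, defaults):
    """Split ['keyword1/', 'name=value', ...] into a directory and its settings."""
    directory, *pairs = args
    settings = dict(defaults)  # start from the global values
    for pair in pairs:
        key, _, value = pair.partition("=")
        settings[key] = value  # a per-keyword value overrides the global one
    return directory, settings


# Global values as they might be given on the command line (assumed here)
global_settings = {"probability-threshold": "0.5", "minimum-matches": "1"}

print(parse_keyword_args(
    ["keyword1/", "name=my-keyword1", "probability-threshold=0.45", "minimum-matches=2"],
    global_settings,
))
print(parse_keyword_args(["keyword2/", "name=my-keyword2"], global_settings))
```

The second keyword sets only its name, so it keeps both global values.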
Add --debug to the command line to get more information about the underlying computation on each audio frame.
Example
Using the example files for "okay rhasspy":
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --keyword etc/okay-rhasspy/
This requires at least 1 of the 3 WAV templates to match before output like this is printed:
{"keyword": "okay-rhasspy", "template": "etc/okay-rhasspy/okay-rhasspy-00.wav", "detect_seconds": 2.7488508224487305, "detect_timestamp": 1594996988.638912, "raven": {"probability": 0.45637207995699963, "distance": 0.25849045215799454, "probability_threshold": 0.5, "distance_threshold": 0.22, "tick": 1, "matches": 2, "match_seconds": 0.005367016012314707}}
Use --minimum-matches to change how many templates must match for a detection to occur or --average-templates to combine all WAV templates into a single template (reduces CPU usage). Adjust the sensitivity with --probability-threshold which sets the lower bound of the detection probability (default is 0.5).
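The probability and distance values in the example output are consistent with a logistic mapping centred on the distance threshold. That mapping is inferred from the example numbers and is an assumption, not something this README states. The function name distance_to_probability is hypothetical.

```python
import math


def distance_to_probability(distance: float, distance_threshold: float) -> float:
    """Map a normalized DTW distance to a probability (assumed logistic form)."""
    # A distance equal to the threshold gives 0.5; smaller distances give more
    return 1.0 / (1.0 + math.exp((distance - distance_threshold) / distance_threshold))


# Values taken from the example output above; the result is roughly 0.456
print(distance_to_probability(0.25849045215799454, 0.22))
```

Under this assumption, lowering --distance-threshold makes every template comparison stricter, because the same distance then maps to a lower probability.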
Output
Raven outputs a line of JSON when the wake word is detected. Fields are:
- keyword - name of keyword or directory
- template - path to WAV file template
- detect_seconds - seconds after start of program when detection occurred
- detect_timestamp - timestamp when detection occurred (using time.time())
- raven
  - probability - detection probability
  - probability_threshold - range of probabilities for detection
  - distance - normalized dynamic time warping distance
  - distance_threshold - distance threshold used for comparison
  - matches - number of WAV templates that matched
  - match_seconds - seconds taken for dynamic time warping calculations
  - tick - monotonic counter incremented for each detection
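Because each detection is a single line of JSON, a consuming program can decode it with the standard json module. The sketch below uses the example line from this README; summarize_detection is a hypothetical helper name.

```python
import json


def summarize_detection(line: str) -> str:
    """Turn one line of Raven's JSON output into a short human-readable summary."""
    event = json.loads(line)
    raven = event["raven"]  # detector details are nested under "raven"
    return (
        f'{event["keyword"]} detected after {event["detect_seconds"]:.2f}s '
        f'({raven["matches"]} matching template(s), '
        f'probability {raven["probability"]:.2f})'
    )


example = (
    '{"keyword": "okay-rhasspy", "template": "etc/okay-rhasspy/okay-rhasspy-00.wav", '
    '"detect_seconds": 2.7488508224487305, "detect_timestamp": 1594996988.638912, '
    '"raven": {"probability": 0.45637207995699963, "distance": 0.25849045215799454, '
    '"probability_threshold": 0.5, "distance_threshold": 0.22, "tick": 1, '
    '"matches": 2, "match_seconds": 0.005367016012314707}}'
)
print(summarize_detection(example))
# prints: okay-rhasspy detected after 2.75s (2 matching template(s), probability 0.46)
```

In a live setup the same function could be applied to each line read from Raven's standard output.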
Testing
You can test how well Raven works on a set of sample WAV files:
$ PATH=$PWD/bin:$PATH test-raven.py --test-directory /path/to/samples/ --keyword /path/to/templates/
This will run up to 10 parallel instances of Raven (change with --test-workers) and output a JSON report with detection information and summary statistics like:
{
  "positive": [...],
  "negative": [...],
  "summary": {
    "true_positives": 14,
    "false_positives": 0,
    "true_negatives": 40,
    "false_negatives": 7,
    "precision": 1.0,
    "recall": 0.6666666666666666,
    "f1_score": 0.8
  }
}
Any additional command-line arguments are passed to Raven (e.g., --minimum-matches).
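The precision, recall, and F1 values in the sample report match the standard definitions computed from the true/false positive and negative counts. The sketch below reproduces them; whether test-raven.py computes them in exactly this way is an assumption, and summary_statistics is a hypothetical name.

```python
def summary_statistics(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Compute precision, recall, and F1 score from detection counts."""
    predicted = true_positives + false_positives  # everything Raven detected
    actual = true_positives + false_negatives  # everything it should have detected
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1_score}


# Counts from the sample report above
print(summary_statistics(true_positives=14, false_positives=0, false_negatives=7))
```

With these counts, precision is 14/14 and recall is 14/21, so missed detections (false negatives) are the only thing lowering the F1 score.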
Command-Line Interface
usage: rhasspy-wake-raven [-h] [--keyword KEYWORD [KEYWORD ...]]
                          [--chunk-size CHUNK_SIZE]
                          [--record RECORD [RECORD ...]]
                          [--probability-threshold PROBABILITY_THRESHOLD]
                          [--distance-threshold DISTANCE_THRESHOLD]
                          [--minimum-matches MINIMUM_MATCHES]
                          [--refractory-seconds REFRACTORY_SECONDS]
                          [--print-all-matches]
                          [--window-shift-seconds WINDOW_SHIFT_SECONDS]
                          [--dtw-window-size DTW_WINDOW_SIZE]
                          [--vad-sensitivity {1,2,3}]
                          [--current-threshold CURRENT_THRESHOLD]
                          [--max-energy MAX_ENERGY]
                          [--max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD]
                          [--silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}]
                          [--average-templates] [--exit-count EXIT_COUNT]
                          [--read-entire-input]
                          [--max-chunks-in-queue MAX_CHUNKS_IN_QUEUE]
                          [--skip-probability-threshold SKIP_PROBABILITY_THRESHOLD]
                          [--failed-matches-to-refractory FAILED_MATCHES_TO_REFRACTORY]
                          [--debug]

optional arguments:
  -h, --help            show this help message and exit
  --keyword KEYWORD [KEYWORD ...]
                        Directory with WAV templates and settings (setting-
                        name=value)
  --chunk-size CHUNK_SIZE
                        Number of bytes to read at a time from standard in
                        (default: 1920)
  --record RECORD [RECORD ...]
                        Record example templates to a directory, optionally
                        with given name format (e.g., 'my-
                        keyword-{n:02d}.wav')
  --probability-threshold PROBABILITY_THRESHOLD
                        Probability above which detection occurs (default:
                        0.5)
  --distance-threshold DISTANCE_THRESHOLD
                        Normalized dynamic time warping distance threshold for
                        template matching (default: 0.22)
  --minimum-matches MINIMUM_MATCHES
                        Number of templates that must match to produce output
                        (default: 1)
  --refractory-seconds REFRACTORY_SECONDS
                        Seconds before wake word can be activated again
                        (default: 2)
  --print-all-matches   Print JSON for all matching templates instead of just
                        the first one
  --window-shift-seconds WINDOW_SHIFT_SECONDS
                        Seconds to shift sliding time window on audio buffer
                        (default: 0.02)
  --dtw-window-size DTW_WINDOW_SIZE
                        Size of band around slanted diagonal during dynamic
                        time warping calculation (default: 5)
  --vad-sensitivity {1,2,3}
                        Webrtcvad VAD sensitivity (1-3)
  --current-threshold CURRENT_THRESHOLD
                        Debiased energy threshold of current audio frame
  --max-energy MAX_ENERGY
                        Fixed maximum energy for ratio calculation (default:
                        observed)
  --max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD
                        Threshold of ratio between max energy and current
                        audio frame
  --silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}
                        Method for detecting silence
  --average-templates   Average wakeword templates together to reduce number
                        of calculations
  --exit-count EXIT_COUNT
                        Exit after some number of detections (default: never)
  --read-entire-input   Read entire audio input at start and exit after
                        processing
  --max-chunks-in-queue MAX_CHUNKS_IN_QUEUE
                        Maximum number of audio chunks waiting for processing
                        before being dropped
  --skip-probability-threshold SKIP_PROBABILITY_THRESHOLD
                        Skip additional template calculations if probability
                        is below this threshold
  --failed-matches-to-refractory FAILED_MATCHES_TO_REFRACTORY
                        Number of failed template matches before entering
                        refractory period (default: disabled)
  --debug               Print DEBUG messages to the console
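The byte and second defaults above are easier to relate once converted for the audio format used throughout this README (16 kHz, 16-bit, mono). The sketch below does only that arithmetic; the constant and function names are hypothetical.

```python
SAMPLE_RATE = 16000  # Hz, from `arecord -r 16000`
SAMPLE_WIDTH = 2  # bytes per sample, from `-f S16_LE`
CHANNELS = 1  # from `-c 1`


def chunk_seconds(chunk_bytes: int) -> float:
    """Duration of audio contained in one chunk of raw bytes."""
    return chunk_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def seconds_to_samples(seconds: float) -> int:
    """Number of samples covered by a time span."""
    return int(round(seconds * SAMPLE_RATE))


print(chunk_seconds(1920))  # default --chunk-size
print(seconds_to_samples(0.02))  # default --window-shift-seconds
```

At these settings the default chunk holds 960 samples (60 ms of audio), and the default window shift corresponds to 320 samples.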