Rhasspy Raven Wakeword System

Wakeword detector based on the Snips Personal Wake Word Detector.

The underlying implementation of Raven heavily borrows from node-personal-wakeword by mathquis.

Dependencies

Installation

$ git clone https://github.com/rhasspy/rhasspy-wake-raven.git
$ cd rhasspy-wake-raven
$ ./configure
$ make
$ make install

Recording Templates

Record at least 3 WAV templates with your wake word:

$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
    bin/rhasspy-wake-raven --record 'my-wake-word-{n:02d}.wav' my-wake-word/

Follow the prompts and speak your wake word. When you've recorded at least 3 examples, hit CTRL+C to exit. Your WAV templates will have silence automatically trimmed, and will be saved in the directory my-wake-word/.

If you want to record WAV templates manually, trim the silence from the beginning and end, and export them as 16-bit, 16 kHz, mono WAV files.
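The format requirements for manually recorded templates can be checked before handing the files to Raven. The following is a minimal sketch using Python's standard `wave` module; the function name `check_template` is hypothetical and not part of Raven, and the check covers only the format (it does not detect untrimmed silence).

```python
import wave


def check_template(path):
    """Return a list of format problems; an empty list means the file looks usable.

    check_template is a hypothetical helper, not part of Raven.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"sample width is {8 * wav.getsampwidth()}-bit, expected 16-bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

Calling `check_template("my-wake-word/my-wake-word-00.wav")` on a correctly exported template should return an empty list.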

Running

After recording your WAV templates in a directory, run:

$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
    bin/rhasspy-wake-raven <WAV_DIR> ...

where <WAV_DIR> contains the WAV templates. You may also specify individual WAV files.

Add --debug to the command line to get more information about the underlying computation on each audio frame.
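The same pipeline can be driven from Python instead of a shell pipe. This is a rough sketch, assuming `arecord` is on the PATH and that the script runs from the repository root so that `bin/rhasspy-wake-raven` resolves; `build_commands` and `listen` are hypothetical helper names, not part of Raven.

```python
import subprocess


def build_commands(templates, debug=False):
    """Build the two argument lists that make up the shell pipeline."""
    record_cmd = ["arecord", "-r", "16000", "-f", "S16_LE", "-c", "1", "-t", "raw"]
    raven_cmd = ["bin/rhasspy-wake-raven", *templates]
    if debug:
        raven_cmd.append("--debug")
    return record_cmd, raven_cmd


def listen(templates, debug=False):
    """Yield each line Raven prints to stdout while audio is being recorded."""
    record_cmd, raven_cmd = build_commands(templates, debug)
    recorder = subprocess.Popen(record_cmd, stdout=subprocess.PIPE)
    raven = subprocess.Popen(
        raven_cmd, stdin=recorder.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so arecord notices if Raven exits first.
    recorder.stdout.close()
    try:
        for line in raven.stdout:
            yield line.rstrip("\n")
    finally:
        raven.terminate()
        recorder.terminate()
```

Iterating over `listen(["etc/okay-rhasspy/"])` would then produce one string per line of Raven's output until the loop is interrupted.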

Example

Using the example files for "okay rhasspy":

$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
    bin/rhasspy-wake-raven etc/okay-rhasspy/

This requires at least 1 of the 3 WAV templates to match before output like this is printed:

{"keyword": "etc/okay-rhasspy/okay-rhasspy-00.wav", "detect_seconds": 2.7488508224487305, "detect_timestamp": 1594996988.638912, "raven": {"probability": 0.45637207995699963, "distance": 0.25849045215799454, "probability_threshold": 0.5, "distance_threshold": 0.22, "tick": 1, "matches": 2}}

Use --minimum-matches to change how many templates must match for a detection to occur. Adjust the sensitivity with --probability-threshold, which takes two values: the lower and upper bounds of the probability range in which a detection occurs (the usage text below lists the default as (0.45, 0.55)).
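To give a feel for how the distance and probability values in the sample output relate, here is a small sketch. The logistic mapping below is an assumption and is not stated anywhere in this document; it is shown only because it reproduces the `probability` in the sample output from its `distance` and `distance_threshold`. The inclusive range check is likewise an assumption, based on the (0.45, 0.55) default listed in the usage text. Both function names are hypothetical.

```python
import math


def distance_to_probability(distance, distance_threshold=0.22):
    """Assumed logistic mapping: a distance equal to the threshold gives 0.5."""
    return 1.0 / (1.0 + math.exp((distance - distance_threshold) / distance_threshold))


def in_detection_range(probability, probability_range=(0.45, 0.55)):
    """Assumed rule: a template matches when the probability lies inside the range."""
    low, high = probability_range
    return low <= probability <= high


# Values taken from the sample output above.
sample_probability = distance_to_probability(0.25849045215799454)
sample_matches = in_detection_range(sample_probability)
```

Under these assumptions, a smaller distance produces a higher probability, so lowering --distance-threshold makes a given recording harder to match.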

Output

Raven outputs a line of JSON when the wake word is detected. Fields are:

  • keyword - path to WAV file template
  • detect_seconds - seconds after start of program when detection occurred
  • detect_timestamp - timestamp when detection occurred (using time.time())
  • raven
    • probability - detection probability
    • probability_threshold - range of probabilities for detection
    • distance - normalized dynamic time warping distance
    • distance_threshold - distance threshold used for comparison
    • matches - number of WAV templates that matched
    • tick - monotonic counter incremented for each detection
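A consumer of this output only needs a JSON parser. The sketch below reads one line in the format shown in the example and pulls out the documented fields; `parse_detection` is a hypothetical helper, and `SAMPLE_LINE` is copied from the example output above.

```python
import json

# Copied from the example output earlier in this document.
SAMPLE_LINE = (
    '{"keyword": "etc/okay-rhasspy/okay-rhasspy-00.wav", '
    '"detect_seconds": 2.7488508224487305, '
    '"detect_timestamp": 1594996988.638912, '
    '"raven": {"probability": 0.45637207995699963, '
    '"distance": 0.25849045215799454, "probability_threshold": 0.5, '
    '"distance_threshold": 0.22, "tick": 1, "matches": 2}}'
)


def parse_detection(line):
    """Flatten one line of Raven's JSON output into a simple dict."""
    event = json.loads(line)
    raven = event["raven"]
    return {
        "keyword": event["keyword"],
        "detect_seconds": event["detect_seconds"],
        "detect_timestamp": event["detect_timestamp"],
        "probability": raven["probability"],
        "distance": raven["distance"],
        "matches": raven["matches"],
        "tick": raven["tick"],
    }


detection = parse_detection(SAMPLE_LINE)
```

Here `detection["matches"]` is 2 and `detection["keyword"]` is the path of the template that matched.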

Testing

You can test how well Raven works on a set of sample WAV files:

$ PATH=$PWD/bin:$PATH test-raven.py --test-directory /path/to/samples/ /path/to/templates/

This will run up to 10 parallel instances of Raven (change with --test-workers) and output a JSON report with detection information and summary statistics like:

{
  "positive": [...],
  "negative": [...],
  "summary": {
    "true_positives": 14,
    "false_positives": 0,
    "true_negatives": 40,
    "false_negatives": 7,
    "precision": 1.0,
    "recall": 0.6666666666666666,
    "f1_score": 0.8
  }
}

Any additional command-line arguments are passed to Raven (e.g., --minimum-matches).
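The summary statistics in the report follow the standard definitions of precision, recall, and F1 score. As a sanity check, the sketch below recomputes them from the four counts in the example report; `summarize` is a hypothetical helper and is not claimed to be how test-raven.py computes its summary.

```python
def summarize(true_positives, false_positives, true_negatives, false_negatives):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    true_negatives is accepted for completeness but does not affect these metrics.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1_score}


# Counts from the example report: 14 TP, 0 FP, 40 TN, 7 FN.
summary = summarize(14, 0, 40, 7)
```

With the example counts, recall is 14 / (14 + 7), which is about 0.667, and the F1 score comes out at about 0.8, in line with the report.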

Command-Line Interface

usage: rhasspy-wake-raven [-h]
                          [--probability-threshold PROBABILITY_THRESHOLD PROBABILITY_THRESHOLD]
                          [--distance-threshold DISTANCE_THRESHOLD]
                          [--minimum-matches MINIMUM_MATCHES]
                          [--refractory-seconds REFRACTORY_SECONDS]
                          [--print-all-matches]
                          [--window-shift-seconds WINDOW_SHIFT_SECONDS]
                          [--dtw-window-size DTW_WINDOW_SIZE]
                          [--vad-sensitivity {1,2,3}]
                          [--current-threshold CURRENT_THRESHOLD]
                          [--max-energy MAX_ENERGY]
                          [--max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD]
                          [--silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}]
                          [--average-templates] [--debug]
                          templates [templates ...]

positional arguments:
  templates             Path to WAV file templates

optional arguments:
  -h, --help            show this help message and exit
  --probability-threshold PROBABILITY_THRESHOLD PROBABILITY_THRESHOLD
                        Probability range where detection occurs (default:
                        (0.45, 0.55))
  --distance-threshold DISTANCE_THRESHOLD
                        Normalized dynamic time warping distance threshold for
                        template matching (default: 0.22)
  --minimum-matches MINIMUM_MATCHES
                        Number of templates that must match to produce output
                        (default: 1)
  --refractory-seconds REFRACTORY_SECONDS
                        Seconds before wake word can be activated again
                        (default: 2)
  --print-all-matches   Print JSON for all matching templates instead of just
                        the first one
  --window-shift-seconds WINDOW_SHIFT_SECONDS
                        Seconds to shift sliding time window on audio buffer
                        (default: 0.05)
  --dtw-window-size DTW_WINDOW_SIZE
                        Size of band around slanted diagonal during dynamic
                        time warping calculation (default: 5)
  --vad-sensitivity {1,2,3}
                        Webrtcvad VAD sensitivity (1-3)
  --current-threshold CURRENT_THRESHOLD
                        Debiased energy threshold of current audio frame
  --max-energy MAX_ENERGY
                        Fixed maximum energy for ratio calculation (default:
                        observed)
  --max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD
                        Threshold of ratio between max energy and current
                        audio frame
  --silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}
                        Method for detecting silence
  --average-templates   Average wakeword templates together to reduce number
                        of calculations
  --debug               Print DEBUG messages to the console
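The --dtw-window-size option refers to a band around the slanted diagonal of the dynamic time warping (DTW) cost matrix. The sketch below illustrates that general idea with a generic banded DTW over plain number sequences. It is not Raven's implementation: the absolute-difference cost, the normalization by the combined sequence length, and the names `banded_dtw` and `normalized_dtw` are all assumptions made for illustration.

```python
import math


def banded_dtw(a, b, window=5):
    """DTW distance between two non-empty number sequences.

    Only cells within `window` columns of the slanted diagonal are evaluated.
    Returns math.inf if the band is too narrow to connect the two corners.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    slope = m / n  # the diagonal is slanted when the lengths differ
    for i in range(1, n + 1):
        center = i * slope
        lo = max(1, math.floor(center - window))
        hi = min(m, math.ceil(center + window))
        for j in range(lo, hi + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def normalized_dtw(a, b, window=5):
    """Assumed normalization: divide by the combined length of both sequences."""
    return banded_dtw(a, b, window) / (len(a) + len(b))
```

A smaller window evaluates fewer cells per row, which reduces computation but allows less stretching between the template and the incoming audio.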
