
MOS (Mean Opinion Score) models for evaluating audio quality.


AECMOS, DNSMOS, PLCMOS

  • We release the AECMOS, DNSMOS, and PLCMOS models that we have developed for evaluating audio degradations due to echo, noise, packet loss and other sources.

Prerequisites

  • Python 3.7 or above
  • librosa 0.9.1
  • numpy 1.21.5
  • onnxruntime 1.10.0
  • pandas
  • tqdm

Usage:

from speechmos import aecmos, dnsmos, plcmos

aecmos.run(sample, sr, talk_type, **kwargs)

dnsmos.run(sample, sr, **kwargs)

plcmos.run(sample, sr, **kwargs)
  • sample is one of the following:

    • For AECMOS: a dictionary of the form {'lpb': lpb, 'mic': mic, 'enh': enh} holding the loopback, microphone, and enhanced audio, each given either as an np.ndarray or as a path to an audio file of a type supported by librosa.
    • For DNSMOS and PLCMOS: an np.ndarray or a path to an audio file of a type supported by librosa.

    All audio should be single channel (mono) audio.
    Alternatively, sample can be a list of items of one of the above types.

  • sr denotes the sampling rate, which must be either 16000 or 48000. AECMOS is available at 48 kHz, plus a 16 kHz scenarioless model that is used when talk_type is None; all other models are available at 16 kHz. All audio must be provided at the sampling rate of the model being used.
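Because the models expect mono audio at a specific sampling rate, it can help to check inputs before scoring them. The following is a minimal sketch, not part of speechmos: `check_audio` and `SUPPORTED_RATES` are hypothetical names introduced here, and only NumPy is assumed.

```python
import numpy as np

# The two sampling rates named above. This tuple is an illustrative
# constant, not something exported by speechmos.
SUPPORTED_RATES = (16000, 48000)


def check_audio(audio, sr):
    """Return `audio` as a 1-D float32 array, or raise if it is unusable.

    Hypothetical helper: it only enforces the constraints stated above
    (mono audio, sr of 16000 or 48000).
    """
    if sr not in SUPPORTED_RATES:
        raise ValueError(f"sr must be one of {SUPPORTED_RATES}, got {sr}")
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError(f"expected mono (1-D) audio, got shape {audio.shape}")
    if audio.size == 0:
        raise ValueError("audio is empty")
    return audio
```

When loading from disk, `librosa.load(path, sr=16000, mono=True)` resamples and downmixes in one step, so a check like this mostly matters for arrays produced elsewhere.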

For AECMOS:

  • talk_type specifies the scenario, if known: 'st' (far-end single talk), 'nst' (near-end single talk), or 'dt' (double talk). talk_type can be None, in which case the 16 kHz scenarioless model is used; its correlation with the ground truth is about 2% lower than that of the scenario-based model.
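If talk_type comes from user input or a metadata file, it may be worth normalizing it before the call. This is a small sketch under that assumption; `normalize_talk_type` and `TALK_TYPES` are hypothetical names and not part of speechmos.

```python
# Scenario codes listed above for AECMOS.
TALK_TYPES = {
    "st": "far-end single talk",
    "nst": "near-end single talk",
    "dt": "double talk",
}


def normalize_talk_type(talk_type):
    """Map input to 'st', 'nst', 'dt', or None (scenario unknown).

    Hypothetical helper, not part of speechmos.
    """
    if talk_type is None:
        return None
    code = str(talk_type).strip().lower()
    if code not in TALK_TYPES:
        raise ValueError(
            f"talk_type must be None or one of {sorted(TALK_TYPES)}, got {talk_type!r}"
        )
    return code
```

The returned value can then be passed as `talk_type=` to `aecmos.run`.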

For DNSMOS:

  • model_type controls which DNSMOS model to use: 'dnsmos' or 'dnsmos_personalized'. The default is 'dnsmos'.
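As an illustration of selecting the model, the sketch below builds the keyword arguments for `dnsmos.run`. The helper names `dnsmos_kwargs` and `score_file` are hypothetical; only the `model_type` argument and its two values come from the description above.

```python
# The two model_type values documented above.
DNSMOS_MODEL_TYPES = ("dnsmos", "dnsmos_personalized")


def dnsmos_kwargs(personalized=False):
    """Build keyword arguments for dnsmos.run (hypothetical helper)."""
    model_type = DNSMOS_MODEL_TYPES[1] if personalized else DNSMOS_MODEL_TYPES[0]
    return {"model_type": model_type}


def score_file(path, personalized=False):
    """Score one 16 kHz mono file; requires speechmos to be installed."""
    # Imported lazily so dnsmos_kwargs can be used without speechmos present.
    from speechmos import dnsmos

    return dnsmos.run(path, sr=16000, **dnsmos_kwargs(personalized))
```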

Additional arguments:

  • return_df controls whether a pandas dataframe is returned containing sample information and MOS scores when evaluating a list of samples. The default is return_df = True. If set to False, a list of dictionaries is returned instead.
  • verbose controls whether more details are printed on the screen. The default is verbose = False.
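With return_df = False the result for a list of samples is a list of dictionaries, which can be summarized without pandas. A minimal sketch, assuming dictionaries with the `echo_mos` and `deg_mos` keys that AECMOS reports; `summarize` is a hypothetical helper and the input values are made up for illustration.

```python
from statistics import mean


def summarize(results, keys=("echo_mos", "deg_mos")):
    """Average the selected MOS fields over a list of result dictionaries."""
    if not results:
        raise ValueError("results is empty")
    return {key: mean([r[key] for r in results]) for key in keys}


# Illustrative input shaped like AECMOS result dictionaries (made-up values).
results = [
    {"echo_mos": 3.0, "deg_mos": 4.0},
    {"echo_mos": 4.0, "deg_mos": 3.0},
]
print(summarize(results))  # {'echo_mos': 3.5, 'deg_mos': 3.5}
```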

Usage examples:

AECMOS usage example with sample as a dictionary of numpy arrays and unknown talk_type.

import librosa
from speechmos import aecmos

lpb, _ = librosa.load("d:/data/example/lpb.wav", sr=16000)
mic, _ = librosa.load("d:/data/example/mic.wav", sr=16000)
enh, _ = librosa.load("d:/data/example/enh.wav", sr=16000)

sample = {'lpb': lpb, 'mic': mic, 'enh': enh}

aecmos.run(sample, sr=16000, verbose=True)

Output:

Model version aecmos_scenarioless_16kHz.
The model sampling rate is 16000.
{'echo_mos': 4.9999470710754395, 'deg_mos': 3.4854962825775146, 'talk_type': None, 'model_name': 'aecmos_scenarioless_16kHz'}

AECMOS usage example with sample as a list of dictionaries of paths to audio files.

from speechmos import aecmos
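# sample_list is a list of {'lpb': ..., 'mic': ..., 'enh': ...} dictionaries of audio file paths.
aecmos.run(sample_list, sr=48000, talk_type='dt', verbose=True)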

Output:

Using model aecmos_48kHz to evaluate 3 samples.
Model sampling rate is 48000.
0it [00:00, ?it/s]
1it [00:00,  8.59it/s]
3it [00:00, 25.77it/s]
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
       echo_mos   deg_mos
count  3.000000  3.000000
mean   3.240038  3.408777
std    0.000000  0.000000
min    3.240038  3.408777
25%    3.240038  3.408777
50%    3.240038  3.408777
75%    3.240038  3.408777
max    3.240038  3.408777
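The example above uses sample_list without defining it. One way such a list might be assembled is sketched below, assuming (hypothetically) that each recording lives in its own folder containing lpb.wav, mic.wav, and enh.wav, matching the file names in the output above. `build_sample_list` is an illustrative helper, not part of speechmos.

```python
from pathlib import PurePosixPath


def build_sample_list(folders):
    """Return one {'lpb', 'mic', 'enh'} dictionary of path strings per folder.

    Assumes a hypothetical layout in which each folder holds lpb.wav,
    mic.wav, and enh.wav; adjust the file names to your data.
    """
    samples = []
    for folder in folders:
        base = PurePosixPath(folder)
        samples.append({key: str(base / f"{key}.wav") for key in ("lpb", "mic", "enh")})
    return samples


# Three entries pointing at the same folder, as in the output above.
sample_list = build_sample_list(["D:/data/example"] * 3)
print(sample_list[0]["mic"])  # D:/data/example/mic.wav
```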

DNSMOS usage example with sample as a numpy array:

import librosa
from speechmos import dnsmos

audio, _ = librosa.load("D:/data/example/enh.wav", sr=16000)
dnsmos.run(audio, sr=16000)

Output:

{'filename': 'D:/data/example/enh.wav',
 'ovrl_mos': 2.2067626609880104,
 'sig_mos': 3.290418848414798,
 'bak_mos': 2.141338429075571,
 'p808_mos': 3.0722866}

PLCMOS usage example with sample as a path to an audio file:

from speechmos import plcmos

plcmos.run("D:/data/example/enh.wav", sr=16000)

Output:

{'filename': 'D:/data/example/enh.wav',
 'plcmos': 2.5210512320200604,
 'model': 'plcmos_v2'}

Citation:

C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 886-890, doi: 10.1109/ICASSP43922.2022.9746108.

L. Diener, M. Purin, S. Sootla, A. Saabas, R. Aichner and R. Cutler, "PLCMOS -- a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms," arXiv preprint arXiv:2305.15127, 2023.

M. Purin, S. Sootla, M. Sponza, A. Saabas and R. Cutler, "AECMOS: A Speech Quality Assessment Metric for Echo Impairment," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 901-905, doi: 10.1109/ICASSP43922.2022.9747836.

