MOS (Mean Opinion Score) models for evaluating audio quality.

AECMOS, DNSMOS, PLCMOS

  • We release the AECMOS, DNSMOS, and PLCMOS models that we have developed for evaluating audio degradations due to echo, noise, packet loss and other sources.

Prerequisites

  • Python 3.7 and above
  • librosa 0.9.1
  • numpy 1.21.5
  • onnxruntime 1.10.0
  • pandas
  • tqdm

Usage:

from speechmos import aecmos, dnsmos, plcmos

aecmos.run(sample, sr, talk_type, **kwargs)

dnsmos.run(sample, sr, **kwargs)

plcmos.run(sample, sr, **kwargs)
  • sample is one of the following:

    • For AECMOS: a dictionary of the form {'lpb': lpb, 'mic': mic, 'enh': enh} holding the loopback, microphone, and enhanced audio, each given either as an np.ndarray or as a path to an audio file of a type supported by librosa.
    • For DNSMOS and PLCMOS: an np.ndarray or a path to an audio file of a type supported by librosa.

    All audio should be single channel (mono) audio.
    Alternatively, sample can be a list of items of one of the above types.

  • sr denotes the sampling rate, which must be either 16000 or 48000. AECMOS is available at 48 kHz; the scenarioless AECMOS model and all other models are available at 16 kHz. All audio must already be at the sampling rate passed in sr.
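The input requirements above (mono audio, a sampling rate of 16000 or 48000) can be checked before a sample is passed to a model. The following is a minimal sketch, not part of speechmos: the helper names to_mono and check_sample are hypothetical, and it assumes multi-channel arrays are laid out as (channels, samples), which is the layout librosa uses.

```python
import numpy as np

# Sampling rates named in the text above.
SUPPORTED_RATES = (16000, 48000)


def to_mono(audio):
    """Average the channels of a (channels, samples) array; pass 1-D audio through."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    if audio.ndim == 2:
        # Assumption: first axis is the channel axis.
        return audio.mean(axis=0)
    raise ValueError(f"expected 1-D or 2-D audio, got {audio.ndim} dimensions")


def check_sample(audio, sr):
    """Return mono audio, raising ValueError for an unsupported sampling rate."""
    if sr not in SUPPORTED_RATES:
        raise ValueError(f"sr must be one of {SUPPORTED_RATES}, got {sr}")
    return to_mono(audio)


# Usage with a synthetic two-channel signal.
stereo = np.stack([np.ones(8), np.zeros(8)])
mono = check_sample(stereo, 16000)
print(mono.shape)
```

Averaging channels is only one way to downmix; when loading from a file, librosa.load can already return mono audio at a requested rate.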

For AECMOS:

  • talk_type specifies the scenario, if known: 'st' (far-end single talk), 'nst' (near-end single talk), or 'dt' (double talk). talk_type can be None, in which case the 16 kHz scenarioless model is used. Its correlation with the ground truth is about 2% lower than that of the scenario-based model.
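A small validation step can catch a mistyped talk_type before a model is loaded. The sketch below is illustrative only: normalize_talk_type and TALK_TYPES are hypothetical names, not part of speechmos, and the accepted values are exactly the ones listed above.

```python
from typing import Optional

# Scenario codes listed in the text above, with their meanings.
TALK_TYPES = {
    "st": "far-end single talk",
    "nst": "near-end single talk",
    "dt": "double talk",
}


def normalize_talk_type(talk_type: Optional[str]) -> Optional[str]:
    """Return a validated talk_type, or None when the scenario is unknown."""
    if talk_type is None:
        return None
    key = talk_type.strip().lower()
    if key not in TALK_TYPES:
        raise ValueError(
            f"talk_type must be one of {sorted(TALK_TYPES)} or None, got {talk_type!r}"
        )
    return key


print(normalize_talk_type(" DT "), TALK_TYPES["dt"])
```

The returned value can then be passed as the talk_type argument of aecmos.run.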

For DNSMOS:

  • model_type controls which DNSMOS model to use: 'dnsmos' or 'dnsmos_personalized'. The default is 'dnsmos'.
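As a rough sketch of how model_type might be passed, the wrapper below validates the value and forwards it to dnsmos.run as a keyword argument. The names build_dnsmos_kwargs and score_with_dnsmos are hypothetical, and speechmos is imported lazily so that the validation helper can be used without the package installed.

```python
# Model names listed in the text above.
VALID_MODEL_TYPES = ("dnsmos", "dnsmos_personalized")


def build_dnsmos_kwargs(model_type="dnsmos", verbose=False):
    """Build the keyword arguments for dnsmos.run after validating model_type."""
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {VALID_MODEL_TYPES}, got {model_type!r}"
        )
    return {"model_type": model_type, "verbose": verbose}


def score_with_dnsmos(sample, model_type="dnsmos", verbose=False):
    """Score a sample (array, path, or list of either) at 16 kHz."""
    from speechmos import dnsmos  # imported only when scoring is requested

    return dnsmos.run(sample, sr=16000, **build_dnsmos_kwargs(model_type, verbose))


print(build_dnsmos_kwargs("dnsmos_personalized"))
```

For example, score_with_dnsmos("D:/data/example/enh.wav", model_type="dnsmos_personalized") would select the personalized model, assuming speechmos is installed and the file exists.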

Additional arguments:

  • return_df controls whether a pandas dataframe is returned containing sample information and MOS scores when evaluating a list of samples. The default is return_df = True. If set to False, a list of dictionaries is returned instead.
  • verbose controls whether more details are printed on the screen. The default is verbose = False.
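When return_df = False, the list of dictionaries can be summarized without pandas. The sketch below uses only the standard library; summarize is a hypothetical helper, and the result values are copied from the AECMOS list example output in this document rather than computed here.

```python
from statistics import mean

# Result dictionaries in the shape returned with return_df=False.
# Values are copied from the AECMOS list example output in this document.
results = [
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
]


def summarize(rows, keys=("echo_mos", "deg_mos")):
    """Compute count, mean, min, and max for each score key."""
    summary = {}
    for key in keys:
        values = [row[key] for row in rows]
        summary[key] = {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


summary = summarize(results)
for key, stats in summary.items():
    print(key, stats)
```

With the default return_df = True, calling describe() on the returned dataframe gives a comparable summary.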

Usage examples:

AECMOS usage example with sample as a dictionary of numpy arrays and unknown talk_type.

import librosa
from speechmos import aecmos

lpb, _ = librosa.load("d:/data/example/lpb.wav", sr=16000)
mic, _ = librosa.load("d:/data/example/mic.wav", sr=16000)
enh, _ = librosa.load("d:/data/example/enh.wav", sr=16000)

sample = {'lpb': lpb, 'mic': mic, 'enh': enh}

aecmos.run(sample, sr=16000, talk_type=None, verbose=True)

Output:

Model version aecmos_scenarioless_16kHz.
The model sampling rate is 16000.
{'echo_mos': 4.9999470710754395, 'deg_mos': 3.4854962825775146, 'talk_type': None, 'model_name': 'aecmos_scenarioless_16kHz'}

AECMOS usage example with sample as a list of dictionaries of paths to audio files.

from speechmos import aecmos
# sample_list: a list of {'lpb': path, 'mic': path, 'enh': path} dictionaries
aecmos.run(sample_list, sr=48000, talk_type='dt', verbose=True)

Output:

Using model aecmos_48kHz to evaluate 3 samples.
Model sampling rate is 48000.
0it [00:00, ?it/s]
1it [00:00,  8.59it/s]
3it [00:00, 25.77it/s]
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
       echo_mos   deg_mos
count  3.000000  3.000000
mean   3.240038  3.408777
std    0.000000  0.000000
min    3.240038  3.408777
25%    3.240038  3.408777
50%    3.240038  3.408777
75%    3.240038  3.408777
max    3.240038  3.408777

DNSMOS usage example with sample as a numpy array:

import librosa
from speechmos import dnsmos

audio, _ = librosa.load("D:/data/example/enh.wav", sr=16000)
dnsmos.run(audio, sr=16000)

Output:

{'filename': 'D:/data/example/enh.wav',
 'ovrl_mos': 2.2067626609880104,
 'sig_mos': 3.290418848414798,
 'bak_mos': 2.141338429075571,
 'p808_mos': 3.0722866}

PLCMOS usage example with sample as a path to an audio file:

from speechmos import plcmos

plcmos.run("D:/data/example/enh.wav", sr=16000)

Output:

{'filename': 'D:/data/example/enh.wav',
 'plcmos': 2.5210512320200604,
 'model': 'plcmos_v2'}

Citation:

C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 886-890, doi: 10.1109/ICASSP43922.2022.9746108.

L. Diener, M. Purin, S. Sootla, A. Saabas, R. Aichner and R. Cutler, "PLCMOS - a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms," arXiv preprint arXiv:2305.15127, 2023.

M. Purin, S. Sootla, M. Sponza, A. Saabas and R. Cutler, "AECMOS: A Speech Quality Assessment Metric for Echo Impairment," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 901-905, doi: 10.1109/ICASSP43922.2022.9747836.
