MOS (Mean Opinion Score) models for evaluating audio quality.
AECMOS, DNSMOS, PLCMOS
- We release the AECMOS, DNSMOS, and PLCMOS models that we have developed for evaluating audio degradations due to echo, noise, packet loss and other sources.
Prerequisites
- Python 3.7 and above
- librosa 0.9.1
- numpy 1.21.5
- onnxruntime 1.10.0
- pandas
- tqdm
Usage:
from speechmos import aecmos, dnsmos, plcmos
aecmos.run(sample, sr, talk_type, **kwargs)
dnsmos.run(sample, sr, **kwargs)
plcmos.run(sample, sr, **kwargs)
- `sample` is one of the following:
  - For AECMOS: a dictionary of the form `{'lpb': lpb, 'mic': mic, 'enh': enh}`, where the values are the loopback, microphone, and enhanced audio, given either as `np.ndarray` or as paths to audio files of a type supported by `librosa`.
  - For DNSMOS and PLCMOS: an `np.ndarray` or a path to an audio file of a type supported by `librosa`.

  All audio should be single-channel (mono). Alternatively, `sample` can be a list of items of one of the above types.
- `sr` denotes the sampling rate, which should be either 16000 or 48000. AECMOS is available at 48 kHz; all other models are available at 16 kHz. All audio should be provided at the correct sampling rate.
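Because the models expect mono audio at a fixed sampling rate, it can help to inspect a file's header before scoring it. The sketch below uses only Python's standard `wave` module, so it covers uncompressed WAV files only. The helper name `check_wav` is hypothetical and is not part of speechmos.

```python
import wave

# Sampling rates named in this README.
ALLOWED_RATES = (16000, 48000)


def check_wav(path, expected_sr):
    """Return a list of problems found in a WAV header (empty if none).

    Hypothetical helper, not part of speechmos.
    """
    problems = []
    if expected_sr not in ALLOWED_RATES:
        problems.append(
            f"expected_sr must be one of {ALLOWED_RATES}, got {expected_sr}"
        )
    # Only the header is read; no audio frames are decoded.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")
    if rate != expected_sr:
        problems.append(f"file rate is {rate} Hz, expected {expected_sr} Hz")
    return problems
```

An empty list means the header matches the stated requirements. Files that fail the check would need to be downmixed or resampled first, for example while loading them with `librosa`.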
For AECMOS: `talk_type` specifies the scenario, if known: `'st'` (far-end single talk), `'nst'` (near-end single talk), or `'dt'` (double talk). `talk_type` can be `None`, in which case the 16 kHz scenarioless model is used. Its correlation with the ground truth is about 2% lower than that of the scenario-based model.
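The AECMOS argument rules described above can be checked up front, before any model is loaded. The following is a minimal sketch under that reading of the README. The function `validate_aecmos_args` and its error messages are hypothetical, and speechmos may perform its own, different checks.

```python
REQUIRED_KEYS = ("lpb", "mic", "enh")
TALK_TYPES = ("st", "nst", "dt", None)
SAMPLE_RATES = (16000, 48000)


def validate_aecmos_args(sample, sr, talk_type=None):
    """Raise ValueError if the arguments contradict the README's description.

    Hypothetical helper, not part of speechmos.
    """
    # A single dictionary and a list of dictionaries are both allowed.
    samples = sample if isinstance(sample, list) else [sample]
    for i, item in enumerate(samples):
        if not isinstance(item, dict):
            raise ValueError(
                f"sample {i}: expected a dict, got {type(item).__name__}"
            )
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"sample {i}: missing keys {missing}")
    if sr not in SAMPLE_RATES:
        raise ValueError(f"sr must be one of {SAMPLE_RATES}, got {sr}")
    if talk_type not in TALK_TYPES:
        raise ValueError(
            f"talk_type must be 'st', 'nst', 'dt', or None, got {talk_type!r}"
        )
```

The check deliberately does not couple `sr` to `talk_type`, since the text above does not spell out which combinations are supported.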
For DNSMOS: `model_type` controls which DNSMOS model to use: `'dnsmos'` or `'dnsmos_personalized'`. The default is `'dnsmos'`.
Additional arguments:
- `return_df` controls whether a pandas DataFrame containing sample information and MOS scores is returned when evaluating a list of samples. The default is `return_df=True`. If set to `False`, a list of dictionaries is returned instead.
- `verbose` controls whether more details are printed to the screen. The default is `verbose=False`.
Usage examples:
AECMOS usage example with `sample` as a dictionary of numpy arrays and unknown `talk_type`:
import librosa
from speechmos import aecmos
lpb, _ = librosa.load("d:/data/example/lpb.wav", sr=16000)
mic, _ = librosa.load("d:/data/example/mic.wav", sr=16000)
enh, _ = librosa.load("d:/data/example/enh.wav", sr=16000)
sample = {'lpb': lpb, 'mic': mic, 'enh': enh}
aecmos.run(sample, sr=16000, verbose=True)
Output:
Model version aecmos_scenarioless_16kHz.
The model sampling rate is 16000.
{'echo_mos': 4.9999470710754395, 'deg_mos': 3.4854962825775146, 'talk_type': None, 'model_name': 'aecmos_scenarioless_16kHz'}
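AECMOS usage example with `sample` as a list of dictionaries of paths to audio files:
from speechmos import aecmos
# Three samples, each mapping 'lpb', 'mic', and 'enh' to an audio file path
sample_list = [{'lpb': "D:/data/example/lpb.wav", 'mic': "D:/data/example/mic.wav", 'enh': "D:/data/example/enh.wav"}] * 3
aecmos.run(sample_list, sr=48000, talk_type='dt', verbose=True)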
Output:
Using model aecmos_48kHz to evaluate 3 samples.
Model sampling rate is 48000.
0it [00:00, ?it/s]
1it [00:00, 8.59it/s]
3it [00:00, 25.77it/s]
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
       echo_mos   deg_mos
count  3.000000  3.000000
mean   3.240038  3.408777
std    0.000000  0.000000
min    3.240038  3.408777
25%    3.240038  3.408777
50%    3.240038  3.408777
75%    3.240038  3.408777
max    3.240038  3.408777
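The summary table above resembles the output of pandas `DataFrame.describe()`. When `return_df=False` and a list of dictionaries is returned instead, comparable statistics can be computed with the standard library. The sketch below assumes the per-sample dictionaries carry the `echo_mos` and `deg_mos` keys shown in the output above. The `summarize` helper is hypothetical, and the example values are copied from that output.

```python
import statistics


def summarize(results, keys=("echo_mos", "deg_mos")):
    """Compute count, mean, sample std, min, and max for each score key.

    Hypothetical helper, not part of speechmos.
    """
    summary = {}
    for key in keys:
        values = [result[key] for result in results]
        summary[key] = {
            "count": len(values),
            "mean": statistics.mean(values),
            # Sample standard deviation needs at least two values.
            "std": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Scores copied from the three identical samples in the output above.
results = [
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557},
]
summary = summarize(results)
print(summary["echo_mos"])
```

Because all three samples are identical, the standard deviation is zero, matching the `std` row in the table.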
DNSMOS usage example with `sample` as a numpy array:
import librosa
from speechmos import dnsmos
audio, _ = librosa.load("D:/data/example/enh.wav", sr=16000)
dnsmos.run(audio, sr=16000)
Output:
{'filename': 'D:/data/example/enh.wav',
'ovrl_mos': 2.2067626609880104,
'sig_mos': 3.290418848414798,
'bak_mos': 2.141338429075571,
'p808_mos': 3.0722866}
PLCMOS usage example with `sample` as a path to an audio file:
from speechmos import plcmos
plcmos.run("D:/data/example/enh.wav", sr=16000)
Output:
{'filename': 'D:/data/example/enh.wav',
'plcmos': 2.5210512320200604,
'model': 'plcmos_v2'}
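Since `sample` can also be a list, a folder of recordings can be scored in one call by first collecting the file paths. The sketch below shows one way to build such a list. The helper `collect_wavs` is hypothetical, and it assumes the files are WAV files sitting directly in one folder.

```python
from pathlib import Path


def collect_wavs(folder):
    """Return the sorted paths, as strings, of all .wav files in a folder.

    Hypothetical helper, not part of speechmos. Subfolders are not searched.
    """
    return sorted(str(path) for path in Path(folder).glob("*.wav"))
```

The resulting list could then be passed as `sample`, for example `plcmos.run(collect_wavs("D:/data/example"), sr=16000)`, assuming every file in the folder is mono and sampled at 16 kHz.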
Citation:
C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 886-890, doi: 10.1109/ICASSP43922.2022.9746108.
L. Diener, M. Purin, S. Sootla, A. Saabas, R. Aichner, and R. Cutler, "PLCMOS--a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms." arXiv preprint arXiv:2305.15127 (2023).
M. Purin, S. Sootla, M. Sponza, A. Saabas and R. Cutler, "AECMOS: A Speech Quality Assessment Metric for Echo Impairment," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 901-905, doi: 10.1109/ICASSP43922.2022.9747836.