mirdata
Common loaders for Music Information Retrieval (MIR) datasets. Find the API documentation here.
This library provides tools for working with common MIR datasets, including tools for:
- downloading datasets to a common location and format
- validating that the files for a dataset are all present
- loading annotation files to a common format, consistent with the format required by mir_eval
- parsing track level metadata for detailed evaluations
Installation
To install, simply run:
```
pip install mirdata
```
Try it out!
```python
import mirdata

orchset = mirdata.Dataset('orchset')
orchset.download()  # download the dataset
orchset.validate()  # validate that all the expected files are there

example_track = orchset.choice_track()  # choose a random example track
print(example_track)  # see the available data
```
See the Examples section below for more details, or the documentation for more examples and the API reference.
Currently supported datasets
- Beatles
- Beatport EDM key
- DALI
- GiantSteps tempo
- GiantSteps key
- Groove MIDI
- GTZAN genre
- GuitarSet
- iKala
- MAESTRO
- MedleyDB Melody
- MedleyDB Pitch
- Medley-solos-DB
- Mridangam Stroke
- Orchset
- RWC Classical
- RWC Jazz
- RWC Popular
- SALAMI
- TinySOL
For more information about these datasets see this table.
Reference
This library was presented in the following paper:
"mirdata: Software for Reproducible Usage of Datasets"
Rachel M. Bittner, Magdalena Fuentes, David Rubinstein, Andreas Jansson, Keunwoo Choi, and Thor Kell
in International Society for Music Information Retrieval (ISMIR) Conference, 2019
```
@inproceedings{bittner_fuentes_2019,
  title={mirdata: Software for Reproducible Usage of Datasets},
  author={Bittner, Rachel M and Fuentes, Magdalena and Rubinstein, David and Jansson, Andreas and Choi, Keunwoo and Kell, Thor},
  booktitle={International Society for Music Information Retrieval (ISMIR) Conference},
  year={2019}
}
```
Contributing a new dataset loader
We welcome contributions to this library, especially new datasets. Please see CONTRIBUTING.md for guidelines.
Examples
Download the Orchset Dataset
```python
import mirdata

orchset = mirdata.Dataset('orchset')
orchset.download()
```
Validate the data
```python
import mirdata

orchset = mirdata.Dataset('orchset')
orchset.validate()
```
Load data for a specific track
```python
import mirdata

orchset = mirdata.Dataset('orchset')
track = orchset.track('Beethoven-S3-I-ex1')
print(track)
```
Load all tracks in the Orchset Dataset
```python
import mirdata

orchset = mirdata.Dataset('orchset')
orchset_data = orchset.load_tracks()
```
See what data are available for a track
```python
import mirdata

orchset = mirdata.Dataset('orchset')
orchset_ids = orchset.track_ids()
orchset_data = orchset.load_tracks()

example_track = orchset_data[orchset_ids[0]]
print(example_track)
```
```
orchset.Track(
  track_id='Beethoven-S3-I-ex1',
  melody=F0Data(times=array([0.000e+00, 1.000e-02, 2.000e-02, ..., 1.244e+01, 1.245e+01, 1.246e+01]),
                frequencies=array([ 0. , 0. , 0. , ..., 391.995, 391.995, 391.995]),
                confidence=array([0, 0, 0, ..., 1, 1, 1])),
  audio_path_mono='~/mir_datasets/Orchset/audio/mono/Beethoven-S3-I-ex1.wav',
  audio_path_stereo='~/mir_datasets/Orchset/audio/stereo/Beethoven-S3-I-ex1.wav',
  composer='Beethoven',
  work='S3-I',
  excerpt='1',
  predominant_melodic_instruments=['winds', 'strings'],
  alternating_melody=True,
  contains_winds=True,
  contains_strings=True,
  contains_brass=False,
  only_strings=False,
  only_winds=False,
  only_brass=False
)
```
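The melody annotation is stored as parallel numpy arrays, where the confidence array marks which frames are voiced. As a minimal sketch using plain numpy (with illustrative values standing in for the real Orchset annotation, so no download is required), the confidence array can be turned into a boolean mask to select only the voiced frames:

```python
import numpy as np

# Illustrative arrays mimicking the F0Data fields shown above
# (not the real Orchset annotation).
times = np.array([0.00, 0.01, 0.02, 12.44, 12.45, 12.46])
frequencies = np.array([0.0, 0.0, 0.0, 391.995, 391.995, 391.995])
confidence = np.array([0, 0, 0, 1, 1, 1])

voiced = confidence.astype(bool)    # boolean voicing mask
voiced_times = times[voiced]        # time stamps of voiced frames only
voiced_freqs = frequencies[voiced]  # f0 values of voiced frames only

print(voiced_freqs)  # -> [391.995 391.995 391.995]
```

The same masking pattern applies to any mir_eval-style (times, frequencies, confidence) triple.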
Evaluate a melody extraction algorithm on Orchset
```python
import mir_eval
import mirdata
import numpy as np
import sox


def very_bad_melody_extractor(audio_path):
    duration = sox.file_info.duration(audio_path)
    time_stamps = np.arange(0, duration, 0.01)
    melody_f0 = np.random.uniform(low=80.0, high=800.0, size=time_stamps.shape)
    return time_stamps, melody_f0


# Evaluate on the full dataset
orchset_scores = {}
orchset = mirdata.Dataset('orchset')
orchset_data = orchset.load_tracks()
for track_id, track_data in orchset_data.items():
    est_times, est_freqs = very_bad_melody_extractor(track_data.audio_path_mono)
    ref_times = track_data.melody.times
    ref_freqs = track_data.melody.frequencies
    score = mir_eval.melody.evaluate(ref_times, ref_freqs, est_times, est_freqs)
    orchset_scores[track_id] = score

# Split the results by composer and by instrumentation
composer_scores = {}
strings_no_strings_scores = {True: {}, False: {}}
for track_id, track_data in orchset_data.items():
    if track_data.composer not in composer_scores:
        composer_scores[track_data.composer] = {}
    composer_scores[track_data.composer][track_id] = orchset_scores[track_id]
    strings_no_strings_scores[track_data.contains_strings][track_id] = \
        orchset_scores[track_id]
```
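Once scores are split by group, a natural next step is averaging each metric within a group. The sketch below uses made-up per-track score dicts in the same shape `mir_eval.melody.evaluate` returns (a dict of metric name to float), so it runs without mirdata, audio, or sox; the track IDs and metric values are hypothetical:

```python
from collections import defaultdict
import numpy as np

# Hypothetical per-track scores, shaped like mir_eval.melody.evaluate output.
orchset_scores = {
    'Beethoven-S3-I-ex1': {'Overall Accuracy': 0.25, 'Raw Pitch Accuracy': 0.5},
    'Beethoven-S3-I-ex2': {'Overall Accuracy': 0.75, 'Raw Pitch Accuracy': 0.25},
    'Brahms-S3-III-ex1': {'Overall Accuracy': 0.19, 'Raw Pitch Accuracy': 0.22},
}
composers = {
    'Beethoven-S3-I-ex1': 'Beethoven',
    'Beethoven-S3-I-ex2': 'Beethoven',
    'Brahms-S3-III-ex1': 'Brahms',
}

# Group the score dicts by composer.
grouped = defaultdict(list)
for track_id, score in orchset_scores.items():
    grouped[composers[track_id]].append(score)

# Average each metric within each group.
composer_means = {
    composer: {
        metric: float(np.mean([s[metric] for s in scores]))
        for metric in scores[0]
    }
    for composer, scores in grouped.items()
}

print(composer_means['Beethoven']['Overall Accuracy'])  # -> 0.5
```

Using `defaultdict(list)` avoids the explicit "if key not in dict" check used in the grouping loop above.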
Dataset Location
By default, all datasets tracked by this library are stored in ~/mir_datasets (defined as MIR_DATASETS_DIR in mirdata/__init__.py). Data can alternatively be stored in another location by specifying data_home within a relevant function, e.g. mirdata.Dataset('orchset', data_home='my_custom_path').
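A tilde-prefixed default like ~/mir_datasets resolves to a per-user path. The helper below is an illustrative sketch of how such a default might be resolved with the standard library; it is an assumption for clarity, not mirdata's actual implementation:

```python
import os


def resolve_data_home(data_home=None, default='~/mir_datasets'):
    """Expand a user-supplied data_home, falling back to a tilde default.

    Illustrative helper only; not part of mirdata's API.
    """
    path = data_home if data_home is not None else default
    return os.path.expanduser(path)


print(resolve_data_home('my_custom_path'))  # -> my_custom_path
```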