Common loaders for MIR datasets.

mirdata

Common loaders for Music Information Retrieval (MIR) datasets. Find the API documentation here.

This library provides tools for working with common MIR datasets, including tools for:

  • downloading datasets to a common location and format
  • validating that the files for a dataset are all present
  • loading annotation files to a common format, consistent with the format required by mir_eval
  • parsing track level metadata for detailed evaluations

Installation

To install, simply run:

pip install mirdata

Try it out!

import mirdata

orchset = mirdata.Dataset('orchset')
orchset.download()  # download the dataset
orchset.validate()  # validate that all the expected files are there

example_track = orchset.choice_track()  # choose a random example track
print(example_track)  # see the available data

See the Examples section below for more details, or the documentation for more examples and the API reference.

Currently supported datasets

For more information about these datasets see this table.

Reference

This library was presented in the following paper:

"mirdata: Software for Reproducible Usage of Datasets"
Rachel M. Bittner, Magdalena Fuentes, David Rubinstein, Andreas Jansson, Keunwoo Choi, and Thor Kell
in International Society for Music Information Retrieval (ISMIR) Conference, 2019
@inproceedings{
  bittner_fuentes_2019,
  title={mirdata: Software for Reproducible Usage of Datasets},
  author={Bittner, Rachel M and Fuentes, Magdalena and Rubinstein, David and Jansson, Andreas and Choi, Keunwoo and Kell, Thor},
  booktitle={International Society for Music Information Retrieval (ISMIR) Conference},
  year={2019}
}

Contributing a new dataset loader

We welcome contributions to this library, especially new datasets. Please see CONTRIBUTING.md for guidelines.

Examples

Download the Orchset Dataset

import mirdata

orchset = mirdata.Dataset('orchset')
orchset.download()

Validate the data

import mirdata

orchset = mirdata.Dataset('orchset')
orchset.validate()

Load data for a specific track

import mirdata

orchset = mirdata.Dataset('orchset')
track = orchset.track('Beethoven-S3-I-ex1')
print(track)

Load all tracks in the Orchset Dataset

import mirdata

orchset = mirdata.Dataset('orchset')
orchset_data = orchset.load_tracks()

See what data are available for a track

import mirdata

orchset = mirdata.Dataset('orchset')
orchset_ids = orchset.track_ids()
orchset_data = orchset.load_tracks()

example_track = orchset_data[orchset_ids[0]]
print(example_track)
> orchset.Track(
    track_id='Beethoven-S3-I-ex1',
    melody=F0Data(times=array([0.000e+00, 1.000e-02, 2.000e-02, ..., 1.244e+01, 1.245e+01, 1.246e+01]),
                  frequencies=array([  0.   ,   0.   ,   0.   , ..., 391.995, 391.995, 391.995]),
                  confidence=array([0, 0, 0, ..., 1, 1, 1])),
    audio_path_mono='~/mir_datasets/Orchset/audio/mono/Beethoven-S3-I-ex1.wav',
    audio_path_stereo='~/mir_datasets/Orchset/audio/stereo/Beethoven-S3-I-ex1.wav',
    composer='Beethoven',
    work='S3-I',
    excerpt='1',
    predominant_melodic_instruments=['winds', 'strings'],
    alternating_melody=True,
    contains_winds=True,
    contains_strings=True,
    contains_brass=False,
    only_strings=False,
    only_winds=False,
    only_brass=False
)
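
The melody annotation printed above can be summarized without loading any audio. The following is a minimal sketch, assuming that a frequency of 0 marks an unvoiced frame (the convention mir_eval uses for melody data). The `F0Data` namedtuple here is only a stand-in for the object shown in the output, and `voiced_fraction` is a hypothetical helper, not part of mirdata.

```python
from collections import namedtuple

# Stand-in for the F0Data object printed above (an assumption for illustration;
# the real object comes from mirdata and holds numpy arrays).
F0Data = namedtuple('F0Data', ['times', 'frequencies', 'confidence'])


def voiced_fraction(melody):
    """Return the fraction of frames whose frequency is greater than zero.

    Hypothetical helper: assumes nonpositive frequencies mark unvoiced frames.
    """
    n_frames = len(melody.frequencies)
    if n_frames == 0:
        return 0.0
    n_voiced = sum(1 for freq in melody.frequencies if freq > 0)
    return n_voiced / n_frames


# Four illustrative frames on a 10 ms grid: two unvoiced, two voiced.
example_melody = F0Data(
    times=[0.0, 0.01, 0.02, 0.03],
    frequencies=[0.0, 0.0, 391.995, 391.995],
    confidence=[0, 0, 1, 1],
)
print(voiced_fraction(example_melody))
```

Because the helper only iterates over `frequencies`, it should work equally well when passed `example_track.melody`, provided that attribute has the shape shown in the output above.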

Evaluate a melody extraction algorithm on Orchset

import mir_eval
import mirdata
import numpy as np
import sox

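def very_bad_melody_extractor(audio_path):
    """Return random f0 values on a 10 ms grid (a deliberately bad baseline)."""
    duration = sox.file_info.duration(audio_path)  # length of the file in seconds
    time_stamps = np.arange(0, duration, 0.01)
    # Draw one random frequency in Hz per time stamp
    melody_f0 = np.random.uniform(low=80.0, high=800.0, size=time_stamps.shape)
    return time_stamps, melody_f0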

# Evaluate on the full dataset
orchset_scores = {}
orchset = mirdata.Dataset('orchset')
orchset_data = orchset.load_tracks()
for track_id, track_data in orchset_data.items():
    est_times, est_freqs = very_bad_melody_extractor(track_data.audio_path_mono)

    ref_times = track_data.melody.times
    ref_freqs = track_data.melody.frequencies

    score = mir_eval.melody.evaluate(ref_times, ref_freqs, est_times, est_freqs)
    orchset_scores[track_id] = score

# Split the results by composer and by instrumentation
composer_scores = {}
strings_no_strings_scores = {True: {}, False: {}}
for track_id, track_data in orchset_data.items():
    score = orchset_scores[track_id]
    composer_scores.setdefault(track_data.composer, {})[track_id] = score
    strings_no_strings_scores[track_data.contains_strings][track_id] = score
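
Once the scores are grouped, a natural next step is to average each metric within a group. The sketch below is one possible way to do that, not something mirdata provides: `mean_scores` is a hypothetical helper, and the numbers in `example_scores` are made up so that the snippet runs without downloading anything. It assumes each per-track score is a dictionary mapping metric names to floats, which is the shape returned by `mir_eval.melody.evaluate`.

```python
from statistics import mean


def mean_scores(scores_by_track):
    """Average each metric across tracks.

    Hypothetical helper. `scores_by_track` maps track_id -> {metric: value},
    the same shape as `orchset_scores` in the example above.
    """
    if not scores_by_track:
        return {}
    # Take the metric names from the first track; all tracks are assumed
    # to report the same metrics.
    metrics = next(iter(scores_by_track.values())).keys()
    return {
        metric: mean(scores[metric] for scores in scores_by_track.values())
        for metric in metrics
    }


# Made-up values standing in for real evaluation results.
example_scores = {
    'track-a': {'Overall Accuracy': 0.25, 'Raw Pitch Accuracy': 0.5},
    'track-b': {'Overall Accuracy': 0.75, 'Raw Pitch Accuracy': 1.0},
}
summary = mean_scores(example_scores)
print(summary)
```

With real results, the same helper could be applied to each entry of `composer_scores` or to `strings_no_strings_scores[True]` and `strings_no_strings_scores[False]` to compare groups.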

Dataset Location

By default, all datasets tracked by this library are stored in ~/mir_datasets (defined as MIR_DATASETS_DIR in mirdata/__init__.py). To store data elsewhere, pass data_home to the relevant function, e.g. mirdata.Dataset('orchset', data_home='my_custom_path').
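
When choosing a custom location, it can help to expand and normalize the path before handing it to the library. The sketch below illustrates the idea with the standard library only. `resolve_data_home` and `DEFAULT_DATA_HOME` are hypothetical names introduced for this example; they are not part of mirdata, and how mirdata resolves paths internally is not shown here.

```python
import os

# The default location named in the text above, written with "~" unexpanded.
DEFAULT_DATA_HOME = os.path.join('~', 'mir_datasets')


def resolve_data_home(data_home=None):
    """Return an absolute path for the dataset directory.

    Hypothetical helper: falls back to the default location when no
    custom path is given, then expands "~" and makes the path absolute.
    """
    chosen = DEFAULT_DATA_HOME if data_home is None else data_home
    return os.path.abspath(os.path.expanduser(chosen))


print(resolve_data_home())                  # default location, "~" expanded
print(resolve_data_home('my_custom_path'))  # relative path made absolute
```

The resolved path could then be passed along as `mirdata.Dataset('orchset', data_home=resolve_data_home('my_custom_path'))`.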
