DataLad extension for semantic metadata handling


Overview

This software is a DataLad extension that equips DataLad with an alternative command suite for metadata handling (extraction, aggregation, filtering, and reporting).

Command(s) currently provided by this extension

  • meta-extract -- run an extractor on a file or dataset and emit the resulting metadata (stdout).

  • meta-filter -- run a filter over existing metadata and return the resulting metadata (stdout).

  • meta-add -- add a metadata record or a list of metadata records (possibly received on stdin) to a metadata store, usually to the git-repo of the dataset.

  • meta-aggregate -- aggregate metadata from multiple local or remote metadata-stores into a local metadata store.

  • meta-dump -- report metadata from local or remote metadata stores. Metadata can be selected by file- or dataset-path matching patterns, including dataset versions and dataset IDs.

  • meta-conduct -- execute processing pipelines that consist of a provider, which emits the objects to be processed (e.g. files or metadata), and a pipeline of processors, which perform operations on those objects, such as metadata extraction and metadata adding. Processors are usually executed in parallel. A few pipeline definitions are provided with the release.
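The provider/processor structure described for meta-conduct can be illustrated with a small, self-contained sketch. This is not metalad's implementation or API: every name below (`provide_files`, `extract`, `add`, `run_pipeline`) is hypothetical, and the "extraction" is a stand-in that only records a file suffix. It only shows the idea of a provider emitting elements that a chain of processors handles, with elements processed in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Iterator, List

# All names here are hypothetical; this is not metalad's API.
Processor = Callable[[dict], dict]


def provide_files(paths: Iterable[str]) -> Iterator[dict]:
    """Provider: emit one pipeline element per path."""
    for path in paths:
        yield {"path": path}


def extract(element: dict) -> dict:
    """Processor 1: stand-in for metadata extraction."""
    return {**element, "metadata": {"suffix": Path(element["path"]).suffix}}


def add(element: dict) -> dict:
    """Processor 2: stand-in for adding a record to a metadata store."""
    return {**element, "added": True}


def run_pipeline(provider: Iterable[dict],
                 processors: List[Processor],
                 max_workers: int = 4) -> List[dict]:
    """Run every provided element through all processors, in parallel."""
    def process(element: dict) -> dict:
        for processor in processors:
            element = processor(element)
        return element

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order the provider emitted them.
        return list(pool.map(process, provider))


results = run_pipeline(provide_files(["a.txt", "b.csv"]), [extract, add])
```

Each element passes through the processors in order, while different elements are handled concurrently. This is one plausible reading of "processors are usually executed in parallel", not a statement about how metalad schedules work.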

Commands currently under development:

  • meta-export -- write a flat representation of metadata to a file-system. For now you can export your metadata to a JSON-lines file named metadata-dump.jsonl:

     datalad meta-dump -d <dataset-path> -r >metadata-dump.jsonl
    
  • meta-import -- import a flat representation of metadata from a file-system. For now you can import metadata from a JSON-lines file, e.g. metadata-dump.jsonl like this:

     datalad meta-add -d <dataset-path> --json-lines -i metadata-dump.jsonl
    
  • meta-ingest-previous -- ingest metadata from metalad<=0.2.1.
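A JSON-lines dump such as metadata-dump.jsonl holds one JSON record per line, so it can be inspected with a few lines of Python. The sketch below counts records per extractor. It assumes that each record carries an `extractor_name` key; treat that field name as an assumption and check it against your own dump. The helper `count_records_by_extractor` is hypothetical and not part of metalad.

```python
import io
import json
from collections import Counter
from typing import Iterable


def count_records_by_extractor(lines: Iterable[str]) -> Counter:
    """Count JSON-lines records per extractor (assumes an 'extractor_name' key)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("extractor_name", "<unknown>")] += 1
    return counts


# Made-up sample records standing in for a real dump file.
sample = io.StringIO(
    '{"type": "dataset", "extractor_name": "metalad_core"}\n'
    '{"type": "file", "extractor_name": "metalad_core"}\n'
    '{"type": "file", "extractor_name": "metalad_example_file"}\n'
)
counts = count_records_by_extractor(sample)
```

For a real dump, pass an open file instead of the sample, for example `with open("metadata-dump.jsonl") as f: counts = count_records_by_extractor(f)`.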

Additional metadata extractor implementations

  • Compatible with the previous families of extractors provided by datalad and by metalad, i.e. metalad_core, metalad_annex, metalad_custom, metalad_runprov.

  • New metadata extractor paradigm that distinguishes between file- and dataset-level extractors. Included are two example extractors, metalad_example_dataset and metalad_example_file.

  • metalad_external_dataset and metalad_external_file -- a dataset-level and a file-level extractor that execute external processes to generate metadata, which allows the externally created metadata to be processed in datalad.

  • metalad_studyminimeta -- a dataset-level extractor that reads studyminimeta YAML files and produces metadata that contains a JSON-LD compatible description of the data in the input file.

Indexers

  • Provides indexers for the new datalad indexer-plugin interface. These indexers convert metadata in proprietary formats into a set of key-value pairs that can be used by datalad search to search for content.

  • indexer_studyminimeta -- converts studyminimeta JSON-LD description into key-value pairs for datalad search.

  • indexer_jsonld -- a generic JSON-LD indexer that aims at converting any JSON-LD descriptions into a set of key-value pairs that reflect the content of the JSON-LD description.
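To make "converting a JSON-LD description into key-value pairs" concrete, here is a minimal sketch that flattens a nested JSON structure into dotted keys. It is not the code of indexer_jsonld or indexer_studyminimeta, and the key naming scheme (`author[0].name`) is an assumption chosen for illustration. The `flatten` helper and the sample document are hypothetical.

```python
from typing import Any, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) pairs for every leaf of a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Made-up JSON-LD-style document, for illustration only.
doc = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "demo",
    "author": [{"@type": "Person", "name": "A. Person"}],
}
pairs = dict(flatten(doc))
```

Here `pairs` maps keys such as `name` and `author[0].name` to their leaf values, which is the kind of flat structure a search index can work with.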

Installation

Before you install this package, please make sure that you have installed a recent version of git-annex. Afterwards, install the latest version of datalad-metalad from PyPI. It is recommended to use a dedicated virtualenv:

# create and enter a new virtual environment (strongly recommended)
virtualenv --system-site-packages --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate

# install from PyPI
pip install datalad-metalad
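After installation, one way to confirm that the package is visible in the active environment is to query its installed version from Python. This is a sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if datalad-metalad is installed, otherwise None.
print(installed_version("datalad-metalad"))
```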

Support

For general information on how to use or contribute to DataLad (and this extension), please see the DataLad website or the main GitHub project page. The documentation is found here: http://docs.datalad.org/projects/metalad

All bugs, concerns and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad-metalad/issues

If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org with a datalad tag. NeuroStars.org is a platform similar to StackOverflow but dedicated to neuroinformatics.

All previous DataLad questions are available here: http://neurostars.org/tags/datalad/

Acknowledgements

This DataLad extension was developed with support from the German Federal Ministry of Education and Research (BMBF 01GQ1905), and the US National Science Foundation (NSF 1912266).
