A library for abstracting versioned metadata storage for data packages

metastore-lib: metadata storage library for datapackages

A Python library for abstracting metadata storage for datapackage.json packages.

Full Documentation

This README covers the basics of getting started. The most up-to-date and comprehensive documentation for metastore-lib is available at metastore-lib.readthedocs.io.

Installation

The easiest way to install the latest stable version of metastore-lib into your Python environment is via pip:

pip install metastore-lib

Quick Start

Instantiating a backend

To use the library after you have installed it, first instantiate a storage instance:

config = {"token": "...",
          "more_options": "..."}

# Using the provided factory method
metastore = create_metastore('github', **config)

# Or by directly instantiating one of the MetaStoreBackend classes:
metastore = GitHubStorage(**config)

Storing a dataset (creating a new package)

Then use the storage instance to store a dataset:

import json

with open("datapackage.json") as f:
    # json.load reads from an open file object (json.loads expects a string)
    metadata = json.load(f)

package_info = metastore.create(package_id, metadata)

This will store the package metadata using the specific storage backend. For example, in the case of the GitHub backend, a new repository will be created with a corresponding datapackage.json file and LFS pointer files for resources.

The returned package_info will be an object with some information about the stored package revision:

class PackageRevisionInfo:
    package_id: str = "..."
    revision: str = "..."
    package: Dict = {"name": "mypackage",
                     "version": "1.0.0",    
                     "resources": [
                       # ...
                     ]}
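
Before calling create, it can help to confirm that the descriptor file parses and carries the fields you expect. The sketch below uses only the standard library. The helper name `load_datapackage` and the choice of required keys are assumptions made for illustration and are not part of metastore-lib.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical choice of keys to check, for illustration only.
REQUIRED_KEYS = ("name", "resources")


def load_datapackage(path: str) -> Dict[str, Any]:
    """Read a datapackage.json file and check a few top-level keys."""
    with Path(path).open(encoding="utf-8") as f:
        metadata = json.load(f)  # parse JSON from the open file object
    if not isinstance(metadata, dict):
        raise ValueError("datapackage.json must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"datapackage.json is missing keys: {missing}")
    return metadata
```

The returned dictionary can then be passed as the metadata argument when storing the package.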

Updating a dataset

To update the same package:

base_rev = package_info.revision
metadata['version'] = '1.0.1'
package_info = metastore.update(package_id, metadata, base_revision=base_rev)

This will update the package, creating a new revision of the metadata. The base_revision argument is optional but recommended, because it guards against conflicting changes: when it is given, the update only succeeds if base_revision is still the latest revision of the package. Otherwise a ConflictException is raised.
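
The base_revision check is a form of optimistic concurrency control. The toy in-memory store below illustrates the idea only; it is not how metastore-lib or any of its backends is implemented, and the names `ToyStore` and `ToyConflict` are hypothetical stand-ins.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple


class ToyConflict(Exception):
    """Stand-in for the conflict error raised on a stale base revision."""


class ToyStore:
    """In-memory illustration of a base-revision check (not library code)."""

    def __init__(self) -> None:
        # package_id -> list of (revision id, metadata), oldest first
        self._history: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {}

    def create(self, package_id: str, metadata: Dict[str, Any]) -> str:
        if package_id in self._history:
            raise ToyConflict(f"{package_id} already exists")
        revision = uuid.uuid4().hex
        self._history[package_id] = [(revision, dict(metadata))]
        return revision

    def update(self, package_id: str, metadata: Dict[str, Any],
               base_revision: Optional[str] = None) -> str:
        history = self._history[package_id]  # KeyError if the package is unknown
        latest_revision = history[-1][0]
        # Only reject the update when the caller named a revision and it is stale.
        if base_revision is not None and base_revision != latest_revision:
            raise ToyConflict(f"{base_revision} is not the latest revision")
        revision = uuid.uuid4().hex
        history.append((revision, dict(metadata)))
        return revision


store = ToyStore()
rev1 = store.create("myorg/mypackage", {"version": "1.0.0"})
rev2 = store.update("myorg/mypackage", {"version": "1.0.1"}, base_revision=rev1)

try:
    # rev1 is no longer the latest revision, so this update is rejected.
    store.update("myorg/mypackage", {"version": "1.0.2"}, base_revision=rev1)
except ToyConflict as exc:
    print(f"conflict: {exc}")
```

A client that hits a conflict would typically fetch the latest revision, reapply its change on top of it, and retry the update.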

Listing Dataset Revisions

Now you can get a list of all revisions of the package (there should be exactly two):

revisions = metastore.revision_list(package_id)
# Returns: [ <RevisionInfo rev2>, <RevisionInfo rev1> ]

Each returned object in the list represents a single revision:

class PackageRevisionInfo:
    package_id: str = "..."
    revision: str = "..."
    created: datetime = ... # the revision creation timestamp

Fetching a Dataset Revision

Now that we have two different revisions of the dataset, we can fetch a specific revision of the metadata:

# revisions[0] is the newest revision (rev2 in the listing above)
package_info = metastore.fetch(package_id, revision=revisions[0].revision)
print(f"{package_info.package['name']} {package_info.package['version']}")
# will output: mypackage 1.0.1

# revisions[1] is the older revision (rev1 in the listing above)
package_info = metastore.fetch(package_id, revision=revisions[1].revision)
print(f"{package_info.package['name']} {package_info.package['version']}")
# will output: mypackage 1.0.0

This returns a RevisionInfo object for the requested package / revision.

Note that the revision parameter is optional, and if omitted the latest revision will be fetched.
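
If your code needs the oldest or newest revision regardless of the order in which a backend returns the list, one option is to sort on the created timestamp. The sketch below uses a stand-in dataclass, `Rev`, that carries only the three fields shown in the revision listing above. `Rev` and `oldest_first` are hypothetical names, not library types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Rev:
    """Stand-in with the fields shown above: package_id, revision, created."""
    package_id: str
    revision: str
    created: datetime


def oldest_first(revisions: Iterable[Rev]) -> List[Rev]:
    """Return the revisions sorted by creation time, oldest first."""
    return sorted(revisions, key=lambda rev: rev.created)


sample = [
    Rev("myorg/mypackage", "rev2", datetime(2020, 6, 2, tzinfo=timezone.utc)),
    Rev("myorg/mypackage", "rev1", datetime(2020, 6, 1, tzinfo=timezone.utc)),
]
ordered = oldest_first(sample)
first, latest = ordered[0], ordered[-1]
print(first.revision, latest.revision)
```

The same sort key works on any objects that expose a comparable created attribute.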

Creating a Tag

Once a revision has been created, you can tag the revision to give it a meaningful name:

# revisions[0] is the newest revision (rev2 in the listing above)
tag_info = metastore.tag_create(package_id,
                                revision=revisions[0].revision,
                                name='ver-1.0.1')

This will return a new TagInfo object, with the name attribute set to 'ver-1.0.1'.

Listing Tags

To get a list of all tags for a package:

tags = metastore.tag_list(package_id)

This will return a list of TagInfo objects, each pointing to a specific tagged revision.
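
To look up one tag in the returned list, a small helper that matches on the name attribute is usually enough. The sketch below uses a stand-in dataclass, `Tag`, in place of TagInfo. Only the name attribute is confirmed by the text above; the `revision` field and the `find_tag` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tag:
    """Stand-in for a tag object; `revision` is an assumed field name."""
    name: str
    revision: str


def find_tag(tags: Iterable[Tag], name: str) -> Optional[Tag]:
    """Return the first tag with the given name, or None if absent."""
    for tag in tags:
        if tag.name == name:
            return tag
    return None


sample_tags = [Tag("ver-1.0.0", "rev1"), Tag("ver-1.0.1", "rev2")]
match = find_tag(sample_tags, "ver-1.0.1")
print(match.revision if match else "no such tag")
```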

A Note on Package Identifiers

Package Identifiers (e.g. the package_id in the example above) are strings and are, as far as metastore is concerned, opaque. However, they may still be meaningful as far as either the backend or the client is concerned.

For example, with a GitHub-based backend you will use IDs that follow the <org name>/<repo name> structure.

Other backends may expect UUID-style identifiers.

It is up to the code using the metastore library to compose the right identifiers for the backend in use.
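
Client code can keep identifier construction in one place so that switching backends only touches a single helper. The functions below are a minimal sketch. `github_style_id` and `uuid_style_id` are hypothetical names, and the character check is a deliberately conservative assumption rather than GitHub's actual naming rules.

```python
import re
import uuid

# Conservative character set for illustration; real naming rules may differ.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def github_style_id(org: str, repo: str) -> str:
    """Compose an '<org name>/<repo name>' identifier."""
    for label, value in (("org", org), ("repo", repo)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"{org}/{repo}"


def uuid_style_id() -> str:
    """Generate a random UUID string for backends that expect UUIDs."""
    return str(uuid.uuid4())


print(github_style_id("myorg", "mypackage"))
```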

Using the Filesystem Backend for Testing

For testing and quick prototyping, this library offers a special filesystem backend, which can save versioned datapackage information on the file system, in memory, or on a virtual file system.

This backend is based on the PyFilesystem library, and can use any of its supported file systems as storage.

In testing, it is recommended to use a memory based storage:

from metastore.backend.filesystem import FilesystemStorage

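def test_my_code():
    """Test for code that relies on a metastore-lib backend."""
    # Placeholder metadata for the example; substitute your own descriptor
    datapackage = {"name": "some-package", "resources": []}
    backend = FilesystemStorage('mem://')
    r1 = backend.create('some-package', datapackage, 'Initial revision')
    # ... continue with testing ...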

The FilesystemStorage constructor takes a single argument, which is a PyFilesystem root filesystem URL.

Beyond this, the API is exactly the same as with the other backends.

License

Copyright (C) 2020, Viderum, Inc.

metastore-lib is free / open source software and is distributed under the terms of the MIT license. See LICENSE for details.
