A library for abstracting versioned metadata storage for data packages
metastore-lib: metadata storage library for datapackages
A Python library for abstracting metadata storage for datapackage.json packages.
Full Documentation
While this README provides some basic information on how to get started, the
most up-to-date and comprehensive documentation for metastore-lib is
available at metastore-lib.readthedocs.io.
Installation
The easiest way to install the latest stable version of metastore-lib into
your Python environment is via pip:
pip install metastore-lib
Quick Start
Instantiating a backend
To use the library after you have installed it, first instantiate a storage instance:
config = {"token": "...",
          "more_options": "..."}
# Using the provided factory method
metastore = create_metastore('github', **config)
# Or by directly instantiating one of the MetaStoreBackend classes:
metastore = GitHubStorage(**config)
Storing a dataset (creating a new package)
Then use the storage instance to store a dataset:
import json

with open("datapackage.json") as f:
    # json.load reads from an open file object (json.loads expects a string)
    metadata = json.load(f)

package_info = metastore.create(package_id, metadata)
This will store the package metadata using the specific storage backend. For
example, in the case of the GitHub backend, a new repository will be created
with a corresponding datapackage.json file and LFS pointer files for
resources.
The returned package_info will be an object with some information about
the stored package revision:
class PackageRevisionInfo:
    package_id: str = "..."
    revision: str = "..."
    package: Dict = {"name": "mypackage",
                     "version": "1.0.0",
                     "resources": [
                         # ...
                     ]}
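The metadata passed to create is an ordinary dictionary loaded from datapackage.json. As a minimal sketch using only the standard library, a descriptor with the name, version and resources keys shown above can be written to disk and read back. The temporary file location and the field values are placeholders chosen for illustration, not anything required by metastore-lib.

```python
import json
import tempfile
from pathlib import Path

# Placeholder descriptor; only the keys shown in the example above are used.
descriptor = {
    "name": "mypackage",
    "version": "1.0.0",
    "resources": [],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datapackage.json"
    path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")

    # json.load takes an open file object (json.loads would take a string).
    with path.open(encoding="utf-8") as f:
        metadata = json.load(f)

print(metadata["name"], metadata["version"])
```

The resulting metadata dictionary is the kind of value that would then be handed to the storage instance.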
Updating a dataset
To update the same package:
base_rev = package_info.revision
metadata['version'] = '1.0.1'
package_info = metastore.update(package_id, metadata, base_revision=base_rev)
This will update the package, creating a new revision of the metadata. Note that
base_revision is not required but is recommended, to ensure that changes do not
conflict: specifying base_revision ensures that your change is based on
the latest revision of the package, and if it is not, a ConflictException will be
raised.
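The base_revision check is a form of optimistic concurrency control. The following is a minimal sketch of that idea using a toy in-memory store. ToyStore and ToyConflict are hypothetical stand-ins written for illustration; they are not part of metastore-lib, whose backends implement the check in their own way.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class ToyConflict(Exception):
    """Hypothetical stand-in for the library's ConflictException."""


class ToyStore:
    """Toy in-memory store that keeps every revision of each package."""

    def __init__(self) -> None:
        # package_id -> list of (revision id, metadata), oldest first
        self._history: Dict[str, List[Tuple[str, dict]]] = {}

    def create(self, package_id: str, metadata: dict) -> str:
        revision = uuid.uuid4().hex
        self._history[package_id] = [(revision, dict(metadata))]
        return revision

    def update(self, package_id: str, metadata: dict,
               base_revision: Optional[str] = None) -> str:
        latest_revision = self._history[package_id][-1][0]
        # Reject the write if the caller started from an outdated revision.
        if base_revision is not None and base_revision != latest_revision:
            raise ToyConflict(f"{base_revision} is not the latest revision")
        revision = uuid.uuid4().hex
        self._history[package_id].append((revision, dict(metadata)))
        return revision


store = ToyStore()
rev1 = store.create("myorg/mypackage", {"name": "mypackage", "version": "1.0.0"})
rev2 = store.update("myorg/mypackage", {"name": "mypackage", "version": "1.0.1"},
                    base_revision=rev1)

try:
    # A second writer still holding rev1 is now out of date.
    store.update("myorg/mypackage", {"name": "mypackage", "version": "1.0.2"},
                 base_revision=rev1)
    conflict_detected = False
except ToyConflict:
    conflict_detected = True

print(conflict_detected)
```

In client code, a conflict of this kind is typically handled by fetching the latest revision, reapplying the change on top of it, and retrying the update.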
Listing Dataset Revisions
Now you can get a list of all revisions of the package (there should be exactly two):
revisions = metastore.revision_list(package_id)
# Returns: [ <RevisionInfo rev2>, <RevisionInfo rev1> ]
Each returned object in the list represents a single revision:
class PackageRevisionInfo:
    package_id: str = "..."
    revision: str = "..."
    created: datetime = ...  # the revision creation timestamp
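If calling code needs a particular order, it can sort on the created timestamp rather than depend on list position. A minimal sketch, using a stand-in dataclass (RevisionStub, hypothetical) with the same three fields as above and made-up values:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class RevisionStub:
    """Hypothetical stand-in carrying the three fields listed above."""
    package_id: str
    revision: str
    created: datetime


# Made-up revisions for illustration only.
revisions: List[RevisionStub] = [
    RevisionStub("myorg/mypackage", "rev2", datetime(2020, 6, 2, tzinfo=timezone.utc)),
    RevisionStub("myorg/mypackage", "rev1", datetime(2020, 6, 1, tzinfo=timezone.utc)),
]

# Oldest first, regardless of the order the list arrived in.
oldest_first = sorted(revisions, key=lambda r: r.created)
latest = max(revisions, key=lambda r: r.created)

print([r.revision for r in oldest_first], latest.revision)
```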
Fetching a Dataset Revision
Now that we have two different revisions of the dataset, we can fetch a specific revision of the metadata:
# revisions[0] is the newer revision (rev2) in the list shown above
package_info = metastore.fetch(package_id, revision=revisions[0].revision)
print(f"{package_info.package['name']} {package_info.package['version']}")
# will output: mypackage 1.0.1

# revisions[1] is the initial revision (rev1)
package_info = metastore.fetch(package_id, revision=revisions[1].revision)
print(f"{package_info.package['name']} {package_info.package['version']}")
# will output: mypackage 1.0.0
This returns a RevisionInfo object for the requested package / revision.
Note that the revision parameter is optional; if omitted, the latest
revision will be fetched.
Creating a Tag
Once a revision has been created, you can tag the revision to give it a meaningful name:
# revisions[0] is the newer revision (version 1.0.1) in the list returned by revision_list
tag_info = metastore.tag_create(package_id,
                                revision=revisions[0].revision,
                                name='ver-1.0.1')
This will return a new TagInfo object, with the name attribute set to
'ver-1.0.1'.
Listing Tags
To get a list of all tags for a package:
tags = metastore.tag_list(package_id)
This will return a list of TagInfo objects, each pointing to a specific
tagged revision.
A Note on Package Identifiers
Package identifiers (e.g. the package_id in the examples above) are strings
and are, as far as metastore is concerned, opaque. However, they may still
be meaningful to either the backend or the client.
For example, with a GitHub-based backend you will use IDs that follow the
<org name>/<repo name> structure.
Other backends may expect you to use UUID-type identifiers.
It is up to the code using the metastore library to compose the
right identifiers.
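As a rough illustration of composing identifiers on the client side, the sketch below builds both styles mentioned above. The helper names github_style_id and uuid_style_id are hypothetical and not part of metastore-lib, and the allowed character set is an assumption made for the example; GitHub's real naming rules are stricter.

```python
import re
import uuid

# Assumed character set for illustration; real GitHub naming rules are stricter.
_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def github_style_id(org: str, repo: str) -> str:
    """Compose an '<org name>/<repo name>' identifier (hypothetical helper)."""
    for part in (org, repo):
        if not _NAME.match(part):
            raise ValueError(f"invalid name component: {part!r}")
    return f"{org}/{repo}"


def uuid_style_id() -> str:
    """Generate a random UUID string for backends that expect UUID-type IDs."""
    return str(uuid.uuid4())


package_id = github_style_id("myorg", "mypackage")
print(package_id)  # myorg/mypackage
```

Keeping this logic in one place means the rest of the client code can treat identifiers as opaque strings, just as the library does.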
Using the Filesystem Backend for Testing
For testing and quick prototyping purposes, this library offers a special
filesystem backend, which can be used to save versioned datapackage
information on the file system, in memory, or on a virtual file system.
This backend is based on the PyFilesystem library, and can use any of its supported file systems as storage.
In testing, it is recommended to use a memory based storage:
from metastore.backend.filesystem import FilesystemStorage

def test_my_code():
    """Test for code that relies on a metastore-lib backend
    """
    backend = FilesystemStorage('mem://')
    # datapackage is assumed to be a dict holding the datapackage.json contents
    r1 = backend.create('some-package', datapackage, 'Initial revision')
    # ... continue with testing ...
The FilesystemStorage constructor takes a single argument, which is a
PyFilesystem root filesystem URL.
Beyond this, the API is exactly the same as with the other backends.
License
Copyright (C) 2020, Viderum, Inc.
metastore-lib is free / open source software and is distributed under the terms of the MIT license. See LICENSE for details.