cas-manifest allows developers to store artifacts in a _content-addressable_ store using a self-describing _manifest_

CAS-Manifest

This package facilitates storing artifacts in Content Addressable Storage via the hashfs library. In a CAS regime, the hash of the artifact's contents is used as the key.
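
The idea of using a content hash as the key can be sketched with the standard library alone. The `InMemoryCAS` class below is a hypothetical toy for illustration; it is not hashfs and not part of cas-manifest, and the choice of SHA-256 is an assumption of this sketch.

```python
import hashlib


class InMemoryCAS:
    """Toy content-addressable store (hypothetical, not the hashfs API)."""

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        # The key is derived from the contents, never chosen by the caller.
        key = hashlib.sha256(data).hexdigest()
        # Storing the same bytes again maps to the same key, so nothing changes.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = InMemoryCAS()
key_a = store.put(b"a,b\n1,2\n")
key_b = store.put(b"a,b\n1,3\n")  # different contents, so a different key
```

Because the key is a function of the bytes, two artifacts share a key only if their contents are identical.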

It further requires that artifacts be pydantic models, which allows for stable serialization of the artifacts and makes the data self-describing.

Consider an example usage profile: your application works with datasets, some of which are serialized as CSV files and others as TSV files. Some have header rows, and some do not. Rather than writing data-loading code that tries to infer the correct way to deserialize a dataset file, cas-manifest serializes all relevant attributes of the dataset along with the data file itself. Your code might look like this:

from hashfs import HashFS
from cas_manifest.registry import Registry
from my_classes import CSVDataset, TSVDataset

fs = HashFS('/path/to/data')
dataset_hash = '5fef4a'
registry = Registry(fs, [CSVDataset, TSVDataset])
obj = registry.load(dataset_hash)
# obj is an instance of either CSVDataset or TSVDataset
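
To make the self-describing idea concrete, here is a minimal standard-library sketch of a manifest that records its own type and attributes next to a reference to the raw data. Everything in it is an assumption for illustration: the `save`, `load`, and `rows` helpers, the `data_hash` and `has_header` fields, and the JSON layout are hypothetical, and cas-manifest's actual `Registry` and pydantic-based classes may work differently.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

blobs = {}  # toy store: hex digest -> bytes (stands in for a real CAS)


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key


@dataclass(frozen=True)
class CSVDataset:  # hypothetical manifest class
    data_hash: str  # key of the raw file in the store
    has_header: bool
    delimiter = ","  # class attribute, not a serialized field


@dataclass(frozen=True)
class TSVDataset:  # hypothetical manifest class
    data_hash: str
    has_header: bool
    delimiter = "\t"


def save(obj) -> str:
    # The manifest names its own type, so no guessing is needed on load.
    manifest = {"type": type(obj).__name__, "fields": asdict(obj)}
    return put(json.dumps(manifest, sort_keys=True).encode("utf-8"))


def load(key: str, classes):
    manifest = json.loads(blobs[key])
    by_name = {cls.__name__: cls for cls in classes}
    return by_name[manifest["type"]](**manifest["fields"])


def rows(dataset):
    lines = blobs[dataset.data_hash].decode("utf-8").splitlines()
    if dataset.has_header:
        lines = lines[1:]  # skip the header row recorded in the manifest
    return [line.split(dataset.delimiter) for line in lines]


data_key = put(b"name\tage\nada\t36\n")
manifest_key = save(TSVDataset(data_hash=data_key, has_header=True))
obj = load(manifest_key, [CSVDataset, TSVDataset])
```

The loader never inspects the data file to decide how to parse it; the delimiter and header handling follow from the manifest's recorded type and fields.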

Why CAS?

In short, CAS enforces immutability. When using CAS, a key's contents can never be changed. The following benefits come naturally:

  • No more data_final__2_new files - all objects are uniquely specified
  • No cache invalidation - cache objects freely, knowing that their contents will never change upstream
  • No more provenance questions - models can be robustly linked to the datasets used to train them

Why manifests?

In a CAS regime, keys are deliberately opaque. Manifests make artifacts self-descriptive: a manifest can include instructions for deserialization, links to other artifacts, and any other metadata you can think up. In combination with CAS, this ensures that your metadata and underlying data never go out of sync, since the metadata holds an immutable reference to the underlying data.
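
As a rough illustration of metadata linking to other artifacts, the sketch below stores a model manifest that points at its training dataset by key. The `trained_on` and `weights` fields and the dictionary-backed store are hypothetical and are not taken from cas-manifest.

```python
import hashlib
import json

blobs = {}  # toy store: hex digest -> bytes


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key


train_v1 = put(b"x,y\n1,2\n")

# The model's manifest links to the exact dataset it was trained on.
model_manifest = put(
    json.dumps(
        {"type": "Model", "weights": [0.5], "trained_on": train_v1},
        sort_keys=True,
    ).encode("utf-8")
)

# "Editing" the dataset yields a new key; the original entry is untouched.
train_v2 = put(b"x,y\n1,2\n3,4\n")

linked = json.loads(blobs[model_manifest])["trained_on"]
```

The manifest keeps pointing at the first dataset version no matter how many later versions are stored, which is what keeps provenance unambiguous.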

Download files

Source Distribution

cas-manifest-0.3.1.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

cas_manifest-0.3.1-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file cas-manifest-0.3.1.tar.gz.

File metadata

  • Download URL: cas-manifest-0.3.1.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.2 CPython/3.7.7 Darwin/19.6.0

File hashes

Hashes for cas-manifest-0.3.1.tar.gz
Algorithm Hash digest
SHA256 3a1d7bb5ea25539da640c3d8f9c37fbdc161ea85ded7c07a4ce1ab67b93b7de5
MD5 afc2b14c59e6187c721f350b7587969e
BLAKE2b-256 1d07f72531c218383045354c05db5c573fea2abe7eec7322d1d91e82a2629b4c

File details

Details for the file cas_manifest-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: cas_manifest-0.3.1-py3-none-any.whl
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.2 CPython/3.7.7 Darwin/19.6.0

File hashes

Hashes for cas_manifest-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9ad6f76f5da60fd144b95f2df5f01ee4651f709aaf78d5af9382f55de38c64b4
MD5 b7c297810cf4b40e8b116344ff8a70c0
BLAKE2b-256 1a48934c29aaef824804a4916d19feca4dc8c0dd39fd9a3b4d2af0e67936cf74
