A Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets.
Project description
ParData (homophone of partake) is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:
- a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and
- a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.
Install the Package & its Dependencies
To install the latest version of ParData, run
$ pip install pardata
Alternatively, if you have downloaded the source, change to the source directory (the directory that contains this README file, for example cd /path/to/pardata-source) and run
$ pip install -U .
Quick Start
Import the package and load a dataset. ParData downloads the WikiText-103 dataset (version 1.0.1) if it has not already been downloaded, and then loads it.
import pardata
wikitext103_data = pardata.load_dataset('wikitext103')
View available ParData datasets and their versions.
>>> pardata.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
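The mapping above pairs each dataset name with a tuple of version strings. As a minimal sketch of working with that shape, the snippet below picks the highest version per dataset. The helpers `version_key` and `latest_versions` are hypothetical and not part of ParData, the sample input contains only the two entries shown above, and the code assumes versions are dot-separated integers.

```python
from typing import Dict, Mapping, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.0.10' into (1, 0, 10) so versions compare numerically.

    Assumption: every version is made of dot-separated integers.
    """
    return tuple(int(part) for part in version.split("."))


def latest_versions(available: Mapping[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map each dataset name to its highest listed version (hypothetical helper)."""
    return {
        name: max(versions, key=version_key)
        for name, versions in available.items()
    }


# Sample input shaped like the output above; in practice this would be
# the value returned by pardata.list_all_datasets().
sample = {"claim_sentences_search": ("1.0.2",), "wikitext103": ("1.0.1",)}
print(latest_versions(sample))
# {'claim_sentences_search': '1.0.2', 'wikitext103': '1.0.1'}
```

Comparing integer tuples instead of raw strings keeps '1.0.10' ranked above '1.0.2', which a plain string comparison would get wrong.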
To view your globally set configs for ParData, such as your default data directory, use pardata.get_config.
>>> pardata.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_FILE_URL='file/to/load/datasets/from')
By default, pardata.load_dataset downloads to and loads from ~/.pardata/data/<dataset-name>/<dataset-version>/. To change the default data directory, use pardata.init.
pardata.init(DATADIR='new/dir/to/download/load/from')
Load a previously downloaded dataset using pardata.load_dataset. With the new default data directory set, ParData now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.
gmb_data = pardata.load_dataset('gmb', version='1.0.2', download=False)  # assuming the GMB dataset was already downloaded
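The directory layout described above, <data-dir>/<dataset-name>/<dataset-version>/, can be sketched with pathlib alone. The helper `expected_dataset_dir` and the constant `DEFAULT_DATADIR` are hypothetical names used for illustration only. They mirror the documented layout and do not call ParData, whose own check for an existing download may differ.

```python
from pathlib import Path
from typing import Optional, Union

# Documented default location: ~/.pardata/data/
DEFAULT_DATADIR = Path.home() / ".pardata" / "data"


def expected_dataset_dir(
    name: str,
    version: str,
    datadir: Optional[Union[str, Path]] = None,
) -> Path:
    """Return <datadir>/<name>/<version>, following the layout described above.

    Hypothetical helper: it only builds the path and does not touch ParData.
    """
    base = Path(datadir) if datadir is not None else DEFAULT_DATADIR
    return base / name / version


gmb_dir = expected_dataset_dir("gmb", "1.0.2", datadir="new/dir/to/download/load/from")
print(gmb_dir.as_posix())  # new/dir/to/download/load/from/gmb/1.0.2

# A directory check like this is one way to decide whether download=False is
# likely to succeed; it is an assumption, not ParData's documented behavior.
print(gmb_dir.is_dir())
```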
To learn more about ParData, check out the documentation and the tutorial.
File details
Details for the file pardata-0.4.0.tar.gz.
File metadata
- Download URL: pardata-0.4.0.tar.gz
- Upload date:
- Size: 13.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b310b6e256e56092a8218126eec8021cc4cf204e44a36309054ac2e9dd78b80f |
MD5 | 0879874a1175e459e5e5c707e35697f0 |
BLAKE2b-256 | c8a87b809d0e0048a9cb3d8ea3f5635f9039eb38712f30c8971fff514efa0bd0 |
File details
Details for the file pardata-0.4.0-py3-none-any.whl.
File metadata
- Download URL: pardata-0.4.0-py3-none-any.whl
- Upload date:
- Size: 45.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | bc6d8561937fbe3879a3c03dbb062a41b2c214749e5b538193ae849c4267984b |
MD5 | 59fcbb8d401fc2bef257c1a2698ed058 |
BLAKE2b-256 | a0be1fb95324e172cc8e19119e3e5d7a983df402206d61a0a584e1179e894150 |
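The SHA256 digests listed for the two files can be used to check a downloaded copy. The following is a minimal sketch using only the standard library. The names `EXPECTED_SHA256`, `sha256_of`, and `matches_published_hash` are hypothetical, and the code assumes the local file keeps the file name shown in the file details.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "pardata-0.4.0.tar.gz":
        "b310b6e256e56092a8218126eec8021cc4cf204e44a36309054ac2e9dd78b80f",
    "pardata-0.4.0-py3-none-any.whl":
        "bc6d8561937fbe3879a3c03dbb062a41b2c214749e5b538193ae849c4267984b",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Union[str, Path]) -> bool:
    """Compare a local file against the digest listed for its file name.

    Raises KeyError if the file name is not one of the two listed files.
    """
    expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

For example, `matches_published_hash("pardata-0.4.0.tar.gz")` returns True only when the file in the current directory has the listed digest. Reading in chunks keeps memory use low for the 13.2 MB source archive.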