Manage and automate datasets for data science projects.

Dataset Manager

Manage and automate the datasets for your project with YAML files, tightly integrated with Pandas.

Current Support: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

How it Works

This project creates a file called identifier.yaml in your dataset directory with these fields:

source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv

identifier: the name used to reference the dataset. It is the file name without the yaml extension.

source: the location of the dataset.

description: a short description of your dataset, so you remember later what it is.

format: the Pandas read format, i.e. the <format> part of read_<format>, as described here: https://pandas.pydata.org/pandas-docs/stable/reference/io.html.
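The format field can be read as the suffix of a Pandas reader function. Here is a minimal sketch of that lookup, assuming Pandas is installed. resolve_reader is a hypothetical helper written for illustration and is not part of dataset_manager; how the library performs this step internally is not shown here.

```python
import pandas as pd


def resolve_reader(fmt):
    """Return the pandas function named read_<fmt>, e.g. 'csv' -> pd.read_csv."""
    reader = getattr(pd, "read_" + fmt, None)
    if not callable(reader):
        raise ValueError("pandas has no reader named read_" + fmt)
    return reader


csv_reader = resolve_reader("csv")    # the same object as pd.read_csv
json_reader = resolve_reader("json")  # the same object as pd.read_json
```

Raising an error for an unknown format keeps a typo in the YAML file from surfacing later as a confusing AttributeError.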

Each dataset is a YAML file inside the dataset directory.
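To make the layout concrete, the sketch below writes one such file by hand and then recovers the identifiers from the file names. It uses only the standard library and is not the library's own implementation; the temporary directory and the identifier "titanic" are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical stand-in for the dataset directory; any writable folder works.
dataset_path = tempfile.mkdtemp()

# Field values copied from the example fields shown earlier.
fields = [
    ("source", "https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv"),
    ("description", "this dataset is a test dataset"),
    ("format", "csv"),
]

# One file per dataset: <identifier>.yaml holding "key: value" lines.
identifier = "titanic"  # hypothetical identifier
with open(os.path.join(dataset_path, identifier + ".yaml"), "w") as handle:
    for key, value in fields:
        handle.write("{}: {}\n".format(key, value))

# The identifiers are the file names with the .yaml extension removed.
identifiers = sorted(
    os.path.splitext(name)[0]
    for name in os.listdir(dataset_path)
    if name.endswith(".yaml")
)
print(identifiers)  # ['titanic']
```

Under this layout, removing a dataset amounts to deleting its YAML file.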

Installing

With pip:

pip install dataset_manager

With conda:

conda install dataset_manager

Using

You can manage your datasets with a small set of commands that are integrated with Pandas.

List all Datasets

Returns a list with all datasets in the dataset path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.list_datasets()

Get one Dataset

Returns the dataset as a Pandas DataFrame. Extra *args and **kwargs are accepted and passed to the Pandas read function.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.get_dataset(identifier, *args, **kwargs)
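The extra arguments are useful for trimming a dataset at load time. The sketch below illustrates the forwarding pattern with plain Pandas, assuming get_dataset passes its arguments through in a similar way. load is a hypothetical function, and the in-memory CSV rows are made up so the example runs without a network connection.

```python
import io

import pandas as pd


def load(source, fmt, *args, **kwargs):
    """Hypothetical loader: call pandas read_<fmt> and pass extra arguments through."""
    reader = getattr(pd, "read_" + fmt)
    return reader(source, *args, **kwargs)


# In-memory CSV standing in for a real dataset source (made-up rows).
csv_text = "name,age,city\nAnn,31,Oslo\nBob,45,Lima\nCy,28,Rome\n"

# usecols and nrows are regular pandas.read_csv keyword arguments.
frame = load(io.StringIO(csv_text), "csv", usecols=["name", "age"], nrows=2)
print(frame.shape)  # (2, 2)
```

With the real manager, the equivalent call would presumably look like manager.get_dataset(identifier, usecols=[...], nrows=2).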

Create a Dataset

Creates a dataset inside the defined dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.create_dataset(identifier, source, description, format_extension)

Remove a Dataset

Removes a dataset from dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.remove_dataset(identifier)

Contributing

Just open a pull request and be happy!

Let's grow together ;)

