Manage and automate datasets for data science projects.

Dataset Manager

Manage and automate the datasets for your project with YAML files.

Current Support: Python 2.7, 3.4, 3.5, 3.6, 3.7, 3.8

How it Works

This project creates a file called identifier.yaml in your dataset directory with these fields:

source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv

identifier: the name used to reference the dataset; the file is named after it, with the .yaml extension.

source: the location of the dataset.

description: describes your dataset so you can remember it later.

Each dataset is a YAML file inside the dataset directory.
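The layout described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper names (write_dataset_file, read_dataset_file, list_identifiers) and the identifier "titanic" are hypothetical, and the parser only handles flat "key: value" lines rather than full YAML.

```python
import tempfile
from pathlib import Path


def write_dataset_file(dataset_dir, identifier, fields):
    """Write <identifier>.yaml with one 'key: value' line per field."""
    path = Path(dataset_dir) / (identifier + ".yaml")
    lines = ["{}: {}".format(key, value) for key, value in fields.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_dataset_file(path):
    """Parse flat 'key: value' lines into a dict (not a full YAML parser)."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def list_identifiers(dataset_dir):
    """Identifiers are the YAML file names without the extension."""
    return sorted(p.stem for p in Path(dataset_dir).glob("*.yaml"))


# A temporary directory stands in for the dataset directory.
dataset_dir = tempfile.mkdtemp()
write_dataset_file(dataset_dir, "titanic", {
    "source": "https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv",
    "description": "this dataset is a test dataset",
    "format": "csv",
})
identifiers = list_identifiers(dataset_dir)
titanic = read_dataset_file(Path(dataset_dir) / "titanic.yaml")
```

Here identifiers holds one entry per YAML file in the directory, and titanic holds the fields of that file as a dict.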

Installing

With pip:

pip install dataset_manager

With conda:

conda install dataset_manager

Using

You can manage your datasets with a list of commands and integrate them with pandas or other data analysis tools.

List all Datasets

Returns a list of all datasets in the dataset path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)  # dataset_path: directory holding the YAML files

manager.list_datasets()

Get one Dataset

Returns a dataset's entry as a dict.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.get_dataset(identifier)
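Once an entry is available as a dict, it can drive the loading step. The sketch below assumes the dict mirrors the YAML fields shown earlier (source, description, format); that shape is an assumption, not confirmed behaviour of get_dataset. The names load_rows and fake_fetch are hypothetical, the source URL is a placeholder, and the standard csv module is used in place of pandas so the example runs offline.

```python
import csv
import io


def load_rows(dataset, fetch_text):
    """Return the dataset's rows as a list of dicts, based on its 'format'."""
    if dataset.get("format") != "csv":
        raise ValueError("unsupported format: {!r}".format(dataset.get("format")))
    text = fetch_text(dataset["source"])
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for manager.get_dataset(identifier); the keys are assumed.
dataset = {
    "source": "https://example.com/train.csv",
    "description": "this dataset is a test dataset",
    "format": "csv",
}


# Stand-in for a real download so the example needs no network access.
def fake_fetch(url):
    return "PassengerId,Survived\n1,0\n2,1\n"


rows = load_rows(dataset, fake_fetch)
```

With pandas installed, pandas.read_csv(dataset["source"]) could replace load_rows for CSV sources, since read_csv accepts a URL.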

Create a Dataset

Creates a dataset, with any information you want, inside the defined dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.create_dataset(identifier, source, description, **kwargs)

Remove a Dataset

Removes a dataset from dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.remove_dataset(identifier)

Contributing

Just make a pull request and be happy!

Let's grow together ;)
