Manage and automate datasets for data science projects.

Dataset Manager

Manage and automate the datasets for your project with YAML files.

Current Support: Python 2.7, 3.4, 3.5, 3.6, 3.7, 3.8

How it Works

This project creates a file called identifier.yaml in your dataset directory with these fields:

source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv

identifier: the name used to reference the dataset; the file is named after it, with the .yaml extension.

source: the location of the dataset.

description: a description of the dataset, so you can remember later what it contains.

Each dataset is a YAML file inside the dataset directory.
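
To make the file layout concrete, here is a minimal sketch that writes and reads one such file using only the Python 3 standard library. It is not the package's implementation: `write_dataset_file` and `read_dataset_file` are hypothetical helpers, `titanic` is a made-up identifier, and the hand-rolled parser only handles flat `key: value` lines like the ones shown above (a real YAML library handles far more).

```python
import tempfile
from pathlib import Path


def write_dataset_file(dataset_dir, identifier, fields):
    """Write fields as flat `key: value` lines to <identifier>.yaml."""
    path = Path(dataset_dir) / (identifier + ".yaml")
    lines = ["{}: {}".format(key, value) for key, value in fields.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_dataset_file(path):
    """Parse flat `key: value` lines back into a dict."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Use a throwaway directory as the dataset directory for this demo.
dataset_dir = tempfile.mkdtemp()
titanic_path = write_dataset_file(
    dataset_dir,
    "titanic",  # hypothetical identifier
    {
        "source": "https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv",
        "description": "this dataset is a test dataset",
        "format": "csv",
    },
)
titanic_fields = read_dataset_file(titanic_path)
```

After running this, `titanic_fields` holds the same three fields that were written, and the file on disk is named `titanic.yaml`.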

Installing

With pip:

pip install dataset_manager

With conda:

conda install dataset_manager

Using

You can manage your datasets with a small set of commands and integrate them with Pandas or another data analysis tool.

List all Datasets

Return a list of all datasets found in the dataset path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.list_datasets()
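
Since each dataset is a YAML file in the dataset directory, a listing can be pictured as a scan for `*.yaml` files. The sketch below illustrates that idea with the Python 3 standard library only. `list_identifiers` is a hypothetical helper, not a function of `dataset_manager`, and the real return value of `list_datasets()` may carry more than identifiers.

```python
import tempfile
from pathlib import Path


def list_identifiers(dataset_path):
    """Return the sorted file stems of every .yaml file in dataset_path."""
    return sorted(p.stem for p in Path(dataset_path).glob("*.yaml"))


# Demo directory with two dataset files and one unrelated file.
demo_dir = Path(tempfile.mkdtemp())
for name in ("titanic.yaml", "iris.yaml", "notes.txt"):
    (demo_dir / name).write_text("format: csv\n", encoding="utf-8")

identifiers = list_identifiers(demo_dir)
print(identifiers)  # ['iris', 'titanic']
```

The unrelated `notes.txt` file is ignored because only files with the `.yaml` extension count as datasets in this sketch.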

Get one Dataset

Get a single dataset entry as a dict.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.get_dataset(identifier)
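
One way to use such a dict is to hand its `source` to a loader. The sketch below assumes the returned dict exposes the `source` and `format` fields from the YAML file (the exact shape is not documented here). To stay runnable offline, it points `source` at a local CSV file and reads it with the standard `csv` module; with Pandas installed, `pandas.read_csv(dataset["source"])` would be the analogous call. `load_rows` is a hypothetical helper.

```python
import csv
import os
import tempfile


def load_rows(dataset):
    """Load a CSV dataset described by a dict into a list of row dicts."""
    if dataset.get("format") != "csv":
        raise ValueError("only csv is handled in this sketch")
    with open(dataset["source"], newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Stand-in for a real source: a small local CSV file.
fd, csv_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
    handle.write("name,survived\nAlice,1\nBob,0\n")

# Hypothetical dict, shaped like the fields in the YAML file.
dataset = {"source": csv_path, "description": "toy data", "format": "csv"}
rows = load_rows(dataset)
```

Each element of `rows` maps column names to string values, so `rows[0]["name"]` is the first passenger name in the toy file.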

Create a Dataset

Create a dataset inside the given dataset_path, with any additional information you want to record passed as keyword arguments.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.create_dataset(identifier, source, description, **kwargs)
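
The `**kwargs` in the signature suggests that extra fields can be stored next to `source` and `description`. The following sketch shows one plausible way such arguments could be merged into a single record. `build_record` is a hypothetical helper, and whether the package stores extra keys this way is an assumption.

```python
def build_record(source, description, **extra):
    """Combine the two required fields with any extra keyword fields."""
    record = {"source": source, "description": description}
    # Extra keyword arguments (for example format="csv") become extra keys.
    record.update(extra)
    return record


record = build_record(
    "https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv",
    "this dataset is a test dataset",
    format="csv",
)
```

Here `record` ends up with the three keys `source`, `description` and `format`, matching the fields of the example file in "How it Works".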

Remove a Dataset

Remove a dataset from dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.remove_dataset(identifier)

Contributing

Just make a pull request and be happy!

Let's grow together ;)
