
Manage and automate datasets for data science projects.


Dataset Manager

Manage and automate the datasets for your project with YAML files.


Current Support: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

How it Works

This project creates a file called identifier.yaml in your dataset directory with these fields:

source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv

identifier: the name used to reference the dataset. The file is named after it, with a .yaml extension.

source: the location of the dataset.

description: a description of the dataset, so you can remember what it is later.

Each dataset is a YAML file inside the dataset directory.
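To make the file layout concrete, here is a minimal sketch that reads the fields shown above into a dictionary. It is not the package's own code: `parse_dataset_file` is a hypothetical helper, and it only handles flat `key: value` lines. A real YAML parser should be used for anything more complex.

```python
def parse_dataset_file(text):
    """Parse flat 'key: value' lines into a dict (not a full YAML parser)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first ": " only, so URLs such as "https://..." stay intact.
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        record[key.strip()] = value.strip()
    return record


# The example fields from this README.
example = """\
source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv
description: this dataset is a test dataset
format: csv
"""

record = parse_dataset_file(example)
print(record["format"])
```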

Installing

With pip:

pip install dataset_manager

With conda:

conda install dataset_manager

Using

You can manage your datasets with a small set of commands and integrate them with Pandas or other data analysis tools.

List all Datasets

Returns a list of all datasets in the dataset path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.list_datasets()
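Since each dataset is one YAML file named after its identifier, listing can be pictured as scanning the directory. The sketch below illustrates that idea with the standard library only. `list_identifiers` is a hypothetical helper, and the actual return value of `list_datasets()` may differ.

```python
import tempfile
from pathlib import Path


def list_identifiers(dataset_path):
    """Return sorted identifiers: names of *.yaml files without the extension."""
    return sorted(p.stem for p in Path(dataset_path).glob("*.yaml"))


# Demonstration on a throwaway directory with two dataset files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "titanic.yaml").write_text("source: train.csv\n")
    Path(tmp, "iris.yaml").write_text("source: iris.csv\n")
    Path(tmp, "notes.txt").write_text("not a dataset\n")  # ignored: wrong extension
    identifiers = list_identifiers(tmp)

print(identifiers)  # ['iris', 'titanic']
```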

Get one Dataset

Returns the source of a single dataset.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.get_dataset(identifier)
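Once a dataset's fields are known, the data can be handed to an analysis tool. The sketch below assumes the fields are available as a plain dictionary (the exact return type of `get_dataset` is not documented here) and loads a local CSV with the standard library. `load_rows` is a hypothetical helper. With Pandas installed, `pandas.read_csv(record["source"])` would play the same role and also accepts URLs.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(record):
    """Load a local CSV dataset described by a record dict into a list of dicts."""
    if record.get("format") != "csv":
        raise ValueError(f"unsupported format: {record.get('format')!r}")
    with open(record["source"], newline="") as handle:
        return list(csv.DictReader(handle))


# Demonstration with a small CSV file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp, "train.csv")
    csv_path.write_text("name,survived\nAlice,1\nBob,0\n")
    record = {"source": str(csv_path), "description": "toy data", "format": "csv"}
    rows = load_rows(record)

print(rows[0]["name"])
```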

Create a Dataset

Creates a dataset inside the defined dataset_path with any information you want.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.create_dataset(identifier, source, description, **kwargs)
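One way to picture what creation produces is a function that writes the fields to `<identifier>.yaml`. This is only a sketch under assumptions: `write_dataset_file` is hypothetical, and treating extra keyword arguments (such as `format`) as additional top-level fields is a guess about how `**kwargs` is used, not confirmed behavior.

```python
import tempfile
from pathlib import Path


def write_dataset_file(dataset_path, identifier, source, description, **extra):
    """Write <identifier>.yaml with flat 'key: value' lines and return its path."""
    fields = {"source": source, "description": description}
    fields.update(extra)  # assumption: extra keywords become extra fields
    target = Path(dataset_path) / f"{identifier}.yaml"
    lines = [f"{key}: {value}" for key, value in fields.items()]
    target.write_text("\n".join(lines) + "\n")
    return target


# Demonstration: recreate the example file from this README in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_dataset_file(
        tmp,
        "titanic",
        source="https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv",
        description="this dataset is a test dataset",
        format="csv",
    )
    written = path.read_text()
    file_name = path.name

print(written)
```

Values containing YAML special characters would need quoting, which a YAML library handles and this sketch does not.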

Remove a Dataset

Removes a dataset from dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.remove_dataset(identifier)

Contributing

Just open a pull request and be happy!

Let's grow together ;)
