Project description

Welcome to @datasets

TODO

import pandas as pd
from metaflow import FlowSpec, step

from datasets import Dataset, Mode
from datasets.metaflow import DatasetParameter
from datasets.plugins import BatchOptions


# Can also invoke from CLI:
#  > python datasets/tutorials/0_hello_dataset_flow.py run \
#    --hello_dataset '{"name": "HelloDataset", "mode": "READ_WRITE",
#    "options": {"type": "BatchOptions", "partition_by": "region"}}'
class HelloDatasetFlow(FlowSpec):
    hello_dataset = DatasetParameter(
        "hello_dataset",
        default=Dataset("HelloDataset", mode=Mode.READ_WRITE, options=BatchOptions(partition_by="region")),
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving data_frame: \n", df.to_string(index=False))

        # Example of writing to a dataset
        self.hello_dataset.write(df)

        # save this as an output dataset
        self.output_dataset = self.hello_dataset

        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.output_dataset=}")

        # Read back only the region="A" partition of the output dataset
        df: pd.DataFrame = self.output_dataset.to_pandas(partitions=dict(region="A"))
        print('self.output_dataset.to_pandas(partitions=dict(region="A")):')
        print(df.to_string(index=False))


if __name__ == "__main__":
    HelloDatasetFlow()
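
To make the partition filter in the `end` step concrete, here is a minimal sketch in plain pandas with no dependency on the datasets package. It assumes that `to_pandas(partitions=dict(region="A"))` returns only the rows whose partition column equals the requested value; that reading comes from the comment in the flow above, not from verified library behavior. The helper `filter_partitions` is hypothetical and is not part of the package's API.

```python
import pandas as pd


def filter_partitions(df: pd.DataFrame, partitions: dict) -> pd.DataFrame:
    """Hypothetical helper: keep rows matching every partition key/value pair."""
    mask = pd.Series(True, index=df.index)
    for column, value in partitions.items():
        mask &= df[column] == value
    return df[mask].reset_index(drop=True)


# Same frame the flow writes in its start step.
df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})

# Assumed equivalent of reading back only the region="A" partition.
region_a = filter_partitions(df, dict(region="A"))
print(region_a.to_string(index=False))
```

Under that assumption, the `end` step would print only the three rows written for region A.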

Project details

These details have not been verified by PyPI

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Release history

- 1.2.5: Jun 26, 2023
- 0.2.5: Apr 27, 2023
- 0.2.4: Dec 8, 2022
- 0.2.3: Dec 8, 2022
- 0.2.2: Nov 17, 2022
- 0.2.1: Oct 6, 2022
- 0.2.0: Oct 5, 2022
- 0.1.3: Oct 5, 2022 (this version)
- 0.1.2: Aug 20, 2022
- 0.1.1: May 31, 2022
- 0.0.11: May 23, 2022
- 0.0.10: May 11, 2022
- 0.0.8.dev2 (pre-release): Apr 27, 2022
- 0.0.4: Dec 5, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zdatasets-0.1.3.tar.gz (54.6 kB)

Uploaded Oct 5, 2022 Source

Built Distribution

zdatasets-0.1.3-py3-none-any.whl (83.8 kB)

Uploaded Oct 5, 2022 Python 3

Hashes for zdatasets-0.1.3.tar.gz

Algorithm	Hash digest
SHA256	`cb72f62781cca3726b3d696e534046e8e7281dffbac17c1ec7164fbcc2fd8728`
MD5	`771c53a175727ca6bec34b5cf28d7e23`
BLAKE2b-256	`1e64cf823d6d46f7ac38069732782c9b20e900b33feba9953813142bc285aafa`

Hashes for zdatasets-0.1.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`5294c3b7b22b1ca858a0703accc9f19a4de6370848548999277c8d76d1ebbd28`
MD5	`729174ef90190098765680b83b740d5d`
BLAKE2b-256	`6d4e43480113b3c11ea45da1881b4ae33174afb397d873a0e5fb378bdeb12bf8`
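
The digests above can be checked against a downloaded file before installing it. The sketch below uses only Python's standard `hashlib`; the function names `sha256_of` and `matches_expected` and the local file path in the usage note are illustrative assumptions, while the expected digest is the SHA256 value listed above for the wheel.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for zdatasets-0.1.3-py3-none-any.whl.
EXPECTED_SHA256 = "5294c3b7b22b1ca858a0703accc9f19a4de6370848548999277c8d76d1ebbd28"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected value, ignoring case."""
    return sha256_of(path) == expected.lower()
```

Assuming the wheel was saved to the current directory, `matches_expected(Path("zdatasets-0.1.3-py3-none-any.whl"))` should return `True` for an intact download.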