A Distributed DataFrame library for large scale complex data processing.

Project description

Daft dataframes can load data such as PDF documents, images, protobufs, CSV, Parquet and audio files into a tabular dataframe structure for easy querying.

Website | Docs | Installation | 10-minute tour of Daft | Community and Support

Daft: the distributed Python dataframe for media data

Daft is a fast, Pythonic and scalable open-source dataframe library built for Python and Machine Learning workloads.

Daft is currently in its Alpha release phase. Please expect bugs and rapid improvements to the project. We welcome user feedback and feature requests in our Discussions forums.

About Daft

The Daft dataframe is a table of data with rows and columns. Columns can contain any Python objects, which allows Daft to support rich media data types such as images, audio, video and more.

  1. Any Data: Columns can contain any Python objects, which means that the Python libraries you already use for running machine learning or custom data processing will work natively with Daft!

  2. Notebook Computing: Daft is built for the interactive developer experience on a notebook - intelligent caching and query optimizations accelerate your experimentation and data exploration.

  3. Distributed Computing: Rich media formats such as images can quickly outgrow your local laptop’s computational resources - Daft integrates natively with Ray for running dataframes on large clusters of machines with thousands of CPUs/GPUs.

Getting Started

Installation

Install Daft with pip install getdaft.
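
To confirm from Python that the install succeeded, one option is to ask the standard library which version of the getdaft distribution is present. This is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is ours and is not part of Daft.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "getdaft") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "getdaft is not installed")
```

Returning None instead of raising keeps the check easy to use in a notebook cell or a setup script.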

Quickstart

Check out our full quickstart tutorial!

Load a dataframe - in this example we load the MNIST dataset from a JSON file, but Daft also supports many other formats such as CSV, Parquet and folders/buckets of files.

from daft import DataFrame

URL = "https://github.com/Eventual-Inc/mnist-json/raw/master/mnist_handwritten_test.json.gz"

df = DataFrame.from_json(URL)
df.show(4)

dataframe of MNIST dataset with Python list of pixels
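
To make the shape of the input concrete, the sketch below writes and reads a tiny gzipped JSON file using only the standard library. It assumes the data is line-delimited JSON with one record per line, and that each record has an "image" list of 784 values (28 x 28) and an integer "label"; the column names come from the examples in this section, but the exact layout of the real file is an assumption. The helper names and the sample.json.gz file are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_sample(path: str) -> None:
    """Write two tiny records as gzipped, line-delimited JSON (assumed layout)."""
    records = [
        {"image": [0] * 784, "label": 7},
        {"image": [255] * 784, "label": 5},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_records(path: str) -> list:
    """Parse every non-empty line of the gzipped file as one JSON record."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip the sample through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json.gz")
    write_sample(sample_path)
    rows = read_records(sample_path)

print(len(rows), sorted(rows[0]))
```

Each parsed record corresponds to one dataframe row, and each key to one column.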

Filter the dataframe for rows where the "label" column is equal to 5:

df = df.where(df["label"] == 5)
df.show(4)

dataframe of MNIST dataset filtered for rows where the label is the digit 5
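
For readers new to dataframe filters, the result of the step above matches what this plain-Python filter produces on a list of records. It is only a sketch of the semantics, not of how Daft executes the query; the where_label helper and the miniature dataset are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def where_label(rows: List[Row], label: int) -> List[Row]:
    """Keep only the rows whose "label" value equals the requested digit."""
    return [row for row in rows if row["label"] == label]


# Hypothetical miniature dataset: pixel lists shortened for readability.
rows = [
    {"image": [0, 0, 0], "label": 7},
    {"image": [9, 9, 9], "label": 5},
    {"image": [1, 2, 3], "label": 5},
]

fives = where_label(rows, 5)
print(len(fives))  # → 2
```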

Run any function on the dataframe (here we convert a list of pixels into an image using the NumPy and Pillow libraries):

import numpy as np
from PIL import Image

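def image_from_pixel_list(pixels: list) -> Image.Image:
    """Convert a list of pixel values into a 28x28 grayscale Pillow image."""
    # Cast to 8-bit unsigned integers before handing the array to Pillow
    arr = np.array(pixels).astype(np.uint8)
    return Image.fromarray(arr.reshape(28, 28))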

df = df.with_column(
    "image_pil",
    df["image"].apply(image_from_pixel_list),
)
df.show(4)

dataframe of MNIST dataset with the Python list of pixel values converted to a Pillow image
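
The reshape(28, 28) call in the function above lays the pixel list out row by row. The sketch below shows the same layout in plain Python, without NumPy or Pillow. It assumes a flat list of 784 values in row-major order, which is what the reshape implies; the helper name reshape_pixels is hypothetical. Unlike the cast in the original function, this sketch rejects values outside 0-255 rather than converting them.

```python
from typing import List, Sequence


def reshape_pixels(
    pixels: Sequence[int], width: int = 28, height: int = 28
) -> List[List[int]]:
    """Split a flat, row-major pixel list into `height` rows of `width` values."""
    if len(pixels) != width * height:
        raise ValueError(
            f"expected {width * height} pixels, got {len(pixels)}"
        )
    if any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("pixel values must be in the range 0-255")
    # Row r covers the slice [r * width, (r + 1) * width) of the flat list.
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


# Synthetic input: 784 values cycling through 0..255.
flat = [i % 256 for i in range(28 * 28)]
grid = reshape_pixels(flat)
print(len(grid), len(grid[0]))  # → 28 28
```

Validating the length up front gives a clearer error than a failed reshape when a record holds the wrong number of pixels.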

More Resources

  • 10-minute tour of Daft - learn more about Daft’s full range of capabilities including dataloading from URLs, joins, user-defined functions (UDF), groupby, aggregations and more.

  • User Guide - take a deep-dive into each topic within Daft

  • API Reference - API reference for public classes/functions of Daft

License

Daft has an Apache 2.0 license - please see the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • getdaft-0.0.17.tar.gz (143.3 kB): Source

Built Distributions

  • getdaft-0.0.17-cp310-cp310-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • getdaft-0.0.17-cp310-cp310-macosx_11_0_x86_64.whl (283.3 kB): CPython 3.10, macOS 11.0+ x86-64
  • getdaft-0.0.17-cp310-cp310-macosx_11_0_arm64.whl (269.8 kB): CPython 3.10, macOS 11.0+ ARM64
  • getdaft-0.0.17-cp39-cp39-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • getdaft-0.0.17-cp39-cp39-macosx_11_0_x86_64.whl (283.7 kB): CPython 3.9, macOS 11.0+ x86-64
  • getdaft-0.0.17-cp39-cp39-macosx_11_0_arm64.whl (270.2 kB): CPython 3.9, macOS 11.0+ ARM64
  • getdaft-0.0.17-cp38-cp38-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • getdaft-0.0.17-cp38-cp38-macosx_11_0_arm64.whl (270.1 kB): CPython 3.8, macOS 11.0+ ARM64
  • getdaft-0.0.17-cp38-cp38-macosx_10_16_x86_64.whl (283.8 kB): CPython 3.8, macOS 10.16+ x86-64
  • getdaft-0.0.17-cp37-cp37m-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • getdaft-0.0.17-cp37-cp37m-macosx_10_16_x86_64.whl (283.2 kB): CPython 3.7m, macOS 10.16+ x86-64

File details

Details for the file getdaft-0.0.17.tar.gz.

File metadata

  • Download URL: getdaft-0.0.17.tar.gz
  • Upload date:
  • Size: 143.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for getdaft-0.0.17.tar.gz
Algorithm Hash digest
SHA256 20af53e9934f9d2cee3ce888acdb1144fda3d0ab050f3879b85d9b1ab087447a
MD5 f42f584e29d515a472c2e0121da31f5c
BLAKE2b-256 c5a79c03a03c083e65b84a7e1704d3d20081c6ebc1675040dfeffe4f1aa2cb34

See more details on using hashes here.

File details

Details for the file getdaft-0.0.17-cp310-cp310-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp310-cp310-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 422ef68ab67b3ffb979aa5d9b36bce107ea553dcf255f7a09db6b1c73e178b50
MD5 89fb3803f333af2d91443c0036976b4b
BLAKE2b-256 a4b4fb5bbe3b6dd66bc1a876dd3ddcfb197d8e1e5012a8e0b20f391479a0779a

File details

Details for the file getdaft-0.0.17-cp310-cp310-macosx_11_0_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp310-cp310-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 f6c946c7bfcdf23d31aef4cd990dbd27224723064f7cd82bfd01aac0a9f787c9
MD5 209305847672da07bf0ec2498f17fd2f
BLAKE2b-256 36a4996a4b4e2534354b27fa85a31a28dfedf0b1d7c3d1fd08f05f4fb7000320

File details

Details for the file getdaft-0.0.17-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.17-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ce97986d69680409627fc9ab12705d46103c465ff7c3e94cb12ca15da44772db
MD5 a987ac1c5dc041a2f12ee7251765cc79
BLAKE2b-256 db4ed60464d120f920becd1152493bca8bb9c9dea0246f7cfa454003c829bf87

File details

Details for the file getdaft-0.0.17-cp39-cp39-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp39-cp39-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 f47e35054e9eb1e3da2cd71f90ebc25f8264fe358da8d1170f9883b8c01d769d
MD5 49b0c11d5203c9f3a77d3ee5842ebe70
BLAKE2b-256 83ceec88db2c41bf04b9d05bf35e9f092bcf35951768a9f656a7f894cd704701

File details

Details for the file getdaft-0.0.17-cp39-cp39-macosx_11_0_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 a4f7d397211bbe59761e22ca02f61733a08f085fabf1de8512bf4ef5f07d6f19
MD5 02ef901473349fc2aa8b860fa6bfef21
BLAKE2b-256 e7fd841d2a60bc15518d4110c8f4a0e9b6a66c22881144d58d3d82ccb332d5a5

File details

Details for the file getdaft-0.0.17-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.17-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c7b177abcd9616918f59f3fd6be9bcba5f698561c494192ffc8346c85fe3a4b5
MD5 adaca98dd472976372c6524f40c94190
BLAKE2b-256 12369cf5987eeacbc86bf470eda9395f4bd6271f39c5055064ae2623ba83fcd5

File details

Details for the file getdaft-0.0.17-cp38-cp38-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp38-cp38-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 a18e3342c1a6e383e9c85d5d1b9558c4a1321d1430c57a95a2a71731f2fff95b
MD5 7189f338569d5fdabf046843e41fc36a
BLAKE2b-256 702f7e111e0d3ac3fab3c68dd1f26657d85defde70ad29a64257c1e48c515ce0

File details

Details for the file getdaft-0.0.17-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.17-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b0a03342e80cddcefe33371e1b5e5e3efe54a4df1ad17d8ff921ef7755a0661a
MD5 dae84919530231edd0a0a4c12cb1a44b
BLAKE2b-256 e118da705840bfb2c6be34b2fe591820a818d8529f64a1a2e82d1660a4b41da7

File details

Details for the file getdaft-0.0.17-cp38-cp38-macosx_10_16_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp38-cp38-macosx_10_16_x86_64.whl
Algorithm Hash digest
SHA256 dfc2993e1aa0acc4446fa09f3dbc1c983d886963aa1ea072bf4133c7a00633c8
MD5 aafee4fca51dc5c79c5228dd07ac9fa0
BLAKE2b-256 f2303818db5a2a563bf5ac8da4559fe4a552db4bbdecdffe304b28ae473ea0a1

File details

Details for the file getdaft-0.0.17-cp37-cp37m-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp37-cp37m-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 f0734196e3355408344fc87a6069ae66bc166e138f77e366d85de064f091ac2b
MD5 c91a591d9f314204403c0163b114e22d
BLAKE2b-256 ef8667c043667f4e82b814a633ed040a58efb3c3ac1c56b3966234e46c4ebfb9

File details

Details for the file getdaft-0.0.17-cp37-cp37m-macosx_10_16_x86_64.whl.

File hashes

Hashes for getdaft-0.0.17-cp37-cp37m-macosx_10_16_x86_64.whl
Algorithm Hash digest
SHA256 5b476513a880887f4f755aa62eb5af91d59b956786b1d93fef7be769ac408e98
MD5 5b528bb569c58366da9ae56311a60ff9
BLAKE2b-256 eec2365db91cf077b42e6b57195d7b1093e2437d0bca42bf9456b159dddbe574
