A Distributed DataFrame library for large-scale complex data processing.

Project description

Daft dataframes can load any data, such as PDF documents, images, protobufs, CSV, Parquet and audio files, into a tabular dataframe structure for easy querying.

Website | Docs | Installation | 10-minute tour of Daft | Community and Support

Daft: the distributed Python dataframe for complex data

Daft is a fast, Pythonic and scalable open-source dataframe library built for Python and Machine Learning workloads.

Daft is currently in its Alpha release phase - please expect bugs and rapid improvements to the project. We welcome user feedback and feature requests in our Discussions forums.

About Daft

The Daft dataframe is a table of data with rows and columns. Columns can contain any Python objects, which allows Daft to support rich complex data types such as images, audio, video and more.

  1. Any Data: Columns can contain any Python objects, which means that the Python libraries you already use for running machine learning or custom data processing will work natively with Daft!

  2. Notebook Computing: Daft is built for the interactive developer experience on a notebook - intelligent caching and query optimizations accelerate your experimentation and data exploration.

  3. Distributed Computing: Rich complex formats such as images can quickly outgrow your local laptop’s computational resources - Daft integrates natively with Ray for running dataframes on large clusters of machines with thousands of CPUs/GPUs.

Getting Started

Installation

Install Daft with pip install getdaft.

Quickstart

Check out our full quickstart tutorial!

In this example, we load images from an AWS S3 bucket and run a simple function to generate thumbnails for each image:

from daft import DataFrame, lit

import io
from PIL import Image

def get_thumbnail(img: Image.Image) -> Image.Image:
    """Simple function to make an image thumbnail"""
    imgcopy = img.copy()
    imgcopy.thumbnail((48, 48))
    return imgcopy

# Load a dataframe from files in an S3 bucket
df = DataFrame.from_files("s3://daft-public-data/laion-sample-images/*")

# Get the AWS S3 url of each image
df = df.select(lit("s3://").str.concat(df["name"]).alias("s3_url"))

# Download images and load as a PIL Image object
df = df.with_column("image", df["s3_url"].url.download().apply(lambda data: Image.open(io.BytesIO(data))))

# Generate thumbnails from images
df = df.with_column("thumbnail", df["image"].apply(get_thumbnail))

df.show(3)

Dataframe code to load a folder of images from AWS S3 and create thumbnails
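
The thumbnail helper from the quickstart can also be tried on its own, without Daft or S3 access. The sketch below assumes only that Pillow is installed; the in-memory 640x480 test image is made up for illustration and stands in for a downloaded file.

```python
from PIL import Image


def get_thumbnail(img: Image.Image) -> Image.Image:
    """Return a thumbnail copy that fits within 48x48, leaving the input untouched."""
    imgcopy = img.copy()
    # thumbnail() resizes in place and preserves the aspect ratio
    imgcopy.thumbnail((48, 48))
    return imgcopy


# Hypothetical stand-in for an image downloaded from S3
original = Image.new("RGB", (640, 480), color=(128, 128, 128))
thumb = get_thumbnail(original)

print(original.size)  # (640, 480) - the original is not modified
print(thumb.size)     # (48, 36) - the 4:3 aspect ratio is kept
```

Because the helper works on a copy, the same image column can be reused for other transformations after the thumbnails are generated.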

More Resources

  • 10-minute tour of Daft - learn more about Daft’s full range of capabilities including data loading from URLs, joins, user-defined functions (UDF), groupby, aggregations and more.

  • User Guide - take a deep-dive into each topic within Daft

  • API Reference - API reference for public classes/functions of Daft

License

Daft has an Apache 2.0 license - please see the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
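
The wheel filenames listed below follow the standard wheel naming layout, `{distribution}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl`, so the tags can be read off the name to see which interpreter and platform a file targets. The following is a minimal sketch of that split; the function and class names are made up for this illustration, and it does not handle compressed tag sets or name normalization.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


tags = split_wheel_filename("getdaft-0.0.19-cp310-cp310-manylinux_2_17_x86_64.whl")
print(tags.python_tag)    # cp310
print(tags.platform_tag)  # manylinux_2_17_x86_64
```

In normal use pip selects a compatible wheel automatically; a manual split like this is mainly useful when inspecting a list of files by hand.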

Source Distribution

getdaft-0.0.19.tar.gz (146.3 kB view details)

Uploaded Source

Built Distributions

getdaft-0.0.19-cp310-cp310-manylinux_2_17_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

getdaft-0.0.19-cp310-cp310-macosx_11_0_x86_64.whl (286.8 kB view details)

Uploaded CPython 3.10 macOS 11.0+ x86-64

getdaft-0.0.19-cp310-cp310-macosx_11_0_arm64.whl (273.3 kB view details)

Uploaded CPython 3.10 macOS 11.0+ ARM64

getdaft-0.0.19-cp39-cp39-manylinux_2_17_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

getdaft-0.0.19-cp39-cp39-macosx_11_0_x86_64.whl (287.2 kB view details)

Uploaded CPython 3.9 macOS 11.0+ x86-64

getdaft-0.0.19-cp39-cp39-macosx_11_0_arm64.whl (273.7 kB view details)

Uploaded CPython 3.9 macOS 11.0+ ARM64

getdaft-0.0.19-cp38-cp38-manylinux_2_17_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

getdaft-0.0.19-cp38-cp38-macosx_11_0_arm64.whl (273.6 kB view details)

Uploaded CPython 3.8 macOS 11.0+ ARM64

getdaft-0.0.19-cp38-cp38-macosx_10_16_x86_64.whl (287.3 kB view details)

Uploaded CPython 3.8 macOS 10.16+ x86-64

getdaft-0.0.19-cp37-cp37m-manylinux_2_17_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

getdaft-0.0.19-cp37-cp37m-macosx_10_16_x86_64.whl (286.7 kB view details)

Uploaded CPython 3.7m macOS 10.16+ x86-64

File details

Details for the file getdaft-0.0.19.tar.gz.

File metadata

  • Download URL: getdaft-0.0.19.tar.gz
  • Upload date:
  • Size: 146.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for getdaft-0.0.19.tar.gz
Algorithm Hash digest
SHA256 c8f73ccff9a47285074cb5648e394375662df8e6e9c3c8ceb30936d88035f33b
MD5 748ca556e60a3519d33224df779abbd5
BLAKE2b-256 f96f0a16032ad43ff5795e135f04f00c3b59c8a4b3f421027aab0ea18dd67c86

See more details on using hashes here.
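
A downloaded file can be checked against a published SHA256 digest using only the Python standard library. The sketch below is one possible way to do this, not an official PyPI or Daft tool; the helper names are made up for this illustration, and it assumes the archive has already been saved to a local path.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the sdist was downloaded into the current directory):
# matches_published_digest(
#     "getdaft-0.0.19.tar.gz",
#     "c8f73ccff9a47285074cb5648e394375662df8e6e9c3c8ceb30936d88035f33b",
# )
```

pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs a comparable check at install time.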

File details

Details for the file getdaft-0.0.19-cp310-cp310-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp310-cp310-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 f11496f3f21ee9bbbb693aa452c633ab7091476fa63f21e8a3b5bba6f6a64340
MD5 cffae3212fa3780a2d9d6a3cfe0ec571
BLAKE2b-256 7a2b75a979e3d9ceed178906af389a7cc4f1410e988ea628f2532979ae826762

File details

Details for the file getdaft-0.0.19-cp310-cp310-macosx_11_0_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp310-cp310-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 8cc23a63049d5caa513ff3c579cb162bdfda474e1ed42b10bd37014058cd46c8
MD5 d3326e2fcfb3f88bd90bc02e7272b1bd
BLAKE2b-256 3e03abbf5c8545e76219c6a4c3593790e0ad69baccd79fb6578dc31a7b8b8f24

File details

Details for the file getdaft-0.0.19-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.19-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 111e1af84865f946a685f225be7f6fbf389df680136839ee5290f19e7c899981
MD5 5238d153aae0f4382dfc7757f28a59c5
BLAKE2b-256 c2abb68412841b4d5b7a96a06bb3d2d7377a8e6057992a0edc8e66fdc6f33746

File details

Details for the file getdaft-0.0.19-cp39-cp39-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp39-cp39-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 6feec5a974276babd8905293db5608cc4f1ebf764083343b95c0ae5a116fbbff
MD5 f9921c1fddf03c8e138a1565ab40e375
BLAKE2b-256 5a87178cb4fbde60b9cc44efe3f42a372b9265259bfb23ee8f369a252a24ccc7

File details

Details for the file getdaft-0.0.19-cp39-cp39-macosx_11_0_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 b7e6e457b7e0a7d612ab931c417043a268e8c4f7cba31566a019b97b391ce639
MD5 2caf8b57231768bd2b96da25f5ee9c49
BLAKE2b-256 63fcd305861cfb1b10cb86f89c87535d2737044fae7183808311ac58bf1df7b4

File details

Details for the file getdaft-0.0.19-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.19-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 621cc059a45927f4d891cf10c5bfc6bc136f9133e335925c9c425e0841f7edcf
MD5 b5486a5318596865a7431c166ac7bd96
BLAKE2b-256 2beaf0c6688ef30941e409733e1c161b6d63c71a2f4da2876a2f99e6dc6d0806

File details

Details for the file getdaft-0.0.19-cp38-cp38-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp38-cp38-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 79b2ee9ca66214e3250beccb0c741080d53c2354cfab4b55dca46260277e4284
MD5 e234976b03fc14c91883a6c9d2641abb
BLAKE2b-256 7760a8aeb8aa1c8f2f2c431d49ee6213254dc8642f7bd95bae5c2f8e0e9fffc4

File details

Details for the file getdaft-0.0.19-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for getdaft-0.0.19-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 69ac999e2e451a0ad07ef913e661ea2961df257afb366d9c5858e8b36eba4124
MD5 0fa5cccb56e5aa51c57a998d335c15be
BLAKE2b-256 e6b25b3d16e95e375ae4326dfa1f99c0ee93de6bfeb2908f8d2edb3afa2213d3

File details

Details for the file getdaft-0.0.19-cp38-cp38-macosx_10_16_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp38-cp38-macosx_10_16_x86_64.whl
Algorithm Hash digest
SHA256 8a111ef491fd8f5dd2d7a5cc43951c93b57b9e3c4b6948217ba633a208af74a7
MD5 d6ee8d8d08c2eceb33594348dff36bdb
BLAKE2b-256 fe104fe5018a610a331d923c6066e7813a28dc784335587fb0928915f9895be3

File details

Details for the file getdaft-0.0.19-cp37-cp37m-manylinux_2_17_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp37-cp37m-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 d005f9e88fbfd571cdc1b25d043b82e6f7cdd906a042adcfb8f0eeb2954130da
MD5 a7201602f5acfa26ac9ca2bca3bf8f84
BLAKE2b-256 ea8a08010fb1f677c2f62b28c82338b3aa1bd5017be473b7d81617533142572a

File details

Details for the file getdaft-0.0.19-cp37-cp37m-macosx_10_16_x86_64.whl.

File hashes

Hashes for getdaft-0.0.19-cp37-cp37m-macosx_10_16_x86_64.whl
Algorithm Hash digest
SHA256 9fa6261dd721314755c3dc56fd8992ec7272ac02b9da490ce6dd6b4c17de6281
MD5 27a383feeda39c131d322e2a388ea015
BLAKE2b-256 5ab3e51babc86f3d7064a2bb02e4fa8880a51ba444ac2cf3e6557770fcef554b
