Dask + Deltalake

Dask Deltalake

Reads from and writes to Delta Lake from Dask, leveraging delta-rs.

Dask Deltalake Reader

Reads data from Delta Lake with Dask.

To try out the package:

pip install dask_deltalake

Features:

  1. Reads the Parquet files listed in the Delta log in parallel using the Dask engine
  2. Supports multiple filesystems, such as S3, Azure (azurefs), and GCS (gcsfs)
  3. Supports some Delta features, such as:
    • Time travel
    • Schema evolution
    • Parquet filters
      • row filter
      • partition filter
  4. Queries Delta commit info (history)
  5. Vacuums old/unused Parquet files
  6. Loads different versions of the data using a datetime
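
To make features 1 and 3 (time travel) more concrete, here is a minimal sketch of how a reader can work out which Parquet files belong to a given table version by replaying the JSON commits in `_delta_log`. It is not this package's implementation (that work is delegated to delta-rs). It ignores checkpoints and every action other than `add` and `remove`, and `active_files`, `write_commit`, and the demo file names are hypothetical helpers made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path, version=None):
    """Return the data files that make up `version` (latest if None).

    Simplified assumption: only JSON commits are replayed, no checkpoints.
    """
    log_dir = Path(table_path) / "_delta_log"
    files = set()
    # Commit files are named with a zero-padded 20-digit version number,
    # so sorting the names also sorts them by version.
    for commit in sorted(log_dir.glob("*.json")):
        if version is not None and int(commit.stem) > version:
            break
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


def write_commit(table_path, version, actions):
    """Write one fake commit file (hypothetical helper for the demo)."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    commit = log_dir / f"{version:020d}.json"
    commit.write_text("\n".join(json.dumps(a) for a in actions))


# Build a tiny fake log with three versions.
demo_table = tempfile.mkdtemp()
write_commit(demo_table, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(demo_table, 1, [{"add": {"path": "part-1.parquet"}}])
write_commit(
    demo_table,
    2,
    [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-2.parquet"}}],
)

files_v1 = active_files(demo_table, version=1)  # time travel to version 1
files_latest = active_files(demo_table)         # latest version
print(files_v1)
print(files_latest)
```

Once the file list for a version is known, each file can be read as an independent task, which is what makes the read parallel.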

Usage:

import dask_deltalake as ddl

# read delta table
ddl.read_delta("delta_path")

# read delta table for a specific version
ddl.read_delta("delta_path", version=3)

# read delta table as of a specific datetime
ddl.read_delta("delta_path", datetime="2018-12-19T16:39:57-08:00")


# read the complete delta history
ddl.read_delta_history("delta_path")

# read delta history up to a given limit
ddl.read_delta_history("delta_path", limit=5)

# vacuum old/unused files (dry_run=False actually deletes them)
ddl.vacuum("delta_path", dry_run=False)

# can read from S3, Azure, GCS, etc.
ddl.read_delta("s3://bucket_name/delta_path", version=3)
# please ensure the credentials are properly configured as environment variables or
# in ~/.aws/credentials

# can connect to the AWS Glue catalog and read the complete delta table
# (currently AWS Glue is the only supported catalog)
# uses AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment variables
# if set, otherwise falls back to ~/.aws/credentials
ddl.read_delta(catalog="glue", database_name="science", table_name="physics")
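
The `datetime=` argument above is a time-travel request. A minimal sketch of the usual rule, assuming the reader picks the latest version committed at or before the requested time: `version_at` and the commit times below are hypothetical, and the sketch assumes commit times increase with the version number. How this package resolves the datetime internally is not shown here.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def version_at(commits, when):
    """Return the latest version committed at or before `when`.

    commits: list of (version, timezone-aware commit time), sorted by version.
    when: ISO 8601 string with a UTC offset, e.g. "2018-12-19T16:39:57-08:00".
    """
    target = datetime.fromisoformat(when)
    times = [commit_time for _, commit_time in commits]
    # Index of the first commit strictly after the target time.
    idx = bisect_right(times, target)
    if idx == 0:
        raise ValueError("no commit at or before the requested time")
    return commits[idx - 1][0]


# Hypothetical commit history, in UTC.
commits = [
    (0, datetime(2018, 12, 18, 9, 0, tzinfo=timezone.utc)),
    (1, datetime(2018, 12, 19, 23, 0, tzinfo=timezone.utc)),
    (2, datetime(2018, 12, 20, 8, 0, tzinfo=timezone.utc)),
]

# 16:39:57 at UTC-08:00 is 00:39:57 UTC on the next day,
# which falls between version 1 and version 2.
resolved = version_at(commits, "2018-12-19T16:39:57-08:00")
print(resolved)
```

Including the UTC offset in the string avoids ambiguity, since the comparison is done on timezone-aware values.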
