Dask + Deltalake
Dask Deltalake
Reads from and writes to Deltalake from Dask, leveraging delta-rs
Dask Deltalake Reader
Reads data from Deltalake with Dask
To try out the package:
pip install dask_deltalake
Features:
- Reads the parquet files listed in the delta log in parallel using the Dask engine
- Supports the s3, azurefs, and gcsfs filesystems
- Supports some delta features, such as:
  - Time travel
  - Schema evolution
  - Parquet filters
    - Row filter
    - Partition filter
- Query delta commit info (history)
- Vacuum old or unused parquet files
- Load different versions of data using a datetime
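To make the first and third features concrete, here is a minimal sketch of how a reader can decide which parquet files belong to a given table version by replaying the JSON commits under `_delta_log`. It relies only on the Delta log layout (zero-padded commit files containing `add` and `remove` actions) and is not the package's actual implementation; the function names `active_files`, `write_commit`, and `build_demo_table` are hypothetical, and checkpoints are ignored for brevity.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path, version=None):
    """Replay JSON commits in _delta_log and return the live parquet paths.

    Illustrative only: ignores checkpoint files and all metadata actions.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are zero-padded, so lexical order equals version order.
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        if version is not None and int(commit.stem) > version:
            break  # time travel: stop at the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_commit(table_path, version, actions):
    """Demo helper (hypothetical): write one commit file with the given actions."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    body = "\n".join(json.dumps(a) for a in actions)
    (log_dir / f"{version:020d}.json").write_text(body)


def build_demo_table():
    """Create a throwaway log with three commits and return the table path."""
    table = tempfile.mkdtemp()
    write_commit(table, 0, [{"add": {"path": "part-0.parquet"}}])
    write_commit(table, 1, [{"add": {"path": "part-1.parquet"}}])
    write_commit(
        table,
        2,
        [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-2.parquet"}}],
    )
    return table
```

Calling `active_files(build_demo_table(), version=1)` returns the two files added by commits 0 and 1, while omitting `version` replays the whole log. Each returned path could then be handed to a separate Dask task, which is where the parallelism comes from.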
Usage:
import dask_deltalake as ddl
# read delta table
ddl.read_delta("delta_path")
# read delta table for specific version
ddl.read_delta("delta_path", version=3)
# read delta table for specific datetime
ddl.read_delta("delta_path", datetime="2018-12-19T16:39:57-08:00")
# read delta complete history
ddl.read_delta_history("delta_path")
# read delta history up to given limit
ddl.read_delta_history("delta_path", limit=5)
# vacuum: delete old/unused parquet files (dry_run=False actually deletes them)
ddl.vacuum("delta_path", dry_run=False)
# can read from S3, Azure, GCS, etc.
ddl.read_delta("s3://bucket_name/delta_path", version=3)
# please ensure the credentials are properly configured as environment variables or
# in ~/.aws/credentials
# can connect to the AWS Glue catalog and read the complete delta table (currently only the AWS catalog is available)
# explicit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are taken from environment
# variables if available; otherwise falls back to ~/.aws/credentials
ddl.read_delta(catalog="glue", database_name="science", table_name="physics")
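The credential lookup order described in the comments above (environment variables first, then the shared credentials file) can be sketched with the standard library alone. This is only an illustration of the fallback logic, not how the package or its underlying AWS client resolves credentials; `resolve_aws_credentials` and `demo` are hypothetical names, and only the `default` profile is considered unless another is passed.

```python
import configparser
import os
import tempfile
from pathlib import Path


def resolve_aws_credentials(env=None, credentials_file=None, profile="default"):
    """Return credentials from the environment, else from the credentials file.

    Returns None when neither source has both the key and the secret.
    """
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"key": key, "secret": secret, "source": "environment"}

    path = (
        Path(credentials_file)
        if credentials_file
        else Path.home() / ".aws" / "credentials"
    )
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    if parser.has_section(profile):
        key = parser[profile].get("aws_access_key_id")
        secret = parser[profile].get("aws_secret_access_key")
        if key and secret:
            return {"key": key, "secret": secret, "source": "credentials file"}
    return None


def demo():
    """Exercise each branch against a throwaway credentials file."""
    with tempfile.TemporaryDirectory() as tmp:
        creds = Path(tmp) / "credentials"
        creds.write_text(
            "[default]\n"
            "aws_access_key_id = FILEKEY\n"
            "aws_secret_access_key = FILESECRET\n"
        )
        from_env = resolve_aws_credentials(
            env={"AWS_ACCESS_KEY_ID": "ENVKEY", "AWS_SECRET_ACCESS_KEY": "ENVSECRET"},
            credentials_file=creds,
        )
        from_file = resolve_aws_credentials(env={}, credentials_file=creds)
        missing = resolve_aws_credentials(env={}, credentials_file=Path(tmp) / "absent")
    return from_env, from_file, missing
```

Running a check like this before calling `read_delta` on an `s3://` path can help tell a missing-credentials problem apart from a wrong table path.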