Dask + Delta Table

Dask Deltatable Reader

Reads a Delta table from a directory using the Dask engine.

To try out the package:

pip install dask-deltatable

Features:

  1. Reads the parquet files listed in the Delta logs in parallel using the Dask engine
  2. Supports the s3, azurefs, and gcsfs filesystems
  3. Supports some Delta features, such as
    • Time travel
    • Schema evolution
    • Parquet filters
      • row filter
      • partition filter
  4. Queries Delta commit info (history)
  5. Vacuums old/unused parquet files
  6. Loads different versions of the data by datetime
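
To make the first feature and time travel more concrete, here is a minimal sketch of how a reader can work out which parquet files belong to a given table version by replaying the JSON commit files in `_delta_log`. It is an illustration of the general Delta log idea, not this package's implementation: it ignores checkpoints, and the names `active_files`, `write_commit`, and `demo_table` are hypothetical helpers invented for the example.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path, version=None):
    """Replay Delta commit files and return the parquet paths live at `version`.

    Simplified sketch: checkpoints and all actions other than add/remove
    are ignored.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named by zero-padded version number, so sorting by
    # name replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        if version is not None and int(commit.stem) > version:
            break
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_commit(table_path, version, actions):
    """Hypothetical helper: write one commit file for the demo table."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    name = f"{version:020d}.json"
    (log_dir / name).write_text("\n".join(json.dumps(a) for a in actions))


# Build a tiny fake log with three commits (versions 0, 1, 2).
demo_table = tempfile.mkdtemp()
write_commit(demo_table, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(demo_table, 1, [{"add": {"path": "part-1.parquet"}}])
write_commit(demo_table, 2, [{"remove": {"path": "part-0.parquet"}},
                             {"add": {"path": "part-2.parquet"}}])

print(active_files(demo_table, version=1))  # ['part-0.parquet', 'part-1.parquet']
print(active_files(demo_table))             # ['part-1.parquet', 'part-2.parquet']
```

Once the list of live files is known, each file can be read independently, which is what makes the parallel read natural. Files removed from the log but still on disk (such as `part-0.parquet` after version 2) are the kind of leftovers a vacuum step cleans up.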

Usage:

import dask_deltatable as ddt

# read a delta table
ddt.read_delta_table("delta_path")

# read a specific version of the delta table
ddt.read_delta_table("delta_path", version=3)

# read the delta table as of a specific datetime
ddt.read_delta_table("delta_path", datetime="2018-12-19T16:39:57-08:00")

# read the complete delta history
ddt.read_delta_history("delta_path")

# read the delta history up to a given limit
ddt.read_delta_history("delta_path", limit=5)

# vacuum: use the delta history to delete old/unused parquet files
ddt.vacuum("delta_path", dry_run=False)

# can also read from S3, Azure, GCS, etc.
ddt.read_delta_table("s3://bucket_name/delta_path", version=3)
# please ensure the credentials are properly configured, either as environment
# variables or in ~/.aws/credentials
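
Reading by datetime can be thought of as picking the latest version committed at or before the requested moment. The sketch below shows that selection with the standard library only. It is an assumption about how such a lookup generally works, not a description of this package's internals; `version_at`, `to_ms`, and the commit timestamps are hypothetical and exist only for the example.

```python
from datetime import datetime


def to_ms(iso_string):
    """Convert an ISO 8601 string with a UTC offset to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_string).timestamp() * 1000)


def version_at(commit_times_ms, when_iso):
    """Return the latest version committed at or before `when_iso`.

    `commit_times_ms` maps version -> commit time in epoch milliseconds
    (an assumed input for this sketch; commit times are assumed to grow
    with the version number).
    """
    when_ms = to_ms(when_iso)
    eligible = [v for v, ts in commit_times_ms.items() if ts <= when_ms]
    if not eligible:
        raise ValueError(f"no commit at or before {when_iso}")
    return max(eligible)


# Made-up commit times for three versions of a table.
commits = {
    0: to_ms("2018-12-18T10:00:00+00:00"),
    1: to_ms("2018-12-19T23:00:00+00:00"),
    2: to_ms("2018-12-20T05:00:00+00:00"),
}

# 16:39:57 at UTC-08:00 is 00:39:57 UTC on the next day, so version 1 is chosen.
print(version_at(commits, "2018-12-19T16:39:57-08:00"))  # 1
```

Note that the UTC offset in the datetime string matters: the same wall-clock time with a different offset can resolve to a different version.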
