
Delta Lake helper methods


Levi

Delta Lake helper methods. No Spark dependency.

Installation

Install the latest version with pip install levi.

Delta File Stats

The delta_file_sizes function reports how many files in a Delta table fall into each file-size range. Example usage:

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.delta_file_sizes(dt)

# return value
{
    'num_files_<1mb': 345, 
    'num_files_1mb-500mb': 588,
    'num_files_500mb-1gb': 960,
    'num_files_1gb-2gb': 0, 
    'num_files_>2gb': 5
}

This output shows 345 small files that each hold less than 1mb of data and 5 huge files that each hold more than 2gb. It'd be a good idea to compact the small files and split up the large files to make queries on this Delta table run faster.

You can also specify the boundaries when you invoke the function to get a custom result:

levi.delta_file_sizes(dt, ["<1mb", "1mb-200mb", "200mb-800mb", "800mb-2gb", ">2gb"])
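To make the bucketing concrete, here is a minimal pure-Python sketch of how a list of file sizes could be counted into labeled ranges like the ones above. It is not levi's implementation: the helpers `parse_size`, `parse_boundary` and `bucket_file_sizes` are hypothetical, and the sketch assumes decimal units (1mb = 1,000,000 bytes) and half-open ranges whose lower bound is inclusive. In a real table, the per-file sizes would come from the `size` field of each `add` action in the Delta transaction log.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: decimal units; levi may use binary units (1mb = 1,048,576 bytes) instead.
UNITS = {"kb": 10**3, "mb": 10**6, "gb": 10**9}


def parse_size(text: str) -> int:
    """Convert a size such as '500mb' or '2gb' into a number of bytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_boundary(label: str) -> Tuple[float, float]:
    """Turn '<1mb', '1mb-500mb' or '>2gb' into a half-open [low, high) byte range."""
    if label.startswith("<"):
        return 0, parse_size(label[1:])
    if label.startswith(">"):
        return parse_size(label[1:]), float("inf")
    low, high = label.split("-")
    return parse_size(low), parse_size(high)


def bucket_file_sizes(sizes: Iterable[int], boundaries: List[str]) -> Dict[str, int]:
    """Count how many file sizes fall into each labeled range (hypothetical helper)."""
    ranges = [(label, *parse_boundary(label)) for label in boundaries]
    counts = {f"num_files_{label}": 0 for label in boundaries}
    for size in sizes:
        for label, low, high in ranges:
            if low <= size < high:
                counts[f"num_files_{label}"] += 1
                break  # each file is counted in at most one range
    return counts


# Made-up sizes in bytes, for illustration only.
sizes = [200_000, 750_000, 40_000_000, 3_500_000_000]
print(bucket_file_sizes(sizes, ["<1mb", "1mb-500mb", "500mb-1gb", "1gb-2gb", ">2gb"]))
# {'num_files_<1mb': 2, 'num_files_1mb-500mb': 1, 'num_files_500mb-1gb': 0, 'num_files_1gb-2gb': 0, 'num_files_>2gb': 1}
```

Because each range is half-open, a file of exactly 1,000,000 bytes lands in "1mb-500mb" rather than "<1mb"; levi may draw that line differently.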

Skipped Stats

The skipped_stats function reports the number of files and the number of bytes that are skipped for a given set of predicates.

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.skipped_stats(dt, filters=[('a_float', '=', 4.5)])

# return value
{
    'num_files': 2,
    'num_files_skipped': 1,
    'num_bytes_skipped': 996
}

This predicate skips one of the table's two files, which holds 996 bytes of data.

You can use skipped_stats to calculate the percentage of files that get skipped (num_files_skipped divided by num_files). This information also helps you decide whether you should Z Order your data or otherwise rearrange it to allow for better file skipping.
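The idea behind these numbers is min/max file skipping: Delta Lake can record per-column minimum and maximum values for each data file, and a reader can skip any file whose range cannot satisfy the predicate. The sketch below illustrates that technique in plain Python, together with the skip-percentage calculation. It is not levi's code: `FileStats`, `can_skip` and `skipped_stats_sketch` are hypothetical names, the file sizes and statistics are made-up sample data, and only single-column comparisons combined with AND are handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A filter in the same shape as the example above: (column, operator, value).
Filter = Tuple[str, str, float]


@dataclass
class FileStats:
    """Hypothetical per-file metadata: size plus per-column min/max statistics."""
    size_bytes: int
    min_values: Dict[str, float] = field(default_factory=dict)
    max_values: Dict[str, float] = field(default_factory=dict)


def can_skip(stats: FileStats, column: str, op: str, value: float) -> bool:
    """Return True only when the min/max stats prove no row can match `column op value`."""
    if column not in stats.min_values or column not in stats.max_values:
        return False  # no statistics for this column, so the file must be read
    low, high = stats.min_values[column], stats.max_values[column]
    if op == "=":
        return value < low or value > high
    if op == ">":
        return high <= value
    if op == ">=":
        return high < value
    if op == "<":
        return low >= value
    if op == "<=":
        return low > value
    return False  # unsupported operator: be conservative and keep the file


def skipped_stats_sketch(files: List[FileStats], filters: List[Filter]) -> Dict[str, int]:
    # Filters are combined with AND, so one filter that rules a file out is enough to skip it.
    skipped = [f for f in files if any(can_skip(f, c, op, v) for c, op, v in filters)]
    return {
        "num_files": len(files),
        "num_files_skipped": len(skipped),
        "num_bytes_skipped": sum(f.size_bytes for f in skipped),
    }


# Made-up sample data: two files with different a_float ranges.
files = [
    FileStats(1_000, {"a_float": 1.0}, {"a_float": 3.0}),
    FileStats(2_000, {"a_float": 4.0}, {"a_float": 6.0}),
]
result = skipped_stats_sketch(files, [("a_float", "=", 4.5)])
print(result)  # {'num_files': 2, 'num_files_skipped': 1, 'num_bytes_skipped': 1000}
print(100 * result["num_files_skipped"] / result["num_files"])  # 50.0
```

Sorting or Z Ordering data on a column narrows each file's min/max range for that column, which is why it tends to increase the number of files a predicate like this can skip.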

Get Latest Delta Table Version

The latest_version function returns the most recent version number of a Delta table.

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.latest_version(dt)

# return value
2
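For context on where that number comes from: each commit to a Delta table is written to the `_delta_log` directory as a JSON file named after its zero-padded 20-digit version (for example `00000000000000000002.json`), so the latest version is the highest such number. The sketch below scans a local log directory that way; it is an illustration, not levi's implementation. `latest_version_from_log` is a hypothetical helper, it only handles tables on the local filesystem, and it ignores checkpoint and other non-commit files. For a table you have already loaded, deltalake's `DeltaTable.version()` reports the version of that snapshot.

```python
import os
import re
import tempfile

# Each Delta commit is a JSON file in _delta_log named after its zero-padded 20-digit version.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_version_from_log(table_path: str) -> int:
    """Return the highest committed version found in a local table's _delta_log directory."""
    log_dir = os.path.join(table_path, "_delta_log")
    versions = []
    for name in os.listdir(log_dir):
        match = COMMIT_FILE.match(name)
        if match:  # skips checkpoints, _last_checkpoint and any other files
            versions.append(int(match.group(1)))
    if not versions:
        raise ValueError(f"no commit files found in {log_dir}")
    return max(versions)


if __name__ == "__main__":
    # Build a throwaway log with commits 0-2 plus checkpoint files to exercise the function.
    with tempfile.TemporaryDirectory() as table:
        log = os.path.join(table, "_delta_log")
        os.makedirs(log)
        names = [f"{v:020d}.json" for v in range(3)]
        names += ["00000000000000000002.checkpoint.parquet", "_last_checkpoint"]
        for name in names:
            open(os.path.join(log, name), "w").close()
        print(latest_version_from_log(table))  # 2
```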

Download files

Source Distribution

levi-0.2.0.tar.gz (3.2 kB)

Built Distribution

levi-0.2.0-py3-none-any.whl (3.7 kB)

File details

Details for the file levi-0.2.0.tar.gz.

File metadata

  • Download URL: levi-0.2.0.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.9.5 Darwin/20.3.0

File hashes

Hashes for levi-0.2.0.tar.gz
Algorithm Hash digest
SHA256 01baddbca18a83bd4b5c73483518aebf7d5158a5b0cfa698d9d48c68d05cd65a
MD5 21f3669baf315d6580f715b1cc131037
BLAKE2b-256 e10c48df8faad808db8e701807b79de61bbcb131c12c94e292a245821329c1bb

File details

Details for the file levi-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: levi-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.9.5 Darwin/20.3.0

File hashes

Hashes for levi-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bea0557342f93c217545d504ffdc519fa63ef0d86056951db3c1a3dfba746d09
MD5 31276f844cf5929852d129bf186a5d31
BLAKE2b-256 a53fcf49b9b564b7046b202cfb9e42b5c61b0fe76849309b6a6e9f58a2e5be84
