

Levi

Delta Lake helper methods. No Spark dependency.

Installation

Install the latest version with pip install levi.

Delta File Stats

The delta_file_sizes function counts the data files in a Delta table, grouped into file-size buckets. Example usage:

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.delta_file_sizes(dt)

# return value
{
    'num_files_<1mb': 345, 
    'num_files_1mb-500mb': 588,
    'num_files_500mb-1gb': 960,
    'num_files_1gb-2gb': 0, 
    'num_files_>2gb': 5
}

This output shows 345 small files holding less than 1mb of data each and 5 huge files holding more than 2gb each. Compacting the small files and splitting up the large ones would likely make queries on this Delta table run faster.
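Compaction itself is carried out by the table engine; the sketch below only illustrates the idea behind the small-file advice. It greedily groups files below a size threshold into batches whose combined size stays under a target, so that each batch could be rewritten as one larger file. The helper `plan_compaction`, the 1mb threshold and the 128mb default target are hypothetical choices for illustration, not levi or deltalake settings.

```python
from typing import List

MB = 1024 ** 2  # assumption: "mb" means 1024**2 bytes


def plan_compaction(sizes: List[int], small_threshold: int = 1 * MB,
                    target: int = 128 * MB) -> List[List[int]]:
    """Hypothetical helper: group small file sizes (bytes) into rewrite batches.

    Uses greedy next-fit, largest files first: a new batch starts whenever
    adding the next file would push the current batch over `target`.
    """
    small = sorted((s for s in sizes if s < small_threshold), reverse=True)
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in small:
        if current and current_total + size > target:
            batches.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        batches.append(current)
    # Rewriting a single file on its own gains nothing, so drop one-file batches.
    return [batch for batch in batches if len(batch) > 1]


# Tiny byte sizes and a small target keep the example easy to follow.
print(plan_compaction([100, 200, 300, 5 * MB], target=450))  # [[200, 100]]
```

Files at or above the threshold (the 5 MB file here) are left alone; splitting oversized files would need a separate pass.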

You can also pass your own bucket boundaries when you invoke the function to get a custom breakdown:

levi.delta_file_sizes(dt, ["<1mb", "1mb-200mb", "200mb-800mb", "800mb-2gb", ">2gb"])
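The boundary strings follow a small pattern: "<X", "X-Y" or ">X", where each size is a number with an mb or gb suffix. The sketch below shows one way such strings could be turned into byte ranges and used to count file sizes. It is not levi's implementation: the helpers `to_bytes`, `parse_bucket` and `count_by_bucket`, the binary units (1mb = 1024**2 bytes), the lower-inclusive/upper-exclusive convention and the "num_files_" key prefix for custom buckets are all assumptions.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: binary units, i.e. 1mb = 1024**2 bytes and 1gb = 1024**3 bytes.
UNITS = {"mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(text: str) -> int:
    """Convert a size such as '200mb' or '2gb' to bytes."""
    match = re.fullmatch(r"(\d+)(mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_bucket(spec: str) -> Tuple[float, float]:
    """Turn '<1mb', '1mb-200mb' or '>2gb' into a (low, high) byte range."""
    spec = spec.strip()
    if spec.startswith("<"):
        return 0, to_bytes(spec[1:])
    if spec.startswith(">"):
        return to_bytes(spec[1:]), float("inf")
    low, high = spec.split("-")
    return to_bytes(low), to_bytes(high)


def count_by_bucket(sizes: Iterable[int], specs: List[str]) -> Dict[str, int]:
    """Count file sizes (bytes) per bucket; lower bound inclusive, upper exclusive."""
    ranges = [(spec, *parse_bucket(spec)) for spec in specs]
    counts = {f"num_files_{spec}": 0 for spec in specs}
    for size in sizes:
        for spec, low, high in ranges:
            if low <= size < high:
                counts[f"num_files_{spec}"] += 1
                break
    return counts


MB, GB = UNITS["mb"], UNITS["gb"]
buckets = ["<1mb", "1mb-200mb", "200mb-800mb", "800mb-2gb", ">2gb"]
print(count_by_bucket([10, 5 * MB, 300 * MB, 900 * MB, 3 * GB], buckets))
```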


Download files


Source Distribution

levi-0.1.0.tar.gz (2.9 kB)

Uploaded Source

Built Distribution

levi-0.1.0-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file levi-0.1.0.tar.gz.

File metadata

  • Download URL: levi-0.1.0.tar.gz
  • Upload date:
  • Size: 2.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.9.5 Darwin/20.3.0

File hashes

Hashes for levi-0.1.0.tar.gz
Algorithm Hash digest
SHA256 eb4992202f0ca1d343e859ade686b99dba24a797b75f0c19772bc6859e0817f8
MD5 2f3a5fc4d471e8eca3a69e99847dd580
BLAKE2b-256 b3e8bcfef0a01d6e53b825fe34686c4f7f6e3a203ba2f57e6f5e4365baf692c3
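These digests let you check that a downloaded archive is intact. A minimal sketch using Python's standard hashlib module; the helper name `sha256_of_file` is hypothetical, and the script assumes the archive was saved as levi-0.1.0.tar.gz in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for levi-0.1.0.tar.gz
EXPECTED_SHA256 = "eb4992202f0ca1d343e859ade686b99dba24a797b75f0c19772bc6859e0817f8"


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("levi-0.1.0.tar.gz")  # assumed download location
    if archive.exists():
        matches = sha256_of_file(archive) == EXPECTED_SHA256
        print("hash matches" if matches else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time through its hash-checking mode (--require-hashes), where each pinned requirement carries a --hash=sha256:... option.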


File details

Details for the file levi-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: levi-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.9.5 Darwin/20.3.0

File hashes

Hashes for levi-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 89833afe33066061fb29696c90acb1010131d52edafc817a09c59ff829a06fb8
MD5 9bdaee0d8c88535e3ac5309901b50c1d
BLAKE2b-256 e68e0fd8c338451276a409a9b0134ab6afbe6e9d36660c2b8b2c68d75477e309

