
helps you preserve an ethereum dataset

Project description

cryogen

helps you preserve the ethereum dataset fresh, fast and small

install

pip install cryogen

features

intelligently consolidates cryo-extracted datasets, cutting the number of chunks by 400-1000x.

offers fast in-place conversion that reduces disk footprint by 2x and increases query performance.

keeps the dataset fresh so you can always come back to up-to-date data.

usage

cryogen collect <dataset>

collect or update a cryo dataset.

cryogen uses 1000-block batches with zstd level 3 compression. gaps are filled automatically. because the align option is used, the dataset can be up to 1000 blocks behind the chain head.
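a minimal sketch of the gap-filling and alignment logic, assuming existing chunks are identified by their first block and that the head block number is inclusive. missing_chunks is a hypothetical helper, not part of cryogen's API.

```python
CHUNK_SIZE = 1000  # cryogen collects in 1000-block batches


def missing_chunks(existing_starts, start_block, head_block, chunk_size=CHUNK_SIZE):
    """Return (first, last) inclusive block ranges of aligned chunks not yet on disk.

    Only complete, aligned chunks are returned, so the newest data can lag
    the chain head by up to chunk_size - 1 blocks.
    """
    have = set(existing_starts)
    # Round the requested start down to a chunk boundary.
    first = (start_block // chunk_size) * chunk_size
    # End (exclusive) of the last chunk that is fully available at head_block.
    last_full_end = ((head_block + 1) // chunk_size) * chunk_size
    return [
        (s, s + chunk_size - 1)
        for s in range(first, last_full_end, chunk_size)
        if s not in have
    ]
```

for example, with chunks 0 and 2000 on disk and the head at block 4500, the chunks starting at 1000 and 3000 are missing, while the partial chunk 4000..4500 is skipped until it is complete.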

# the data directory can also be set with the CRYO_DATA_DIR environment variable
cryogen collect contracts --data-dir ~/cryo_data

# collect a block range, same format as cryo
cryogen collect traces --blocks 17000000:

cryogen consolidate <dataset>

consolidate a dataset in-place.

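this command merges parquet files into larger files covering 10,000, 100,000 or 1,000,000 blocks (1e4, 1e5, 1e6). smaller files are left untouched until enough contiguous chunks exist to form a larger file. the worst case for this algorithm is a dataset ending at block 17,999,000: 17 files of 1e6 blocks, 9 of 1e5, 9 of 1e4 and 9 unmerged 1e3-block chunks, for 17 + 9 + 9 + 9 = 44 files.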
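the merge rule can be illustrated with a small planner that, for a set of 1000-block chunks, picks the largest aligned span whose chunks are all present. this is a hypothetical sketch (consolidation_plan is not a cryogen function) that assumes every chunk starts on a multiple of 1000, but it reproduces the 44-file worst case at block 17,999,000.

```python
SPANS = (1_000_000, 100_000, 10_000, 1_000)  # candidate file sizes in blocks, largest first
BASE = 1_000  # size of the chunks written by collect


def consolidation_plan(chunk_starts):
    """Group 1000-block chunks into the largest aligned, fully covered spans.

    A span of `size` blocks starting at `start` is used only if `start` is a
    multiple of `size` and every 1000-block chunk inside it is present;
    otherwise smaller spans (ultimately the original chunks) are kept.
    Returns a list of (start_block, size) tuples.
    """
    covered = set(chunk_starts)
    plan = []
    for start in sorted(covered):
        # Skip chunks already absorbed by the previously planned file.
        if plan and start < plan[-1][0] + plan[-1][1]:
            continue
        for size in SPANS:
            aligned = start % size == 0
            if aligned and all(b in covered for b in range(start, start + size, BASE)):
                plan.append((start, size))
                break
    return plan
```

a single missing chunk blocks every merge that would span it, which is why "smaller files are not touched until a larger contiguous block can be formed".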

starting from v0.2.0, cryogen also reduces the number of row groups by approximately 100x. it creates a row group when the uncompressed size of a batch reaches 512 MB. this row group size gives good performance without raising memory requirements to the point where they could cause heavy swapping.
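one reading of the row group rule, sketched below, is that record batches are accumulated until their combined uncompressed size reaches the threshold. both that reading and the interpretation of 512 MB as 512 MiB are assumptions; plan_row_groups is a hypothetical helper that only computes row group boundaries and writes nothing.

```python
ROW_GROUP_BYTES = 512 * 2**20  # assumed: 512 MiB of uncompressed data per row group


def plan_row_groups(batch_sizes, limit=ROW_GROUP_BYTES):
    """Split a sequence of batch sizes (uncompressed bytes) into row groups.

    Batches are accumulated until their combined uncompressed size reaches
    `limit`; the current group is then closed and a new one is started.
    Returns a list of lists of batch indices.
    """
    groups, current, size = [], [], 0
    for i, nbytes in enumerate(batch_sizes):
        current.append(i)
        size += nbytes
        if size >= limit:
            groups.append(current)
            current, size = [], 0
    if current:
        groups.append(current)  # final, possibly smaller, row group
    return groups
```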

cryogen consolidate contracts

# test the feature without overwriting the dataset
cryogen consolidate contracts --no-inplace

note that after consolidating with cryogen, you should keep using cryogen to update the dataset. cryo won't recognize the larger merged chunks and would try to re-collect the smaller chunks that were merged and deleted, writing duplicate data into the dataset.
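if cryo and cryogen have both written to a dataset, overlapping block ranges are the symptom to look for. the sketch below assumes cryo-style file names ending in <start>_to_<end>.parquet; find_overlaps is a hypothetical helper, and it pairs each overlapping file with the earlier file that reaches furthest.

```python
import re

# Assumed file-name pattern: names ending in "<start>_to_<end>.parquet".
RANGE_RE = re.compile(r"(\d+)_to_(\d+)\.parquet$")


def find_overlaps(filenames):
    """Return (earlier_file, overlapping_file) pairs whose inclusive block ranges overlap."""
    ranges = []
    for name in filenames:
        m = RANGE_RE.search(name)
        if m:
            ranges.append((int(m.group(1)), int(m.group(2)), name))
    ranges.sort()
    overlaps = []
    max_end, max_name = -1, None
    for start, end, name in ranges:
        if start <= max_end:
            overlaps.append((max_name, name))
        if end > max_end:
            max_end, max_name = end, name
    return overlaps
```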

cryogen watch <dataset>

combines the collect and consolidate commands.

keep it running and it will update the dataset periodically.

# refresh every 4 hours
cryogen watch contracts --interval 14400
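conceptually, watch is collect followed by consolidate on a timer. the loop below approximates that by calling the two documented subcommands; cryogen's own watch may work differently, and the watch function and its iterations, run and sleep parameters are hypothetical (the last three exist only to make it testable).

```python
import subprocess
import time


def watch(dataset, interval=14400, iterations=None, run=subprocess.run, sleep=time.sleep):
    """Periodically collect and consolidate `dataset` via the cryogen CLI.

    iterations=None loops forever; a number limits the rounds.
    """
    done = 0
    while iterations is None or done < iterations:
        run(["cryogen", "collect", dataset], check=True)
        run(["cryogen", "consolidate", dataset], check=True)
        done += 1
        # Sleep between rounds, but not after the last limited round.
        if iterations is None or done < iterations:
            sleep(interval)
```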

cryogen info <dataset>

reports summary statistics for a parquet dataset: row count, number of files and row groups, and total compressed and uncompressed size.

cryogen info contracts
# {'num_rows': 62466632, 'files': 38, 'row_groups': 17984, 'total_compressed_size': 7850356027, 'total_uncompressed_size': 29236070746, 'elapsed': 0.747}


Download files


Source Distribution

cryogen-0.2.0.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

cryogen-0.2.0-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file cryogen-0.2.0.tar.gz.

File metadata

  • Download URL: cryogen-0.2.0.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for cryogen-0.2.0.tar.gz
Algorithm Hash digest
SHA256 8004fc9f81b46466cbc4e54e0bdfeca4196e95cd423fd4aa817feb2f3c7601f3
MD5 150a059fb0f161223c6d69e6409791e2
BLAKE2b-256 ae1e30e6167cd4996e920ca52daf2ac4c10fc8ce1f07916d639c17f43e230aed


File details

Details for the file cryogen-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: cryogen-0.2.0-py3-none-any.whl
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for cryogen-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b2600d098e47995a2c93d7614dee1bb360d7b86e94613e0f6356852f15a3a82f
MD5 ee4dc7674eb2565e77502f2a92dbd26c
BLAKE2b-256 28d939ef6887ea190482d1ed5bf55f6bbe2d610269d2b94e9d2299700df7b7f6

