
DANDI S3 Log Parser


Extraction of minimal information from consolidated raw S3 logs for public sharing and plotting.

Developed for the DANDI Archive.

A few summary facts as of 2024:

  • A single line of a raw S3 log file ranges from roughly 400 to more than 1,000 bytes.
  • The busiest daily logs on the archive contain around 5 million lines (5,014,386 in one case).
  • More than 6 TB of log files have been collected in total.
  • This parser reduces that total to around 20 GB of essential information.

The reduced information is then mapped to the currently available assets in persistent published Dandiset versions and current drafts; these mapped summaries comprise only around 100 MB of the original data.

These small Dandiset-specific summaries are soon to be shared publicly.
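A quick back-of-the-envelope check of the figures above, using the approximate sizes quoted (decimal units; the exact totals vary over time):

```python
# Approximate sizes quoted above (decimal units).
raw_logs_bytes = 6e12   # "more than 6 TB" of raw S3 logs
reduced_bytes = 20e9    # ~20 GB after reduction
mapped_bytes = 100e6    # ~100 MB after mapping to current assets

reduction_factor = raw_logs_bytes / reduced_bytes
mapping_factor = reduced_bytes / mapped_bytes
overall_factor = raw_logs_bytes / mapped_bytes

print(f"raw -> reduced: ~{reduction_factor:.0f}x")   # ~300x
print(f"reduced -> mapped: ~{mapping_factor:.0f}x")  # ~200x
print(f"raw -> mapped:   ~{overall_factor:.0f}x")    # ~60000x
```

In other words, the reduction step alone shrinks the data by roughly two orders of magnitude, and the asset mapping by roughly two more.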

Installation

pip install dandi_s3_log_parser

Usage

Reduce entire history

To parse the entire history of raw logs in a single run (parallelization strongly recommended):

reduce_all_dandi_raw_s3_logs \
  --base_raw_s3_logs_folder_path < base log folder > \
  --reduced_s3_logs_folder_path < output folder > \
  --maximum_number_of_workers < number of CPUs to use > \
  --maximum_buffer_size_in_mb < approximate amount of RAM to use > \
  --excluded_ips < comma-separated list of known IPs to exclude >

For example, on Drogon:

reduce_all_dandi_raw_s3_logs \
  --base_raw_s3_logs_folder_path /mnt/backup/dandi/dandiarchive-logs \
  --reduced_s3_logs_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/parsed_8_15_2024/REST_GET_OBJECT_per_asset_id \
  --maximum_number_of_workers 6 \
  --maximum_buffer_size_in_mb 5000 \
  --excluded_ips < Drogon's IP >

Reduce a single log file

To parse a single log file at a time, such as from a cron job:

reduce_dandi_raw_s3_log \
  --raw_s3_log_file_path < s3 log file path > \
  --reduced_s3_logs_folder_path < output folder > \
  --excluded_ips < comma-separated list of known IPs to exclude >

For example, on Drogon:

reduce_dandi_raw_s3_log \
  --raw_s3_log_file_path /mnt/backup/dandi/dandiarchive-logs/2024/08/17.log \
  --reduced_s3_logs_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/parsed_8_15_2024/REST_GET_OBJECT_per_asset_id \
  --excluded_ips < Drogon's IP >
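The per-file command lends itself to scheduling. Below is a hypothetical crontab entry, not a tested deployment: it assumes the `YYYY/MM/DD.log` layout shown in the Drogon example, GNU `date`, and placeholder output paths. Note that cron requires the command on one line and `%` escaped as `\%`:

```
# Hypothetical crontab entry: reduce yesterday's log at 02:00 daily.
0 2 * * * reduce_dandi_raw_s3_log --raw_s3_log_file_path /mnt/backup/dandi/dandiarchive-logs/$(date -d yesterday +\%Y/\%m/\%d).log --reduced_s3_logs_folder_path /path/to/reduced-logs
```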

Map to Dandisets

The next step, which should also be run regularly (daily to weekly), is to iterate through all current versions of all Dandisets, mapping the reduced logs to their assets.

map_reduced_logs_to_dandisets \
  --reduced_s3_logs_folder_path < reduced s3 logs folder path > \
  --dandiset_logs_folder_path < mapped logs folder >

For example, on Drogon:

map_reduced_logs_to_dandisets \
  --reduced_s3_logs_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/parsed_8_15_2024/REST_GET_OBJECT_per_asset_id \
  --dandiset_logs_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/mapped_logs_8_15_2024
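To give a feel for what downstream consumption of the mapped output could look like, here is a minimal sketch that aggregates activity per asset. The on-disk layout is an assumption for illustration only (one tab-separated file per asset ID, with a `bytes_sent` column); the real reduced-log schema is defined by dandi_s3_log_parser and may differ.

```python
"""Sketch: aggregate activity per asset from reduced log files.

Assumed layout (hypothetical): one TSV file per asset ID, with a
header row that includes a 'bytes_sent' column.
"""
import csv
from collections import Counter
from pathlib import Path


def total_bytes_per_asset(reduced_folder: str) -> Counter:
    """Sum bytes_sent across all rows of each per-asset TSV file."""
    totals = Counter()
    for tsv_path in Path(reduced_folder).rglob("*.tsv"):
        with tsv_path.open(newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                totals[tsv_path.stem] += int(row["bytes_sent"])
    return totals
```

With a folder of such files, `total_bytes_per_asset(folder)` returns a `Counter` keyed by asset ID, suitable for plotting usage metrics per asset.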

Submit line decoding errors

Please email line decoding errors collected from your local config file to the core maintainer before raising issues or submitting PRs that contribute them as examples; this makes it easier to redact any details that may require anonymization.

