
Parse S3 logs to more easily calculate usage metrics per asset.


DANDI S3 Log Parser

License: BSD-3. Python code style: Black, Ruff.

Simple reductions of consolidated S3 logs (consolidation step not included in this repository) into minimal information for public sharing and plotting.

Developed for the DANDI Archive.

Usage

To parse all historical logs at once (parallelization with 10-15 GB of total RAM is recommended):

parse_all_dandi_raw_s3_logs \
  --base_raw_s3_log_folder_path < base log folder > \
  --parsed_s3_log_folder_path < output folder > \
  --excluded_ips < comma-separated list of known IPs to exclude > \
  --maximum_number_of_workers < number of CPUs to use > \
  --maximum_buffer_size_in_bytes < approximate amount of RAM to use >

For example, on Drogon:

parse_all_dandi_raw_s3_logs \
  --base_raw_s3_log_folder_path /mnt/backup/dandi/dandiarchive-logs \
  --parsed_s3_log_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/parsed_7_13_2024/GET_per_asset_id \
  --excluded_ips < Drogon's IP > \
  --maximum_number_of_workers 3 \
  --maximum_buffer_size_in_bytes 15000000000
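The buffer value in the example above appears to correspond to 15 GB expressed in bytes, using decimal units. As a minimal sketch, assuming a decimal gigabyte (10^9 bytes) is the intended unit, the value can be derived from a RAM budget in shell arithmetic. The variable names here are illustrative only and are not part of the tool.

```shell
# Convert a RAM budget in decimal gigabytes to bytes.
# Assumption: 1 GB = 1000 * 1000 * 1000 bytes; variable names are hypothetical.
buffer_size_in_gb=15
buffer_size_in_bytes=$((buffer_size_in_gb * 1000 * 1000 * 1000))

# Print the value to pass to --maximum_buffer_size_in_bytes.
echo "$buffer_size_in_bytes"
```

The result can then be passed as the value of --maximum_buffer_size_in_bytes.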

To parse only a single log file at a time, such as in a cron job:

parse_dandi_raw_s3_log \
  --raw_s3_log_file_path < s3 log file path > \
  --parsed_s3_log_folder_path < output folder > \
  --excluded_ips < comma-separated list of known IPs to exclude >
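For scheduled use, one option is a small wrapper script that a cron entry calls with the path of a new log file. The following is only a sketch under stated assumptions: every path and IP address is a placeholder, the DRY_RUN switch is a hypothetical convenience, and only the three flags shown above are used.

```shell
#!/bin/sh
# Hypothetical wrapper that a cron entry could call once per new log file.
# All paths and addresses below are placeholders, not values from the project.
raw_log_file="${1:-/path/to/raw/log/file.log}"
output_folder="/path/to/output/folder"
excluded_ips="192.0.2.1,192.0.2.2"   # documentation-range addresses, for illustration only

# Assemble the command from the flags documented above.
command_line="parse_dandi_raw_s3_log --raw_s3_log_file_path $raw_log_file --parsed_s3_log_folder_path $output_folder --excluded_ips $excluded_ips"

# DRY_RUN=1 (the default in this sketch) only prints the command; set DRY_RUN=0 to run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$command_line"
else
  $command_line
fi
```

Printing the command first makes it easier to check the paths before the job is scheduled.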

Submit line decoding errors

Please email any line decoding errors collected in your local config file to the core maintainer before raising issues or submitting PRs that contribute them as examples. This makes it easier to correct anything that might require anonymization.

Developer notes

Files with the .log suffix should typically be ignored when working with Git, so when committing changes to the example log collection, you will have to forcibly include them with

git add -f <example file name>.log
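The effect of the forced add can be checked in a scratch repository. This is a minimal sketch, assuming an ignore rule of *.log; the file name example.log and the temporary directory are hypothetical and stand in for the real example log collection.

```shell
# Scratch repository so the demonstration does not touch a real checkout.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init --quiet

# Assumed ignore rule and a hypothetical example log file.
echo "*.log" > .gitignore
echo "example line" > example.log

# A plain add refuses the ignored file and exits with a non-zero status.
git add example.log 2>/dev/null || echo "plain add refused: example.log is ignored"

# Forcing the add stages the file despite the ignore rule.
git add -f example.log

# List the files now in the index.
staged_files="$(git ls-files)"
echo "$staged_files"
```

Only the force-added log file appears in the index, since .gitignore itself was never added.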


