DANDI S3 Log Parser

Parse S3 logs to more easily calculate usage metrics per asset.

Simple reductions of consolidated S3 logs (the consolidation step is not included in this repository) into the minimal information needed for public sharing and plotting. Developed for the DANDI Archive.
A single line of a raw S3 log file can be anywhere from 400 to over 1,000 bytes, and some of the busiest daily logs on the archive contain around 5,014,386 lines. As of summer 2024, more than 6 TB of log files have been collected. This parser reduces them to tens of GB of consolidated and anonymized usage data, which is far more manageable to share and plot.
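As a rough back-of-the-envelope check of those sizes, using only the figures quoted above (illustrative only, not part of the package):

```python
# Rough size of one of the busiest daily logs, using the figures above:
# ~5,014,386 lines at 400-1,000 bytes per line.
lines = 5_014_386
low, high = 400, 1_000  # bytes per line

low_gb = lines * low / 1e9
high_gb = lines * high / 1e9
print(f"One busy daily log: ~{low_gb:.1f}-{high_gb:.1f} GB")  # → One busy daily log: ~2.0-5.0 GB
```

At a few GB per busy day, multiple years of logs plausibly reach the several-TB total cited above.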
Usage
To iteratively parse all historical logs at once (parallelization with 10-15 GB of total memory recommended):
parse_all_dandi_raw_s3_logs \
--base_raw_s3_log_folder_path < base log folder > \
--parsed_s3_log_folder_path < output folder > \
--excluded_log_files < any log files to skip > \
--excluded_ips < comma-separated list of known IPs to exclude > \
--maximum_number_of_workers < number of CPUs to use > \
--maximum_buffer_size_in_bytes < approximate amount of RAM to use >
For example, on Drogon:
parse_all_dandi_raw_s3_logs \
--base_raw_s3_log_folder_path /mnt/backup/dandi/dandiarchive-logs \
--parsed_s3_log_folder_path /mnt/backup/dandi/dandiarchive-logs-cody/parsed_7_13_2024/GET_per_asset_id \
--excluded_log_files /mnt/backup/dandi/dandiarchive-logs/stats/start-end.log \
--excluded_ips < Drogon's IP > \
--maximum_number_of_workers 3 \
--maximum_buffer_size_in_bytes 15000000000
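The interaction of the worker and buffer flags can be pictured as a map-reduce over bounded-size chunks of the raw log. The sketch below is purely illustrative (the function names, chunking scheme, and use of ThreadPoolExecutor are assumptions, not this package's implementation); it counts REST.GET.OBJECT entries, the operation field that S3 server access logs record for object GET requests:

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_chunk(chunk_lines):
    # Reduce one buffered chunk of raw log lines to a single statistic:
    # here, the number of object GET requests.
    return sum(1 for line in chunk_lines if "REST.GET.OBJECT" in line)

def parse_in_chunks(lines, buffer_size_in_lines, maximum_number_of_workers):
    # Split the log into fixed-size buffers so memory stays bounded
    # (the role of --maximum_buffer_size_in_bytes), then reduce the
    # buffers in parallel (--maximum_number_of_workers) and combine.
    chunks = [
        lines[i : i + buffer_size_in_lines]
        for i in range(0, len(lines), buffer_size_in_lines)
    ]
    with ThreadPoolExecutor(max_workers=maximum_number_of_workers) as executor:
        return sum(executor.map(reduce_chunk, chunks))

log_lines = [
    "... REST.GET.OBJECT blobs/abc ...",
    "... REST.PUT.OBJECT blobs/def ...",
    "... REST.GET.OBJECT blobs/abc ...",
]
print(parse_in_chunks(log_lines, buffer_size_in_lines=2, maximum_number_of_workers=2))  # → 2
```

Bounding the buffer size rather than loading whole files is what keeps peak RAM near the requested budget even for multi-GB daily logs.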
To parse only a single log file at a time, such as in a CRON job:
parse_dandi_raw_s3_log \
--raw_s3_log_file_path < s3 log file path > \
--parsed_s3_log_folder_path < output folder > \
--excluded_ips < comma-separated list of known IPs to exclude >
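For instance, a nightly crontab entry might look like the following (the schedule, paths, date pattern, and excluded IP are all placeholders to adapt to your deployment; note that `%` must be escaped in crontab entries, and that a crontab entry must stay on one line):

```
# Hypothetical crontab entry: parse yesterday's log every night at 02:00.
0 2 * * * parse_dandi_raw_s3_log --raw_s3_log_file_path /mnt/logs/$(date -d yesterday +\%Y-\%m-\%d).log --parsed_s3_log_folder_path /mnt/parsed --excluded_ips 192.0.2.1
```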
Submit line decoding errors
Before raising issues or submitting PRs that contribute line decoding errors as examples, please email the errors collected in your local config file to the core maintainer, so that any aspects requiring anonymization can be corrected first.
Developer notes
.log file suffixes should typically be ignored when working with Git, so when committing changes to the example log collection, you will have to forcibly include the file with

git add -f <example file name>.log