websnap

Copies API JSON files to an S3 bucket or a local machine.

Purpose

This project was developed to facilitate EnviDat resiliency and support continuous operation during server maintenance.

EnviDat is the environmental data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL.

Installation

pip install websnap

Quickstart

Websnap can be used as a function or as a CLI.

Function

import websnap

# Execute websnap using default arguments
websnap.websnap()

# Execute websnap passing arguments
websnap.websnap(file_logs=True, s3_uploader=True, backup_s3_count=7, early_exit=True)

CLI

To view the CLI documentation in a terminal, run:

websnap_cli --help

Function Parameters / CLI Options

Function Parameters

Parameter         Type         Default
config            str          "config.ini"
log_level         str          "INFO"
file_logs         bool         False
s3_uploader       bool         False
backup_s3_count   int | None   None
timeout           int          32
early_exit        bool         False
repeat_minutes    int | None   None

CLI Options

Option              Shortcut   Default
--config            -c         config.ini
--log_level         -l         INFO
--file_logs         -f         False
--s3_uploader       -s         False
--backup_s3_count   -b         None
--timeout           -t         32
--early_exit        -e         False
--repeat_minutes    -r         None

Description

Parameter / CLI option   Description
config                   Path to the configuration .ini file. The default expects a file named config.ini in the directory from which websnap is executed.
log_level                Level to use for logging. Default value is INFO. Valid logging levels are DEBUG, INFO, WARNING, ERROR, or CRITICAL. See the Python logging documentation for more about logging levels.
file_logs                Enable rotating file logs.
s3_uploader              Enable uploading of files to an S3 bucket.
backup_s3_count          Copy and back up the S3 object in each config section, keeping up to <backup_s3_count> backups and removing the object with the oldest last-modified timestamp. If omitted, objects are not copied or removed. If enabled, backup objects are copied and given the original object's name with the last-modified timestamp appended.
timeout                  Number of seconds to wait for a response to each HTTP request before timing out. Default value is 32 seconds.
early_exit               Terminate the program early after an error occurs. If omitted, errors are logged and program execution continues.
repeat_minutes           Run websnap continuously every <repeat_minutes> minutes. If omitted, websnap does not repeat.
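
The difference between the default behaviour and early_exit can be pictured with a short sketch. This is not websnap's implementation: the names run_sections, fetch, and flaky_fetch are hypothetical and only illustrate "log and continue" versus "stop at the first error".

```python
import logging

logger = logging.getLogger("websnap_sketch")


def run_sections(sections, fetch, early_exit=False):
    """Process each section and return the names that succeeded.

    `sections` and `fetch` are hypothetical stand-ins, not websnap internals.
    """
    done = []
    for name in sections:
        try:
            fetch(name)
            done.append(name)
        except Exception as err:
            logger.error("Section %s failed: %s", name, err)
            if early_exit:
                # early_exit=True: stop at the first error
                break
            # early_exit=False: the error is logged and the loop continues
    return done


def flaky_fetch(name):
    """Hypothetical fetcher that fails for one section."""
    if name == "broken":
        raise ValueError("download failed")


print(run_sections(["a", "broken", "b"], flaky_fetch))
print(run_sections(["a", "broken", "b"], flaky_fetch, early_exit=True))
```

In this sketch the first call skips the failing section and processes the rest, while the second call stops as soon as the failure is logged.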

Usage: S3 Bucket

Copies API JSON files to an S3 bucket.

Examples

Function

import websnap

# The s3_uploader argument must be True to upload files to an S3 bucket
# Upload files to an S3 bucket using default argument values
websnap.websnap(s3_uploader=True)

# Upload files to an S3 bucket and repeat every 1440 minutes (24 hours);
# file logs are enabled and at most 3 backup objects are kept for each config section
websnap.websnap(file_logs=True, s3_uploader=True, backup_s3_count=3, repeat_minutes=1440)

CLI

  • The following CLI option must be used to enable websnap to upload files to an S3 bucket: --s3_uploader

  • Upload files to an S3 bucket using default argument values:

     websnap_cli --s3_uploader

  • Upload files to an S3 bucket and repeat every 1440 minutes (24 hours), with file logs enabled and at most 3 backup objects kept for each config section:

     websnap_cli --file_logs --s3_uploader --backup_s3_count 3 --repeat_minutes 1440
    

Configuration

  • A valid .ini configuration file is required for both function and CLI usage.
  • By default, websnap expects a file named config.ini in the directory from which websnap is executed.
    • This can be changed using the config function argument (or the CLI --config option).
  • S3 config example file: src/websnap/config_templates/s3_config_template.ini
  • All keys in tables below are mandatory.

[DEFAULT] Section

Example S3 configuration [DEFAULT] section:

[DEFAULT]
endpoint_url=https://dreamycloud.com
aws_access_key_id=1234567abcdefg
aws_secret_access_key=hijklmn1234567

Key                     Value Description
endpoint_url            URL to use for the constructed S3 client
aws_access_key_id       AWS access key ID
aws_secret_access_key   AWS secret access key

Other Sections (one per API URL endpoint)

  • Each API JSON file that will be downloaded requires its own config section!
  • The section name can be anything, but a name that relates to the downloaded file is suggested.

Example S3 config section with key prefix:

[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.json

Example S3 config section without key prefix:

[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json

Key      Value Description
url      API URL endpoint that the JSON file will be downloaded from
bucket   Bucket that the JSON file will be written to
key      File name with extension, can optionally include a prefix
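
To sanity-check an S3 configuration before running websnap, the file can be inspected with Python's standard configparser module. The sketch below is an illustration only and does not reflect how websnap validates its config: the helper missing_keys and the constant names are hypothetical, and the config text is the example from this section. It relies on the fact that configparser makes [DEFAULT] values visible from every other section.

```python
import configparser

S3_CONFIG = """
[DEFAULT]
endpoint_url=https://dreamycloud.com
aws_access_key_id=1234567abcdefg
aws_secret_access_key=hijklmn1234567

[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.json

[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
"""

REQUIRED_DEFAULT = ("endpoint_url", "aws_access_key_id", "aws_secret_access_key")
REQUIRED_SECTION = ("url", "bucket", "key")


def missing_keys(parser):
    """Return {section name: [missing mandatory keys]} (hypothetical helper)."""
    problems = {}
    for name in parser.sections():  # sections() does not include DEFAULT
        section = parser[name]
        # Keys set in [DEFAULT] count as present in every section
        missing = [k for k in REQUIRED_DEFAULT + REQUIRED_SECTION if k not in section]
        if missing:
            problems[name] = missing
    return problems


parser = configparser.ConfigParser()
parser.read_string(S3_CONFIG)

# Split each key into an optional prefix and the file name
split_keys = {
    name: parser[name]["key"].rpartition("/")[::2] for name in parser.sections()
}

print(missing_keys(parser))
print(split_keys)
```

Reading a real file instead of a string would use parser.read("config.ini").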

Usage: Local Machine

Copies API JSON files to a local machine.

Examples

Function

# Write downloaded files to local machine using default argument values
websnap.websnap()

# Write downloaded files locally and repeat every 60 minutes (1 hour), with file logs enabled
websnap.websnap(file_logs=True, repeat_minutes=60)

CLI

  • Write downloaded files to local machine using default argument values:

     websnap_cli 
    
  • Write downloaded files locally and repeat every 60 minutes (1 hour), with file logs enabled:

     websnap_cli --file_logs --repeat_minutes 60
    

Configuration

  • A valid .ini configuration file is required for both function and CLI usage.
  • By default, websnap expects a file named config.ini in the directory from which websnap is executed.
    • This can be changed using the config function argument (or the CLI --config option).
  • Local machine config example file: src/websnap/config_templates/config_template.ini
  • Each API URL JSON file that will be downloaded requires its own section.
  • If the optional directory key/value pair is omitted then the file will be written in the directory that the program is executed from.

Example local machine configuration section:

[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata

Sections (one per API URL endpoint)

Key                    Value Description
url                    API URL endpoint that the JSON file will be downloaded from
file_name              File name with extension
directory (optional)   Directory name that the JSON file will be written to
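
The effect of the optional directory key can be sketched with configparser and pathlib. This is an assumption-laden illustration rather than websnap's own code: the helper output_path is hypothetical, the [resource] section is added here only to show a section without directory, and "the directory that the program is executed from" is taken to mean the current working directory.

```python
import configparser
from pathlib import Path

LOCAL_CONFIG = """
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata

[resource]
url=https://www.example.com/api/resource
file_name=resource.json
"""


def output_path(section):
    """Build the target path for one config section (hypothetical helper).

    Falls back to the current working directory when `directory` is omitted.
    """
    directory = section.get("directory")  # None if the key is absent
    base = Path(directory) if directory else Path.cwd()
    return base / section["file_name"]


parser = configparser.ConfigParser()
parser.read_string(LOCAL_CONFIG)
paths = {name: output_path(parser[name]) for name in parser.sections()}

for name, path in paths.items():
    print(name, "->", path)
```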

Logs

Websnap supports optional rotating file logs.

  • The following CLI option must be used to enable websnap to support rotating file logs: --file_logs
    • In function usage the following argument must be passed to support rotating file logs: file_logs=True
  • If log keys are not specified in the configuration [DEFAULT] section then default values in the table below will be used.
  • log_when expects a value accepted by the logging module's TimedRotatingFileHandler.
  • See the Python logging.handlers documentation for more information about how to use TimedRotatingFileHandler.
  • With the default values, file logs are rotated once a day and backup log files are never removed.

Configuration

Example log configuration:

[DEFAULT]
log_when=midnight
log_interval=1
log_backup_count=7

[DEFAULT] Section

Key                Default   Value Description
log_when           D         Specifies the type of interval
log_interval       1         Duration of the interval (must be a positive integer)
log_backup_count   0         If nonzero, at most <log_backup_count> files are kept and the oldest log file is deleted (must be a non-negative integer)
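
The three log keys correspond naturally to the when, interval, and backupCount arguments of logging.handlers.TimedRotatingFileHandler. The sketch below shows one way such a mapping could look. It is not taken from websnap's source: build_file_handler and the log file name websnap_sketch.log are hypothetical, and the fallbacks are the default values from the table above.

```python
import configparser
import logging
from logging.handlers import TimedRotatingFileHandler

LOG_CONFIG = """
[DEFAULT]
log_when=midnight
log_interval=1
log_backup_count=7
"""


def build_file_handler(parser, filename="websnap_sketch.log"):
    """Create a rotating file handler from the [DEFAULT] log keys (hypothetical helper)."""
    defaults = parser["DEFAULT"]
    return TimedRotatingFileHandler(
        filename,
        when=defaults.get("log_when", fallback="D"),
        interval=defaults.getint("log_interval", fallback=1),
        backupCount=defaults.getint("log_backup_count", fallback=0),
        delay=True,  # the log file is only created when the first record is written
    )


parser = configparser.ConfigParser()
parser.read_string(LOG_CONFIG)
handler = build_file_handler(parser)

logger = logging.getLogger("websnap_sketch")
logger.addHandler(handler)
```

With the example configuration the handler rotates at midnight and keeps at most 7 backup files. With no log keys configured, the fallbacks give daily rotation and a backupCount of 0, which means old log files are never deleted.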

Minimum Download Size

Websnap optionally supports a minimum download size (in kilobytes): a JSON file is downloaded from the configured API URL endpoint only if its content is at least that size.

  • By default the minimum size is 0 kb.
    • Unless a different value is specified in the configuration, this means that websnap can download a file of any size.
  • Configured minimum download size must be a non-negative integer.
  • If the content from the API URL endpoint is less than the configured size:
    • An error will be logged and the program continues to the next config section.
    • If the CLI option --early_exit (or function argument early_exit=True) is enabled then the program will terminate early.

Configuration

Example minimum download size configuration:

[DEFAULT]
min_size_kb=1

[DEFAULT] Section

Key           Default   Value Description
min_size_kb   0         Minimum download size in kilobytes (must be a non-negative integer)
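
The size check described above amounts to a comparison between the length of the downloaded content and the configured minimum. The following is a minimal sketch under stated assumptions: meets_min_size is a hypothetical helper, and 1 kb is assumed to be 1024 bytes, which may differ from the conversion websnap actually uses.

```python
def meets_min_size(content: bytes, min_size_kb: int = 0, bytes_per_kb: int = 1024) -> bool:
    """Return True if `content` is at least `min_size_kb` kilobytes.

    Assumption: 1 kb = 1024 bytes here; websnap's conversion may differ.
    """
    if min_size_kb < 0:
        raise ValueError("min_size_kb must be a non-negative integer")
    return len(content) >= min_size_kb * bytes_per_kb


# With the default of 0, content of any size passes the check
print(meets_min_size(b""))
# With min_size_kb=1, content shorter than 1024 bytes is rejected
print(meets_min_size(b"x" * 1023, min_size_kb=1))
```

A caller would log an error and move on to the next config section when the check fails, or stop entirely if early_exit is enabled.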

Author

Rebecca Kurup Buchholz

License

MIT License
