Copies files retrieved from an API to an S3 bucket or a local machine.

websnap


Installation

pip install websnap

Quickstart

Websnap can be used as a function or as a CLI.

Function

from websnap import websnap

# Execute websnap using default arguments
websnap()

# Execute websnap passing arguments
websnap(file_logs=True, s3_uploader=True, backup_s3_count=7, early_exit=True)

CLI

To view the CLI documentation in a terminal, execute:

websnap_cli --help

Function Parameters / CLI Options

Function Parameters

Parameter         Type         Default
config            str          "config.ini"
log_level         str          "INFO"
file_logs         bool         False
s3_uploader       bool         False
backup_s3_count   int | None   None
timeout           int          32
early_exit        bool         False
repeat_minutes    int | None   None
section_config    str | None   None

CLI Options

Option              Shortcut   Default
--config            -c         config.ini
--log_level         -l         INFO
--file_logs         -f         False
--s3_uploader       -s         False
--backup_s3_count   -b         None
--timeout           -t         32
--early_exit        -e         False
--repeat_minutes    -r         None
--section_config    -n         None
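
The option table above can be read as a small parser specification. The following is a minimal sketch using Python's standard argparse module; it only mirrors the options, shortcuts and defaults listed in the table, and it is an illustration rather than websnap's own CLI code. The function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the option table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="websnap_cli")
    parser.add_argument("-c", "--config", default="config.ini")
    parser.add_argument("-l", "--log_level", default="INFO")
    parser.add_argument("-f", "--file_logs", action="store_true")
    parser.add_argument("-s", "--s3_uploader", action="store_true")
    parser.add_argument("-b", "--backup_s3_count", type=int, default=None)
    parser.add_argument("-t", "--timeout", type=int, default=32)
    parser.add_argument("-e", "--early_exit", action="store_true")
    parser.add_argument("-r", "--repeat_minutes", type=int, default=None)
    parser.add_argument("-n", "--section_config", default=None)
    return parser


# Shortcuts and long options are interchangeable; unset options keep their defaults.
args = build_parser().parse_args(["-f", "-s", "-b", "3", "--repeat_minutes", "1440"])
kwargs = vars(args)
print(kwargs)
```

Because the option names match the function parameter names, the resulting dictionary has the same keys as the function parameters listed earlier.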

Description

Function parameter / CLI option, followed by its description:

config (str)
  • Path to the configuration file (.ini or .json)
  • The default value expects a file called config.ini in the directory that websnap is executed from
log_level (str)
  • Level to use for logging
file_logs (bool)
  • Enable rotating file logs
s3_uploader (bool)
  • Enable uploading of files to an S3 bucket
backup_s3_count (int | None)
  • Copy and back up the file in each config section to the configured S3 bucket, keeping at most backup_s3_count backup files
  • When more backup files exist than allowed, remove the file with the oldest last modified timestamp
  • If omitted then files are not copied or removed
  • If enabled then backup files are copied and assigned the original file's name with the last modified timestamp appended
timeout (int)
  • Number of seconds to wait for a response to each HTTP request before timing out
  • Default value is 32 seconds
early_exit (bool)
  • Enable early program termination after an error occurs
  • If omitted, errors are logged but program execution continues
repeat_minutes (int | None)
  • Run websnap continuously every repeat_minutes minutes
  • If omitted then websnap does not repeat
section_config (str | None)
  • File or URL to obtain additional configuration sections from
  • If omitted then the default value is None and only the config specified in the config argument is used
  • Cannot be used to assign "DEFAULT" values in config
  • Currently only supports JSON config and can only be used if the config argument is also a JSON file
  • Sections that duplicate a section name passed in the config argument overwrite that section's values
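
The backup_s3_count behavior described above (timestamped copies, oldest removed first) can be sketched without touching S3 at all. The example below is a minimal illustration, not websnap's implementation: the helper names backup_key and keys_to_remove, the timestamp format, and the in-memory dictionary standing in for bucket contents are all assumptions made for this sketch.

```python
from datetime import datetime, timezone


def backup_key(key: str, last_modified: datetime) -> str:
    """Append the last modified timestamp to the original file name.

    The timestamp format is an assumption made for this sketch.
    """
    stem, dot, ext = key.rpartition(".")
    stamp = last_modified.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{stem}_{stamp}{dot}{ext}" if dot else f"{key}_{stamp}"


def keys_to_remove(backups: dict, backup_count: int) -> list:
    """Return the oldest backup keys that exceed backup_count."""
    oldest_first = sorted(backups, key=backups.get)
    excess = max(len(backups) - backup_count, 0)
    return oldest_first[:excess]


def january(day: int) -> datetime:
    """Helper that builds example last modified timestamps."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


# Four backups exist but only three are allowed, so the oldest one is stale.
backups = {backup_key("project.json", january(d)): january(d) for d in (1, 2, 3, 4)}
stale = keys_to_remove(backups, backup_count=3)
print(stale)
```

Sorting by last modified timestamp keeps the pruning rule independent of how the backup names are formatted.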

    Usage: S3 Bucket

    Copy files retrieved from an API to an S3 bucket.

    Uses the AWS SDK for Python (Boto3) to add and back up API files in an S3 bucket.

    Examples

    Function

    # The s3_uploader argument must be passed as True to copy files to an S3 bucket
    # Copies files to an S3 bucket using default argument values
    websnap(s3_uploader=True)
    
    # Copies files to an S3 bucket and repeats every 1440 minutes (24 hours);
    # file logs are enabled and only 3 backup files are allowed for each config section
    websnap(file_logs=True, s3_uploader=True, backup_s3_count=3, repeat_minutes=1440)
    

    CLI

    • The following CLI option must be used to enable websnap to upload files to an S3 bucket: --s3_uploader

    • Copies files to an S3 bucket using default argument values:

       websnap_cli --s3_uploader 
      
    • Copies files to an S3 bucket and repeats every 1440 minutes (24 hours); file logs are enabled and only 3 backup files are allowed for each config section:

       websnap_cli --file_logs --s3_uploader --backup_s3_count 3 --repeat_minutes 1440
      

    Configuration

    • A valid .ini or .json configuration file is required for both function and CLI usage.
    • By default, websnap expects a config file named config.ini in the directory that websnap is executed from.
      • This can be changed using the config function argument (or the CLI --config option).
    • All keys in tables below are mandatory.

    S3 Configuration Example Files

    Format   Example Configuration File
    .ini     src/websnap/config_templates/s3_config_template.ini
    .json    src/websnap/config_templates/s3_config_template.json

    Default Configuration

    Example default S3 configuration:

    [DEFAULT]
    endpoint_url=https://dreamycloud.com
    aws_access_key_id=1234567abcdefg
    aws_secret_access_key=hijklmn1234567
    
    Key                     Value Description
    endpoint_url            URL to use for the constructed S3 client
    aws_access_key_id       AWS access key ID
    aws_secret_access_key   AWS secret access key

    Other Sections (one per API URL endpoint)

    • Each file retrieved from an API requires its own config section.
    • The section name can be anything, but a name that relates to the copied file is suggested.

    Example S3 config section configuration with key prefix:

    [resource]
    url=https://www.example.com/api/resource
    bucket=exampledata
    key=subdirectory_resource/resource.xml
    

    Example S3 config section configuration without key prefix:

    [project]
    url=https://www.example.com/api/project
    bucket=exampledata
    key=project.json
    
    Key      Value Description
    url      API URL endpoint that file will be retrieved from
    bucket   Bucket that file will be written in
    key      File name with extension, can optionally include prefix
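
To see how the [DEFAULT] section and the per-file sections fit together in an .ini file, the example values above can be loaded with Python's standard configparser module. This is a minimal sketch under the assumption that the .ini file follows ordinary configparser rules; the REQUIRED_KEYS tuple and the missing_keys helper are hypothetical names used only for illustration.

```python
import configparser

# Example values copied from the configuration snippets above.
S3_CONFIG = """
[DEFAULT]
endpoint_url=https://dreamycloud.com
aws_access_key_id=1234567abcdefg
aws_secret_access_key=hijklmn1234567

[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.xml

[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
"""

REQUIRED_KEYS = ("url", "bucket", "key")  # mandatory keys per section

parser = configparser.ConfigParser()
parser.read_string(S3_CONFIG)


def missing_keys(section_name: str) -> list:
    """Return the mandatory keys that a section does not define."""
    section = parser[section_name]
    return [key for key in REQUIRED_KEYS if key not in section]


# sections() skips DEFAULT, but every section can read the DEFAULT values.
for name in parser.sections():
    section = parser[name]
    print(name, section["bucket"], section["key"], section["endpoint_url"])
    print("  missing keys:", missing_keys(name))
```

With configparser, values in [DEFAULT] are visible from every other section, which is why the S3 client settings only need to be written once.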

    Usage: Local Machine

    Copy files retrieved from an API to a local machine.

    Examples

    Function

    # Write files retrieved from an API to local machine using default argument values
    websnap()
    
    # Write files retrieved from an API locally and repeat every 60 minutes (1 hour);
    # file logs are enabled
    websnap(file_logs=True, repeat_minutes=60)
    

    CLI

    • Write copied files to local machine using default argument values:

       websnap_cli 
      
    • Write copied files locally and repeat every 60 minutes (1 hour); file logs are enabled:

       websnap_cli --file_logs --repeat_minutes 60
      

    Configuration

    • A valid .ini or .json configuration file is required for both function and CLI usage.
    • By default, websnap expects a config file named config.ini in the directory that websnap is executed from.
      • This can be changed using the config function argument (or the CLI --config option).
    • Each file that will be retrieved from an API requires its own section.
    • If the optional directory key/value pair is omitted then the file will be written in the directory that the program is executed from.

    Configuration Example Files

    Format   Example Configuration File
    .ini     src/websnap/config_templates/config_template.ini
    .json    src/websnap/config_templates/config_template.json

    Sections (one per API URL endpoint)

    Example local machine configuration section:

    [project]
    url=https://www.example.com/api/project
    file_name=project.json
    directory=projectdata
    
    Key                    Value Description
    url                    API URL endpoint that file will be retrieved from
    file_name              File name with extension
    directory (optional)   Local directory name that file will be written in
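
The optional directory key can be illustrated with a short sketch that resolves where each file would be written. This is not websnap's code: the output_path helper is a hypothetical name, and the [resource] section is an extra example section added here to show the fallback to the current working directory.

```python
import configparser
from pathlib import Path

# [project] comes from the example above; [resource] is a hypothetical
# section without a directory key, added to show the fallback.
LOCAL_CONFIG = """
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata

[resource]
url=https://www.example.com/api/resource
file_name=resource.xml
"""

parser = configparser.ConfigParser()
parser.read_string(LOCAL_CONFIG)


def output_path(section: configparser.SectionProxy) -> Path:
    """Resolve the target path, falling back to the current directory."""
    directory = section.get("directory", fallback=None)
    base = Path(directory) if directory else Path.cwd()
    return base / section["file_name"]


for name in parser.sections():
    print(name, "->", output_path(parser[name]))
```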

    Logs

    Websnap supports optional rotating file logs.

    • The following CLI option must be used to enable rotating file logs: --file_logs
      • In function usage the following argument must be passed to enable rotating file logs: file_logs=True
    • If log keys are not specified in the configuration [DEFAULT] section then the default values in the table below will be used.
    • log_when expects a value accepted by the when argument of TimedRotatingFileHandler in Python's logging.handlers module.
    • See the Python logging.handlers documentation for more information about how to use TimedRotatingFileHandler.
    • With the default values, file logs are rotated once per day and backup log files are never removed.

    Configuration

    Example log configuration:

    [DEFAULT]
    log_when=midnight
    log_interval=1
    log_backup_count=7
    

    [DEFAULT] Section

    Key                Default Value   Description
    log_when           D               Specifies type of interval
    log_interval       1               Duration of interval (must be a positive integer)
    log_backup_count   0               If nonzero then at most <log_backup_count> files will be kept and the oldest log file is deleted (must be a non-negative integer)
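
The three log keys correspond naturally to the when, interval and backupCount arguments of Python's TimedRotatingFileHandler. The sketch below shows that correspondence using the example log configuration above; how websnap itself wires these values is not shown in this document, so the log file name, the logger name and the temporary directory are assumptions made for illustration.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path

# Values taken from the example log configuration above.
log_when = "midnight"
log_interval = 1
log_backup_count = 7

# Hypothetical log location, placed in a temporary directory for this sketch.
log_file = Path(tempfile.mkdtemp()) / "websnap.log"

handler = TimedRotatingFileHandler(
    log_file,
    when=log_when,                 # type of interval
    interval=log_interval,         # duration of interval
    backupCount=log_backup_count,  # 0 would keep every rotated file
    delay=True,                    # open the file on first write
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("websnap_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("rotating file logs enabled")

# Detach and close the handler so the example leaves no open file behind.
logger.removeHandler(handler)
handler.close()
```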

    Minimum Download Size

    Websnap optionally supports a minimum download size (in kilobytes): a file is only copied from the configured API URL endpoint if its content is at least that size.

    • By default the minimum download size is 0 kB.
      • Unless a different value is specified in the configuration, this means that a file of any size can be downloaded by websnap.
    • The configured minimum download size must be a non-negative integer.
    • If the content from the API URL endpoint is smaller than the configured size:
      • An error will be logged and the program continues to the next config section.
      • If the CLI option --early_exit (or function argument early_exit=True) is enabled then the program will terminate early.

    Configuration

    Example minimum download size configuration:

    [DEFAULT]
    min_size_kb=1
    

    [DEFAULT] Section

    Key           Default Value   Description
    min_size_kb   0               Minimum download size in kilobytes (must be a non-negative integer)
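
The size check and its interaction with early_exit can be sketched as follows. This is an illustration only: the function names are hypothetical, the downloads are plain in-memory bytes, and treating one kilobyte as 1024 bytes is an assumption, since the unit is not defined more precisely in this document.

```python
import logging

BYTES_PER_KB = 1024  # assumption: the document does not say 1000 or 1024


def meets_min_size(content: bytes, min_size_kb: int = 0) -> bool:
    """Return True if the content is at least min_size_kb kilobytes."""
    if min_size_kb < 0:
        raise ValueError("min_size_kb must be a non-negative integer")
    return len(content) >= min_size_kb * BYTES_PER_KB


def accepted_sections(downloads: dict, min_size_kb: int = 0, early_exit: bool = False) -> list:
    """Return the names of the sections whose content passes the size check."""
    accepted = []
    for name, content in downloads.items():
        if not meets_min_size(content, min_size_kb):
            logging.error("Section %s: content is smaller than %s kB", name, min_size_kb)
            if early_exit:
                break  # stands in for terminating the program early
            continue  # otherwise move on to the next config section
        accepted.append(name)
    return accepted


downloads = {"small": b"x" * 100, "large": b"x" * 2048}
print(accepted_sections(downloads, min_size_kb=1))
print(accepted_sections(downloads, min_size_kb=1, early_exit=True))
```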

    Author

    Rebecca Kurup Buchholz

    Purpose

    This project was developed to facilitate EnviDat resiliency and support continuous operation during server maintenance.

    EnviDat is the environmental data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL.

    License

    MIT License
