Copies files from URLs and uploads them to an S3 bucket or writes them to a local machine.
websnap
Copies files from URLs and uploads them to an S3 bucket.
Also supports writing files downloaded from URLs to a local machine.
Purpose
This project was developed to facilitate EnviDat resiliency and support continuous operation during server maintenance.
EnviDat is the environmental data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL.
Installation
pip install websnap
Quickstart
Websnap can be used as a function or as a CLI.
Function
import websnap
# Execute websnap using default arguments
websnap.websnap()
# Execute websnap passing arguments
websnap.websnap(file_logs=True, s3_uploader=True, backup_s3_count=7, early_exit=True)
CLI
To access CLI documentation in terminal execute:
websnap_cli --help
Function Parameters / CLI Options
Function Parameters
Parameter | Type | Default |
---|---|---|
config | str | "config.ini" |
log_level | str | "INFO" |
file_logs | bool | False |
s3_uploader | bool | False |
backup_s3_count | int or None | None |
early_exit | bool | False |
repeat_minutes | int or None | None |
CLI Options
Option | Shortcut | Default |
---|---|---|
--config | -c | config.ini |
--log_level | -l | INFO |
--file_logs | -f | False |
--s3_uploader | -s | False |
--backup_s3_count | -b | None |
--early_exit | -e | False |
--repeat_minutes | -r | None |
Description
Function parameter / CLI option | Description |
---|---|
config | Path to configuration .ini file. The default value expects a file called config.ini in the same directory that the websnap package is executed from. |
log_level | Level to use for logging. Default value is INFO. Valid logging levels are DEBUG, INFO, WARNING, ERROR, or CRITICAL. See the Python logging documentation to learn more about logging levels. |
file_logs | Enable rotating file logs. |
s3_uploader | Enable uploading of files to S3 bucket. |
backup_s3_count | Copy and back up the S3 objects in each config section <backup_s3_count> times and remove the object with the oldest last modified timestamp. If omitted, objects are not copied or removed. If enabled, backup objects are copied and assigned the original object's name with the last modified timestamp appended. |
early_exit | Enable early program termination after an error occurs. If omitted, URL processing errors are logged but program execution continues. |
repeat_minutes | Run websnap continuously every <repeat_minutes> minutes. If omitted, websnap does not repeat. |
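The backup rotation described for backup_s3_count can be illustrated with a small sketch. This is not websnap's implementation: the function name backups_to_remove, the example backup keys, and the reading that at most <backup_s3_count> backups are kept per config section are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def backups_to_remove(backups, backup_s3_count):
    """Return the backup keys to delete so that at most backup_s3_count remain.

    backups maps a (hypothetical) backup object key to its last modified
    timestamp. Keys with the oldest timestamps are returned first.
    """
    if backup_s3_count is None:
        # Backups are disabled, so nothing is copied or removed
        return []
    # Sort keys by last modified timestamp, oldest first
    oldest_first = sorted(backups, key=backups.get)
    excess = len(oldest_first) - backup_s3_count
    return oldest_first[: max(excess, 0)]


# Hypothetical backup keys; the real naming format used by websnap may differ
example_backups = {
    "project_day1.json": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "project_day2.json": datetime(2024, 1, 2, tzinfo=timezone.utc),
    "project_day3.json": datetime(2024, 1, 3, tzinfo=timezone.utc),
    "project_day4.json": datetime(2024, 1, 4, tzinfo=timezone.utc),
}
print(backups_to_remove(example_backups, backup_s3_count=3))
```

With four backups and a count of 3, only the backup with the oldest last modified timestamp is selected for removal.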
Usage: S3 Bucket
Copy files from URLs and upload them to an S3 bucket.
Examples
Function
# The s3_uploader argument must be passed as True to upload files to an S3 bucket
# Uploads files to an S3 bucket using default argument values
websnap.websnap(s3_uploader=True)
# Uploads files to an S3 bucket and repeats every 1440 minutes (24 hours),
# file logs are enabled and only 3 backup objects are allowed for each config section
websnap.websnap(file_logs=True, s3_uploader=True, backup_s3_count=3, repeat_minutes=1440)
CLI
- The following CLI option must be used to enable websnap to upload files to an S3 bucket:
  --s3_uploader
- Uploads files to an S3 bucket using default argument values:
  websnap_cli --s3_uploader
- Uploads files to an S3 bucket and repeats every 1440 minutes (24 hours), file logs are enabled and only 3 backup objects are allowed for each config section:
  websnap_cli --file_logs --s3_uploader --backup_s3_count 3 --repeat_minutes 1440
Configuration
- A valid .ini configuration file is required for both function and CLI usage.
- Websnap expects the config to be config.ini in the same directory that the websnap package is executed from.
  - However, this can be changed using the config function argument (or CLI --config option).
- S3 config example file: src/websnap/config_templates/s3_config_template.ini
- All keys in the tables below are mandatory.
[DEFAULT] Section
Example S3 configuration [DEFAULT] section:
[DEFAULT]
endpoint_url=https://dreamycloud.com
aws_access_key_id=1234567abcdefg
aws_secret_access_key=hijklmn1234567
Key | Value Description |
---|---|
endpoint_url | The URL to use for the constructed S3 client |
aws_access_key_id | AWS access key ID |
aws_secret_access_key | AWS secret access key |
Other Sections (one per URL)
- Each file that will be downloaded from a URL requires its own config section!
- The section name can be anything, but a name that relates to the downloaded file is suggested.
Example S3 config section configuration with key prefix:
[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.json
Example S3 config section configuration without key prefix:
[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
Key | Value Description |
---|---|
url | URL that file will be downloaded from |
bucket | Bucket that file will be written in |
key | File name with extension, can optionally include prefix |
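To check an S3 configuration before running websnap, the layout above can be parsed with Python's standard configparser module. The following is a minimal sketch, not websnap's own validation code: the function name load_s3_config and the error messages are hypothetical, and the sample text simply repeats the example values shown above.

```python
import configparser

# Sample text repeating the example sections above (placeholder credentials)
SAMPLE_CONFIG = """
[DEFAULT]
endpoint_url=https://dreamycloud.com
aws_access_key_id=1234567abcdefg
aws_secret_access_key=hijklmn1234567

[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.json

[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
"""

DEFAULT_KEYS = ("endpoint_url", "aws_access_key_id", "aws_secret_access_key")
SECTION_KEYS = ("url", "bucket", "key")


def load_s3_config(text):
    """Return (client_settings, sections) or raise ValueError if a key is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    missing = [k for k in DEFAULT_KEYS if k not in parser.defaults()]
    if missing:
        raise ValueError(f"[DEFAULT] is missing keys: {missing}")
    client_settings = {k: parser.defaults()[k] for k in DEFAULT_KEYS}

    sections = {}
    # parser.sections() lists every section except [DEFAULT]
    for name in parser.sections():
        missing = [k for k in SECTION_KEYS if k not in parser[name]]
        if missing:
            raise ValueError(f"[{name}] is missing keys: {missing}")
        sections[name] = {k: parser[name][k] for k in SECTION_KEYS}
    return client_settings, sections


client_settings, sections = load_s3_config(SAMPLE_CONFIG)
for name, values in sections.items():
    print(name, "->", values["bucket"], values["key"])
```

Because configparser makes [DEFAULT] values visible in every other section, the S3 client settings only need to be written once.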
Usage: Local Machine
Download files from URLs and write files to local machine.
Examples
Function
# Write downloaded files to local machine using default argument values
websnap.websnap()
# Writes downloaded files locally and repeats every 60 minutes (1 hour), file logs are enabled
websnap.websnap(file_logs=True, repeat_minutes=60)
CLI
- Write downloaded files to local machine using default argument values:
  websnap_cli
- Writes downloaded files locally and repeats every 60 minutes (1 hour), file logs are enabled:
  websnap_cli --file_logs --repeat_minutes 60
Configuration
- A valid .ini configuration file is required for both function and CLI usage.
- Websnap expects the config to be config.ini in the same directory that the websnap package is executed from.
  - However, this can be changed using the config function argument (or CLI --config option).
- Local machine config example file: src/websnap/config_templates/config_template.ini
- Each file that will be downloaded from a URL requires its own section.
- If the optional directory key/value pair is omitted then the file will be written in the directory that the program is executed from.
Example local machine configuration section:
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata
Sections (one per URL)
Key | Value Description |
---|---|
url | URL that file will be downloaded from |
file_name | File name with extension |
directory (optional) | Directory name that file will be written in |
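The effect of the optional directory key can be sketched with configparser and pathlib. This is an illustration under stated assumptions rather than websnap's code: the helper name target_path and the second section [other] are hypothetical, and the fallback "." stands for the directory the program is executed from.

```python
import configparser
from pathlib import Path

SAMPLE_CONFIG = """
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata

[other]
url=https://www.example.com/api/other
file_name=other.json
"""


def target_path(section):
    """Return the relative path a downloaded file would be written to."""
    # Fall back to the current directory when the directory key is omitted
    directory = section.get("directory", fallback=".")
    return Path(directory) / section["file_name"]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
for name in parser.sections():
    print(name, "->", target_path(parser[name]))
```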
Log Support
Websnap supports optional rotating file logs.
- The following CLI option must be used to enable rotating file logs:
  --file_logs
- In function usage the following argument must be passed to enable rotating file logs:
  file_logs=True
- If log keys are not specified in the configuration [DEFAULT] section then the default values in the table below will be used.
- log_when expects a value used by the logging module's TimedRotatingFileHandler.
  - For more details about how to use TimedRotatingFileHandler, see the Python logging.handlers documentation.
- The default values result in the file logs being rotated once every day and no removal of backup log files.
Configuration
Example log configuration:
[DEFAULT]
log_when=midnight
log_interval=1
log_backup_count=7
[DEFAULT] Section
Key | Default | Value Description |
---|---|---|
log_when | D | Specifies type of interval |
log_interval | 1 | Duration of interval (must be positive integer) |
log_backup_count | 0 | If nonzero then at most <log_backup_count> files will be kept, oldest log file is deleted (must be non-negative integer) |
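The three log keys correspond to the when, interval, and backupCount arguments of the standard library's TimedRotatingFileHandler. The sketch below shows one way such a mapping could look; it is not websnap's implementation, and the helper name build_handler and the log file name websnap.log are assumptions.

```python
import configparser
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path

# Defaults taken from the table above
LOG_DEFAULTS = {"log_when": "D", "log_interval": "1", "log_backup_count": "0"}

EXAMPLE_CONFIG = """
[DEFAULT]
log_when=midnight
log_interval=1
log_backup_count=7
"""


def build_handler(config_text, log_path):
    """Create a rotating file handler from the [DEFAULT] log keys."""
    # Values read from [DEFAULT] override the table defaults
    parser = configparser.ConfigParser(defaults=LOG_DEFAULTS)
    parser.read_string(config_text)
    settings = parser.defaults()
    return TimedRotatingFileHandler(
        log_path,
        when=settings["log_when"],
        interval=int(settings["log_interval"]),
        backupCount=int(settings["log_backup_count"]),
        delay=True,  # do not open the log file until the first record is emitted
    )


with tempfile.TemporaryDirectory() as tmp:
    handler = build_handler(EXAMPLE_CONFIG, str(Path(tmp) / "websnap.log"))
    print(handler.when, handler.backupCount)
    handler.close()
```

With the example configuration the handler rotates at midnight and keeps at most seven rotated log files.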
Minimum Download Size
Websnap optionally supports a minimum download size (in kilobytes) that the content at a configured URL must meet to be downloaded.
- By default the minimum size is 0 kB.
  - Unless specified in the configuration this means that a file of any size can be downloaded by websnap.
- Configured minimum download size must be a non-negative integer.
- If the content from the URL is less than the configured size:
  - An error will be logged and the program continues to the next config section.
  - If the CLI option --early_exit (or function argument early_exit=True) is enabled then the program will terminate early.
Configuration
Example minimum download size configuration:
[DEFAULT]
min_size_kb=1
[DEFAULT] Section
Key | Default | Value Description |
---|---|---|
min_size_kb | 0 | Minimum download size in kilobytes (must be non-negative integer) |
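The size check and its interaction with early exit can be sketched as follows. This is a hypothetical illustration, not websnap's code: the function names are invented, and treating one kilobyte as 1024 bytes is an assumption, since the exact unit conversion is not stated here.

```python
BYTES_PER_KB = 1024  # assumption: the exact kilobyte definition is not documented here


def meets_min_size(content: bytes, min_size_kb: int = 0) -> bool:
    """Return True if content is at least min_size_kb kilobytes long."""
    if min_size_kb < 0:
        raise ValueError("min_size_kb must be a non-negative integer")
    return len(content) >= min_size_kb * BYTES_PER_KB


def check_downloads(downloads, min_size_kb=0, early_exit=False):
    """Return error messages for sections whose content is too small.

    downloads maps a config section name to the downloaded bytes.
    With early_exit=True processing stops at the first error.
    """
    errors = []
    for section, content in downloads.items():
        if not meets_min_size(content, min_size_kb):
            errors.append(f"[{section}] content is smaller than {min_size_kb} kB")
            if early_exit:
                break
    return errors


downloads = {"empty": b"", "project": b"x" * 2048, "tiny": b"x"}
print(check_downloads(downloads, min_size_kb=1))
print(check_downloads(downloads, min_size_kb=1, early_exit=True))
```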
Scheduled Pipelines Automation
A CI/CD pipeline is currently used to automate execution of websnap using a GitLab pipeline schedule.
Pipeline script specifications:
- For details see .gitlab-ci.yml
- Uploads objects to an S3 bucket
- Backs up S3 objects
- Early exit is enabled; this causes the pipeline to fail if an error occurs
Pipeline required CI/CD variables:
- CONFIG_INI - text with required S3 config values, for example see src/websnap/config_templates/s3_config_template.ini
- BACKUP_S3_COUNT - number of S3 objects to back up for each configured URL
Pre-commit Hooks
Pre-commit hooks ensure that the code follows stylistic conventions before changes can be committed.
Pre-commit hooks are specified in .pre-commit-config.yaml.
To install pre-commit hooks for use during development execute:
pdm run pre-commit install
To run pre-commit hooks manually on all files execute:
pre-commit run --all-files
Author
Rebecca Kurup Buchholz, Swiss Federal Institute for Forest, Snow and Landscape Research WSL
License