A CLI tool for S3 data synchronizations.
Solgate
Yet another data sync pipelines job runner.
A CLI utility intended to be automated via container-native workflow engines such as Argo or Tekton.
Installation
pip install solgate
Configuration
Solgate relies on a configuration file that holds all the information required to perform the synchronization. This is a standard INI/TOML file that contains the following sections:
- Exactly one section starting with source_. This is a location-specifying section that defines where the data is sourced from.
- Multiple (at least one) sections starting with destination_. These are also location-specifying sections. Their purpose is to define sync destinations.
- A section named solgate for general configuration that is not specific to a single location.
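The section rules above can be checked with a few lines of Python. This is a minimal sketch using the standard configparser module, not necessarily how solgate validates its configuration; the function name check_sections and the sample bucket paths are made up for illustration.

```python
import configparser


def check_sections(text):
    """Return (source, destinations, has_general) or raise ValueError."""
    # The sample sections in this README use inline ';' comments, which
    # configparser only strips when they are enabled explicitly.
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)

    sources = [s for s in parser.sections() if s.startswith("source_")]
    destinations = [s for s in parser.sections() if s.startswith("destination_")]

    if len(sources) != 1:
        raise ValueError(f"expected exactly one source_ section, found {len(sources)}")
    if not destinations:
        raise ValueError("expected at least one destination_ section")

    # The [solgate] section is optional, so only its presence is reported.
    return sources[0], destinations, parser.has_section("solgate")


sample = """
[source_a]
base_path = bucket/input ; sub path is optional

[destination_b]
base_path = bucket/output
"""
print(check_sections(sample))  # ('source_a', ['destination_b'], False)
```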
Section solgate
All configuration in this section is optional. Use this section if you'd like to modify the default behavior. Default values are denoted below:
[solgate]
alerts_smtp_server = smtp.corp.redhat.com
alerts_from = solgate-alerts@redhat.com
alerts_to = dev-null@redhat.com
timedelta = 1d
Description:
- alerts_smtp_server, alerts_from, alerts_to are used for alerting only.
- timedelta defines a time window in which the objects in the source bucket must have been modified to be eligible for the bucket listing. Only files modified within the last timedelta, counted back from now, are included.
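To illustrate the time window, the sketch below parses a value such as 1d and checks whether an object's modification time falls inside the window. Only 1d appears in this README; the other unit suffixes (h, m, s) and the helper names parse_window and is_eligible are assumptions made for the example, not solgate's documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "1d" is shown in the configuration above.
UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_window(value):
    """Convert a string such as '1d' into a datetime.timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timedelta value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def is_eligible(last_modified, window, now):
    """True if the object was modified no earlier than `window` before `now`."""
    return now - last_modified <= window


now = datetime(2020, 6, 2, 12, 0, tzinfo=timezone.utc)
window = parse_window("1d")
print(is_eligible(datetime(2020, 6, 2, 1, 0, tzinfo=timezone.utc), window, now))   # True
print(is_eligible(datetime(2020, 5, 30, 1, 0, tzinfo=timezone.utc), window, now))  # False
```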
Source section
[source_some_fancy_name]
aws_access_key_id = KEY_ID
aws_secret_access_key = SECRET
base_path = DH-PLAYPEN/storage/input ; at least the bucket name is required, sub path within this bucket is optional
endpoint_url = https://s3.amazonaws.com ; optional, defaults to s3.amazonaws.com
formatter = {date}/{collection}.{ext} ; optional, defaults to None
If the formatter is not set, no repartitioning is expected to happen and the S3 object key is left intact, the same as it is in the source bucket (within the base_path context). Specifying the formatter in the source section alone does not result in repartitioning by itself: only destinations that also have this option specified are eligible for object key modifications.
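One way to picture what a source formatter is used for is to turn it into a regular expression and extract the named fields from an object key. The sketch below does that; the matching rule (a field never contains a slash) and the helper names formatter_to_regex and parse_key are assumptions for illustration, and solgate's own parsing may differ.

```python
import re


def formatter_to_regex(formatter):
    """Turn '{date}/{collection}.{ext}' into a regex with named groups."""
    pattern = ""
    # The capturing group makes re.split keep the '{name}' placeholders.
    for part in re.split(r"(\{\w+\})", formatter):
        if re.fullmatch(r"\{\w+\}", part):
            # Assumption: a field is a non-empty run without '/'.
            pattern += f"(?P<{part[1:-1]}>[^/]+?)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def parse_key(formatter, key):
    """Return the fields found in `key`, or None if the key does not match."""
    match = formatter_to_regex(formatter).fullmatch(key)
    return match.groupdict() if match else None


print(parse_key("{date}/{collection}.{ext}", "2020-06-01/metrics.csv"))
# {'date': '2020-06-01', 'collection': 'metrics', 'ext': 'csv'}
```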
Destination sections
[destination_some_fancy_name]
aws_access_key_id = KEY_ID
aws_secret_access_key = SECRET
base_path = DH-PLAYPEN/storage/output ; at least the bucket name is required, sub path within this bucket is optional
endpoint_url = https://s3.upshift.redhat.com ; optional, defaults to s3.upshift.redhat.com
formatter = {date}/{collection}.{ext} ; optional, defaults to None
unpack = yes ; optional, defaults to False/no
The endpoint_url defaults to a different value for destination sections than for the source section. This reflects the usual data origin and the usual safe destination host.
If the formatter is not set, no repartitioning is expected to happen and the S3 object key is left intact, the same as it is in the source bucket (within the base_path context). If repartitioning is desired, the formatter string must be defined in the source section as well; otherwise the object name can't be parsed properly from the source S3 object key.
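The rule that both sides need a formatter can be sketched as a small function. It assumes the fields have already been extracted from the source key; the function name destination_key and the sample values are hypothetical and only illustrate the decision described above.

```python
def destination_key(key, fields, source_formatter=None, destination_formatter=None):
    """Return the object key to use in the destination bucket.

    `fields` holds the values parsed from the source key, for example
    {"date": "2020-06-01", "collection": "metrics", "ext": "csv"}.
    """
    # Repartition only when both sections define a formatter;
    # otherwise the key is left intact.
    if source_formatter is None or destination_formatter is None:
        return key
    return destination_formatter.format(**fields)


fields = {"date": "2020-06-01", "collection": "metrics", "ext": "csv"}
src = "{date}/{collection}.{ext}"

print(destination_key("2020-06-01/metrics.csv", fields, src, "{collection}/{date}.{ext}"))
# metrics/2020-06-01.csv
print(destination_key("2020-06-01/metrics.csv", fields, src, None))
# 2020-06-01/metrics.csv
```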
The unpack option specifies whether gzipped archives should be unpacked during the transfer. The .gz suffix is automatically dropped from the resulting object key, regardless of whether repartitioning is on or off. Switching this option on results in weaker object validation, since the implicit metadata checksum and size checks can't be used to verify file integrity.
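The effect of unpack on a single object can be approximated with the standard gzip module. This is a sketch under assumptions: the helper name unpack_object is made up, and leaving keys without a .gz suffix untouched is a guess about the behavior rather than something stated here.

```python
import gzip


def unpack_object(key, body, unpack=False):
    """Return (key, body) as they would be written to the destination."""
    # Assumption: only keys ending in '.gz' are treated as archives.
    if not unpack or not key.endswith(".gz"):
        return key, body
    # Drop the suffix and store the decompressed payload instead.
    return key[: -len(".gz")], gzip.decompress(body)


packed = gzip.compress(b"a,b\n1,2\n")
print(unpack_object("2020-06-01/metrics.csv.gz", packed, unpack=True))
# ('2020-06-01/metrics.csv', b'a,b\n1,2\n')
```

Because the stored bytes differ from the source object after decompression, the size and checksum reported for the source can no longer be compared directly, which is why validation is weaker in this mode.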
Usage
Solgate is mainly intended for use in automation within Argo Workflows. However, it can also be used as a standalone CLI tool for manual transfers and (via extensions) for (TBD) manifest scaffold generation and (TBD) deployed instance monitoring.
List bucket for files ready to be transferred
Before the actual sync can be run, it is required
solgate list
Sync objects
solgate transfer
Notification service
solgate notify
Workflow manifests
In addition to the solgate package source code, this repository also features deployment manifests in the manifests folder. The current Kubernetes manifests rely on Argo and Argo Events and are structured in the Kustomize format. Environments for deployment are specified in the manifests/overlays/ENV_NAME folder.
Each environment features multiple solgate workflow instances. The config.ini configuration file and the selected triggers are defined in the instance subfolder within the particular environment folder.
Deploy
Environment deployments are expected to be handled via Argo CD in AI-CoE SRE; however, they can be done manually as well.
Local prerequisites:
Note: Yes, we don't use ksops here; instead we currently use a different sops abstraction. This is because we would like to track as many of the configuration files as possible in a readable format (and generate the secrets from them on the fly), as opposed to ksops, which requires Kubernetes secret resources that are base64 encoded (harder to review).
Already deployed platform and running services:
Build and deploy manifests
kustomize build --enable_alpha_plugins manifests/overlays/ENV_NAME | oc apply -f -
Create a new instance
Will be handled via scaffold in next version!
- Create a new folder named after the instance in the selected environment overlay.
- Create a kustomization.yaml file in this new folder with the following content:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    generators:
      - ./secret-generator.yaml

- Create a secret-generator.yaml file in this new folder with the following content:

    apiVersion: goabout.com/v1beta1
    kind: SopsSecretGenerator
    metadata:
      name: NEW_INSTANCE_NAME
    files:
      - config.ini

- Create a config.ini file in this folder and encrypt it via sops:

    vim overlays/ENV_NAME/NEW_INSTANCE_NAME/config.ini
    sops -e -i overlays/ENV_NAME/NEW_INSTANCE_NAME/config.ini

- Create all event source patch files for this instance (webhook-es.yaml, calendar-es.yaml, etc.).
- Update the resource and patch listing in the overlays/ENV_NAME/kustomization.yaml:

    resources:
      - ...
      - ./NEW_INSTANCE_NAME
    patchesStrategicMerge:
      - ...
      - ./NEW_INSTANCE_NAME/EVENT_SOURCE_TYPE-es.yaml # For each event source trigger used
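Until the scaffold mentioned above is available, the first few steps can be scripted. The sketch below only creates the instance folder with the two static manifests listed in the steps; it does not create or encrypt config.ini, add event source patches, or edit the overlay kustomization.yaml. The function name scaffold_instance and the instance name my-instance are hypothetical, and this is not the planned solgate scaffold.

```python
import tempfile
from pathlib import Path

KUSTOMIZATION = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - ./secret-generator.yaml
"""

SECRET_GENERATOR = """\
apiVersion: goabout.com/v1beta1
kind: SopsSecretGenerator
metadata:
  name: {name}
files:
  - config.ini
"""


def scaffold_instance(overlay_dir, name):
    """Create the instance folder with its two static manifests."""
    instance = Path(overlay_dir) / name
    # Fail loudly rather than overwrite an existing instance.
    instance.mkdir(parents=True, exist_ok=False)
    (instance / "kustomization.yaml").write_text(KUSTOMIZATION)
    (instance / "secret-generator.yaml").write_text(SECRET_GENERATOR.format(name=name))
    return instance


# A temporary directory stands in for manifests/overlays/ENV_NAME here.
with tempfile.TemporaryDirectory() as overlay:
    created = scaffold_instance(overlay, "my-instance")
    print(sorted(p.name for p in created.iterdir()))
    # ['kustomization.yaml', 'secret-generator.yaml']
```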