
A CLI tool for S3 data synchronizations.


Solgate

Yet another data sync pipeline job runner.

A CLI utility that is expected to be automated via container-native workflow engines like Argo or Tekton.

Installation

pip install solgate

Configuration

Solgate relies on a configuration file that holds all the information required to fully perform the synchronization. This file is a standard INI/TOML file that contains the following sections:

  • Exactly one section whose name starts with source_. This location-specifying section defines where the data are sourced from.
  • One or more sections whose names start with destination_. These are also location-specifying sections; each one defines a sync destination.
  • Section named solgate for a general configuration that is not specific to a single location.
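The section rules above can be checked with Python's standard configparser. This is a minimal sketch, not solgate's own loader; validate_sections is a hypothetical helper name.

```python
import configparser
from typing import List, Tuple


def validate_sections(config: configparser.ConfigParser) -> Tuple[str, List[str]]:
    """Check the section layout and return (source section, destination sections)."""
    names = config.sections()
    sources = [name for name in names if name.startswith("source_")]
    destinations = [name for name in names if name.startswith("destination_")]
    if len(sources) != 1:
        raise ValueError(f"expected exactly one source_ section, found {len(sources)}")
    if not destinations:
        raise ValueError("expected at least one destination_ section")
    # The optional [solgate] section is allowed but not required.
    return sources[0], destinations


config = configparser.ConfigParser()
config.read_string(
    "[source_input]\nbase_path = DH-PLAYPEN/storage/input\n"
    "[destination_output]\nbase_path = DH-PLAYPEN/storage/output\n"
)
print(validate_sections(config))  # ('source_input', ['destination_output'])
```

By default configparser keeps trailing `; comment` text as part of the value. To read the examples in this README with their inline comments, it would need to be constructed as `configparser.ConfigParser(inline_comment_prefixes=(";",))`.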

Section solgate

All configuration in this section is optional. Use this section if you'd like to modify the default behavior. Default values are denoted below:

[solgate]
alerts_smtp_server = smtp.corp.redhat.com
alerts_from        = solgate-alerts@redhat.com
alerts_to          = dev-null@redhat.com
timedelta          = 1d

Description:

  • alerts_smtp_server, alerts_from, alerts_to are used for alerting only
  • timedelta defines a time window in which the objects in the source bucket must have been modified to be eligible for the bucket listing. Only objects modified within the last timedelta (no earlier than now minus timedelta) are included.
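The listing window can be sketched as follows. The README only shows the value 1d, so the `<number><unit>` format and the extra h/m units here are assumptions. parse_timedelta, listing_window and is_eligible are hypothetical helpers, not solgate functions.

```python
import configparser
import re
from datetime import datetime, timedelta, timezone

# Hypothetical unit table: the docs only show "1d"; "h" and "m" are assumptions.
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_timedelta(value: str) -> timedelta:
    """Parse an assumed '<number><unit>' string such as '1d' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([dhm])\s*", value)
    if match is None:
        raise ValueError(f"unsupported timedelta value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def listing_window(config: configparser.ConfigParser) -> timedelta:
    """Read [solgate] timedelta, falling back to the documented default of 1d."""
    return parse_timedelta(config.get("solgate", "timedelta", fallback="1d"))


def is_eligible(last_modified: datetime, window: timedelta, now: datetime) -> bool:
    """An object is listed only if it was modified no earlier than now - window."""
    return last_modified >= now - window


now = datetime(2020, 6, 2, tzinfo=timezone.utc)
window = listing_window(configparser.ConfigParser())  # no [solgate] section, so 1d
print(is_eligible(datetime(2020, 6, 1, 12, tzinfo=timezone.utc), window, now))  # True
```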

Source section

[source_some_fancy_name]
aws_access_key_id     = KEY_ID
aws_secret_access_key = SECRET
base_path             = DH-PLAYPEN/storage/input   ; at least the bucket name is required, sub path within this bucket is optional
endpoint_url          = https://s3.amazonaws.com   ; optional, defaults to s3.amazonaws.com
formatter             = {date}/{collection}.{ext}  ; optional, defaults to None
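The base_path value combines a required bucket name with an optional sub path. Here is a minimal sketch of splitting it, assuming the first path segment is the bucket. split_base_path is a hypothetical helper.

```python
from typing import Tuple


def split_base_path(base_path: str) -> Tuple[str, str]:
    """Split 'bucket[/sub/path]' into (bucket, prefix); the prefix may be empty."""
    bucket, _, prefix = base_path.strip("/").partition("/")
    if not bucket:
        raise ValueError("base_path must contain at least the bucket name")
    return bucket, prefix


print(split_base_path("DH-PLAYPEN/storage/input"))  # ('DH-PLAYPEN', 'storage/input')
print(split_base_path("DH-PLAYPEN"))  # ('DH-PLAYPEN', '')
```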

If the formatter is not set, no repartitioning is expected to happen and the S3 object key is left intact, same as it is in the source bucket (within the base_path context). Specifying the formatter in the source section alone doesn't repartition any objects by itself; only destinations that also specify this option are eligible for object key modifications.
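A source formatter is what makes it possible to pull named fields out of an existing object key. One way to do this is to turn the formatter into a regular expression with named groups. This is a hypothetical sketch (parse_key is not a solgate function, and solgate may parse keys differently). It assumes that each field never spans a `/`.

```python
import re
import string
from typing import Dict


def parse_key(formatter: str, key: str) -> Dict[str, str]:
    """Extract formatter fields (e.g. date, collection, ext) from an object key."""
    pattern = ""
    for literal, field, _spec, _conversion in string.Formatter().parse(formatter):
        pattern += re.escape(literal)
        if field is not None:
            # Each field matches as little as possible and never crosses a '/'.
            pattern += f"(?P<{field}>[^/]+?)"
    match = re.fullmatch(pattern, key)
    if match is None:
        raise ValueError(f"key {key!r} does not match formatter {formatter!r}")
    return match.groupdict()


print(parse_key("{date}/{collection}.{ext}", "2020-05-01/orders.csv.gz"))
# {'date': '2020-05-01', 'collection': 'orders', 'ext': 'csv.gz'}
```

The fields use non-greedy matching, so everything after the first dot ends up in ext.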

Destination sections

[destination_some_fancy_name]
aws_access_key_id     = KEY_ID
aws_secret_access_key = SECRET
base_path             = DH-PLAYPEN/storage/output      ; at least the bucket name is required, sub path within this bucket is optional
endpoint_url          = https://s3.upshift.redhat.com  ; optional, defaults to s3.upshift.redhat.com
formatter             = {date}/{collection}.{ext}      ; optional, defaults to None
unpack                = yes                            ; optional, defaults to False/no

The default endpoint_url differs between the source and destination sections. This reflects the usual setup: data typically originates from s3.amazonaws.com and is synced to the safe destination host, s3.upshift.redhat.com.

If the formatter is not set, no repartitioning is expected to happen and the S3 object key is left intact, same as it is in the source bucket (within the base_path context). If repartitioning is desired, the formatter string must be defined in the source section as well; otherwise the object name can't be parsed properly from the source S3 object key.
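The rules for source and destination formatters can be summarized as a small decision function. This sketch assumes regex-based key parsing. destination_key and _fields are hypothetical helpers, not solgate's implementation.

```python
import re
import string
from typing import Dict, Optional


def _fields(formatter: str, key: str) -> Dict[str, str]:
    """Hypothetical regex-based parser: formatter fields become named groups."""
    pattern = "".join(
        re.escape(literal) + (f"(?P<{field}>[^/]+?)" if field is not None else "")
        for literal, field, _spec, _conv in string.Formatter().parse(formatter)
    )
    match = re.fullmatch(pattern, key)
    if match is None:
        raise ValueError(f"key {key!r} does not match formatter {formatter!r}")
    return match.groupdict()


def destination_key(key: str, source_fmt: Optional[str], dest_fmt: Optional[str]) -> str:
    if dest_fmt is None:
        # No destination formatter: the key is copied intact.
        return key
    if source_fmt is None:
        # Fields can't be parsed from the key without the source formatter.
        raise ValueError("repartitioning requires a formatter in the source section too")
    return dest_fmt.format(**_fields(source_fmt, key))


src = "{date}/{collection}.{ext}"
print(destination_key("2020-05-01/orders.csv.gz", src, "{collection}/{date}.{ext}"))
# orders/2020-05-01.csv.gz
print(destination_key("2020-05-01/orders.csv.gz", src, None))
# 2020-05-01/orders.csv.gz
```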

The unpack option specifies whether gzipped archives should be unpacked during the transfer. The .gz suffix is automatically dropped from the resulting object key, regardless of whether repartitioning is on or off. Switching this option on results in weaker object validation, since the implicit metadata checksum and size checks can't be used to verify the file integrity.
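The unpack behavior can be illustrated with Python's gzip module. This is a minimal sketch: unpack_object is a hypothetical helper, and passing non-.gz objects through unchanged is an assumption. The last line shows why checksum validation weakens, because the unpacked bytes no longer hash like the stored archive.

```python
import gzip
import hashlib
from typing import Tuple


def unpack_object(key: str, body: bytes) -> Tuple[str, bytes]:
    """Gunzip a .gz object body and drop the .gz suffix from its key."""
    if not key.endswith(".gz"):
        # Assumption: non-gzipped objects pass through unchanged.
        return key, body
    return key[: -len(".gz")], gzip.decompress(body)


packed = gzip.compress(b"a,b\n1,2\n")
key, data = unpack_object("2020-05-01/orders.csv.gz", packed)
print(key)  # 2020-05-01/orders.csv
# The unpacked bytes hash differently from the stored archive, so the source
# object's checksum can't vouch for the destination copy.
print(hashlib.md5(data).hexdigest() == hashlib.md5(packed).hexdigest())  # False
```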

Usage

Solgate is mainly intended for use in automation within Argo Workflows. However, it can also be used as a standalone CLI tool for manual transfers and (via extensions) for manifest scaffold generation (TBD) and deployed instance monitoring (TBD).

List bucket for files ready to be transferred

Before the actual sync can be run, the source bucket must be listed for files that are ready to be transferred:

solgate list

Sync objects

solgate transfer

Notification service

solgate notify

Workflow manifests

In addition to the solgate package source code, this repository also features deployment manifests in the manifests folder. The current Kubernetes manifests rely on Argo and Argo Events and are structured in the Kustomize format. Environments for deployment are specified in the manifests/overlays/ENV_NAME folder.

Each environment features multiple solgate workflow instances. The config.ini configuration file and the selected triggers are defined in an instance subfolder within the particular environment folder.

Deploy

Environment deployments are expected to be handled via Argo CD in AI-CoE SRE; however, they can be done manually as well.

Local prerequisites:

Note: Yes, we don't use ksops here; instead, we currently use a different sops abstraction. This is because we would like to track as many of the configuration files as possible in a readable format (and generate the secrets from them on the fly), as opposed to ksops, which requires Kubernetes Secret resources whose values are base64 encoded (harder to review).
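To see why base64-encoded Secrets are harder to review, here is what a generated Kubernetes Secret roughly looks like, expressed as a Python dict. This is a hypothetical illustration: secret_manifest is not part of solgate or of the sops generator, and it only shows that Secret data values are base64 encoded.

```python
import base64
from typing import Dict


def secret_manifest(name: str, files: Dict[str, str]) -> dict:
    """Build a Kubernetes Secret manifest whose data values are base64 encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            filename: base64.b64encode(content.encode("utf-8")).decode("ascii")
            for filename, content in files.items()
        },
    }


manifest = secret_manifest("example-instance", {"config.ini": "[solgate]\ntimedelta = 1d\n"})
print(manifest["data"]["config.ini"])  # opaque base64 text, hard to review in a diff
```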

Already deployed platform and running services:

Build and deploy manifests

kustomize build --enable_alpha_plugins manifests/overlays/ENV_NAME | oc apply -f -

Create a new instance

This will be handled via scaffolding in the next version!

  1. Create a new folder named after the instance in the selected environment overlay.

  2. Create a kustomization.yaml file in this new folder with the following content:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    generators:
      - ./secret-generator.yaml
    
  3. Create a secret-generator.yaml file in this new folder with the following content:

    apiVersion: goabout.com/v1beta1
    kind: SopsSecretGenerator
    metadata:
      name: NEW_INSTANCE_NAME
    files:
      - config.ini
    
  4. Create a config.ini file in this folder and encrypt it via sops:

    vim overlays/ENV_NAME/NEW_INSTANCE_NAME/config.ini
    sops -e -i overlays/ENV_NAME/NEW_INSTANCE_NAME/config.ini
    
  5. Create all event source patch files for this instance (webhook-es.yaml, calendar-es.yaml, etc.).

  6. Update the resources and patches listing in overlays/ENV_NAME/kustomization.yaml:

    resources:
      - ...
      - ./NEW_INSTANCE_NAME
    
    patchesStrategicMerge:
      - ...
      - ./NEW_INSTANCE_NAME/EVENT_SOURCE_TYPE-es.yaml # For each event source trigger used
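Step 6 can be expressed as a small, idempotent update on the parsed kustomization.yaml. This sketch works on a plain dict, so loading and dumping the YAML is left out. register_instance is a hypothetical helper, and the event source type names in the usage are only examples.

```python
from typing import Iterable


def register_instance(kustomization: dict, instance: str, event_sources: Iterable[str]) -> dict:
    """Add the instance folder and its event source patches, skipping duplicates."""
    resources = kustomization.setdefault("resources", [])
    patches = kustomization.setdefault("patchesStrategicMerge", [])
    resource = f"./{instance}"
    if resource not in resources:
        resources.append(resource)
    for source_type in event_sources:
        patch = f"./{instance}/{source_type}-es.yaml"  # one patch per trigger used
        if patch not in patches:
            patches.append(patch)
    return kustomization


overlay = {"resources": ["../../base"]}
register_instance(overlay, "new-instance", ["webhook", "calendar"])
print(overlay["patchesStrategicMerge"])
# ['./new-instance/webhook-es.yaml', './new-instance/calendar-es.yaml']
```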
    



Download files


Source Distribution

solgate-3.0.1.tar.gz (23.3 kB)

Uploaded Source

Built Distribution

solgate-3.0.1-py3-none-any.whl (28.0 kB)

Uploaded Python 3

File details

Details for the file solgate-3.0.1.tar.gz.

File metadata

  • Download URL: solgate-3.0.1.tar.gz
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for solgate-3.0.1.tar.gz
Algorithm Hash digest
SHA256 f53aef8b93240053de28159dc2022b7986366ef3576846bf5335b7908a911015
MD5 ae96707d337e7de203664597b4c9fe62
BLAKE2b-256 5e3cfadcc73387168e51a8200901e1a08ddbc0a1c859e23c6c2ab261462fd752


File details

Details for the file solgate-3.0.1-py3-none-any.whl.

File metadata

  • Download URL: solgate-3.0.1-py3-none-any.whl
  • Size: 28.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for solgate-3.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c346c5d3671d528f26a86f64fe9edd99afb582c725eac0d71488dbbabe1823b1
MD5 bca67b0a7d8f7533edec485d32bc7a61
BLAKE2b-256 96b84558d1be8a9a42584e6115107ff423c49ab6bfc668cf8872177c940bd74b

