
FlowUI Project

An architecture for streamlining the production of Operators and the provisioning of Cloud infrastructure for Apache Airflow, with an interactive GUI for workflow creation!

Some of the points we are trying to bring together with FlowUI:

  • make extensive use of Airflow for workflow management
  • standardize the production of Operators that can run on Batch Jobs, on Kubernetes pods, or locally (on the same machine serving Airflow)
  • these could serve heavy ML workflows as well as light dataset-updating workflows
  • automatically import the list of Operators into a web GUI where users can create their own workflows
  • a more user-friendly GUI for workflow supervision and management

Our goal is to build an architecture that abstracts the logic behind these points and automates as much of the continuous delivery lifecycle as possible.


FlowUI Project - AWS Infrastructure

Per Platform:

  • Frontend server
  • Backend server
  • Airflow server
  • Database
  • Code repository with the collection of Operators (GitHub)
  • Container registry with Docker images to run Operators
  • Job definitions on AWS Batch: one per Operator (also per user? EFS mount limitation)
  • Kubernetes infra (helm charts)

Per user of the platform:

  • S3 bucket: the cheapest storage for all of the user's data
  • EFS: for shared storage between Tasks at DAG runtime

Airflow instance

We still haven't decided whether we should spin up a separate running Airflow instance per user.

The Airflow instance should have access to a mounted EFS volume, shared with the other registered resources (Batch Job Definitions, Lambdas, ...). This volume is where the running Airflow instance will find its DAG files, the available Operator functions and plugins, and where it will store its logs.

For each running Airflow instance, the following ENV variables must be defined (a minimal usage sketch follows this list):

  • AWS_REGION_NAME: the AWS region name; it must be the same for all registered resources
  • AWS_BATCH_JOB_DEFINITION_{operator_name}_{operator_version}: the Job Definition ARN, one for each registered Operator
  • AWS_BATCH_JOB_QUEUE_{operator_name}_{operator_version}: the Job Queue ARN, one for each registered Operator
  • ...
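
As a rough usage sketch (not the actual FlowUI code), an Airflow task could resolve these variables and submit the corresponding Batch job with boto3. The function name, the run_id argument and the FLOWUI_RUN_ID override below are illustrative assumptions:

    import os
    import boto3

    def submit_operator_batch_job(operator_name: str, operator_version: str, run_id: str) -> str:
        """Submit the AWS Batch job registered for a given Operator.

        Assumes the AWS_BATCH_JOB_DEFINITION_* / AWS_BATCH_JOB_QUEUE_* naming
        convention described above; everything else here is illustrative.
        """
        suffix = f"{operator_name}_{operator_version}"
        job_definition = os.environ[f"AWS_BATCH_JOB_DEFINITION_{suffix}"]
        job_queue = os.environ[f"AWS_BATCH_JOB_QUEUE_{suffix}"]

        batch = boto3.client("batch", region_name=os.environ["AWS_REGION_NAME"])
        response = batch.submit_job(
            jobName=f"{operator_name}-{run_id}",
            jobQueue=job_queue,
            jobDefinition=job_definition,
            # FLOWUI_RUN_ID is a hypothetical variable passed to the container
            containerOverrides={
                "environment": [{"name": "FLOWUI_RUN_ID", "value": run_id}],
            },
        )
        return response["jobId"]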

File System structure:

In principle, user-specific data would be stored in S3. The loading of specific artifacts onto a mounted File System (such as AWS EFS) could be done on request by an Operator (LoadDatasetOperator, LoadModelOperator, etc.), so that heavy data is readily available to the containers running Jobs (a sketch of this loading step follows the tree below). EFS pricing is not too bad, but it is twice the price of S3 and it also charges per data transfer, so we would need to devise housekeeping rules to clean up the artifacts and run results and transfer them back to S3 (e.g. every 24 hours or so). The mounted File System would also serve as:

  • the source of dags, logs and plugins for Airflow
  • the source of the Operators files, synced with the code repository and readily accessible to the instances running the tasks
  • a temporary location for Task results that might be useful to downstream Tasks

/

/airflow
..../logs
..../plugins
..../dags
......../workflow_1.py
......../workflow_2.py

/operators_repositories
..../{REPOSITORY-ID}
......../dependencies
............/dependencies_map.json
......../operators
............/{operator-name}
................/metadata.json    # OPTIONAL
................/model.py         # REQUIRED
................/operator.py      # REQUIRED

/dataset
..../{dataset-id}
......../file1.mat
......../file2.csv
......../file3.json

/runs
..../{dag-id}
......../{run-id}
............/{task-id}
................/log.txt
................/result.npy
................/result.html
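
To illustrate the loading step mentioned above, a LoadDatasetOperator-like helper could copy a dataset from S3 into /dataset/{dataset-id} on the mounted File System. This is only a sketch under assumed names: the bucket layout, the use of VOLUME_MOUNT_PATH_DOCKER as the mount point and the helper itself are not part of the project's actual API:

    import os
    from pathlib import Path
    import boto3

    # Assumed mount point of the shared File System inside the container.
    FS_MOUNT = Path(os.environ.get("VOLUME_MOUNT_PATH_DOCKER", "/mnt/fs"))

    def load_dataset_to_fs(bucket: str, dataset_id: str) -> Path:
        """Copy every object under s3://{bucket}/{dataset_id}/ into
        /dataset/{dataset-id} on the mounted File System, so downstream
        Tasks can read the files locally."""
        s3 = boto3.client("s3")
        target_dir = FS_MOUNT / "dataset" / dataset_id
        target_dir.mkdir(parents=True, exist_ok=True)

        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=f"{dataset_id}/"):
            for obj in page.get("Contents", []):
                relative = obj["Key"].split(f"{dataset_id}/", 1)[1]
                if not relative:  # skip the "directory" placeholder key
                    continue
                local_path = target_dir / relative
                local_path.parent.mkdir(parents=True, exist_ok=True)
                s3.download_file(bucket, obj["Key"], str(local_path))
        return target_dir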

Operators

We write the Operators ourselves, and this will be the main point of customization for each project. Each Operator will have (a minimal sketch follows this list):

  • An operator.py file with the custom code to be executed, the operator_function()
  • A metadata.json file containing the Operator's metadata and frontend node style
  • A model.py file containing the Pydantic model that defines the input of operator_function()
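
A minimal sketch of what a model.py / operator.py pair could look like, assuming a toy Operator that scales the numeric columns of a CSV file; the field names, the import layout and the pandas-based logic are purely illustrative:

    # model.py (REQUIRED) -- Pydantic model defining the input of operator_function()
    from pydantic import BaseModel

    class InputModel(BaseModel):
        dataset_path: str          # path to a file on the shared File System (illustrative)
        scale_factor: float = 1.0


    # operator.py (REQUIRED) -- the custom code to be executed
    import pandas as pd
    from model import InputModel   # illustrative import; the actual layout may differ

    def operator_function(input_model: InputModel) -> dict:
        """Read the dataset, scale its numeric columns and return a small summary."""
        df = pd.read_csv(input_model.dataset_path)
        scaled = df.select_dtypes("number") * input_model.scale_factor
        return {"rows": len(scaled), "columns": list(scaled.columns)}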

Depending on how the Operators will run (an illustrative AWS Batch registration sketch follows this list):

  • If running on AWS Batch, a Job Definition with:
    • A container image that runs this Operator
    • A Role with necessary permissions (access to EFS, S3, Database, etc…)
    • Mount of the EFS (how to handle this if there is one EFS per user?)
    • Batch Compute Environment and Queue this Job is going to use
    • Other specific configurations (vCPU, RAM, retries, etc., which can be changed at Job submission)
  • If running locally (or on the same server as Airflow):
    • A container image that runs this Operator
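
For the AWS Batch case, the items above map roughly onto a boto3 register_job_definition call. The sketch below is an assumption about how such a registration could look; the image URI, role ARN, file system id, resource values and naming are all placeholders:

    import boto3

    batch = boto3.client("batch", region_name="us-east-1")  # placeholder region

    response = batch.register_job_definition(
        jobDefinitionName="flowui-example-operator-1-0-0",   # hypothetical name
        type="container",
        containerProperties={
            # Container image that runs this Operator
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/flowui-operators:latest",
            # Role with the necessary permissions (EFS, S3, Database, ...)
            "jobRoleArn": "arn:aws:iam::123456789012:role/flowui-operator-role",
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},
                {"type": "MEMORY", "value": "2048"},
            ],
            # Mount of the EFS volume shared with Airflow
            "volumes": [
                {
                    "name": "flowui-fs",
                    "efsVolumeConfiguration": {"fileSystemId": "fs-0123456789abcdef0"},
                }
            ],
            "mountPoints": [
                {"sourceVolume": "flowui-fs", "containerPath": "/mnt/fs"}
            ],
        },
        retryStrategy={"attempts": 2},
    )
    print(response["jobDefinitionArn"])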

ENV vars

ENV vars are defined at the following levels:

Host ENV

  • GITHUB_ACCESS_TOKEN
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION_NAME
  • FLOWUI_PATH_HOST (only for local dev)

config.ini

  • PROJECT_NAME
  • FLOWUI_DEPLOY_MODE
  • VOLUME_MOUNT_PATH_HOST
  • CODE_REPOSITORY_SOURCE
  • GITHUB_REPOSITORY_NAME
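
As a rough sketch, the FlowUI CLI (or any other consumer of config.ini) could read these values with Python's configparser; the section name used below is an assumption:

    import configparser

    config = configparser.ConfigParser()
    config.read("config.ini")

    # The section name "flowui" is an assumption; adjust to the actual file layout.
    section = config["flowui"] if "flowui" in config else config["DEFAULT"]

    project_name = section.get("PROJECT_NAME")
    deploy_mode = section.get("FLOWUI_DEPLOY_MODE")
    volume_mount_path_host = section.get("VOLUME_MOUNT_PATH_HOST")
    code_repository_source = section.get("CODE_REPOSITORY_SOURCE")
    github_repository_name = section.get("GITHUB_REPOSITORY_NAME")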

FlowUI CLI

  • CODE_REPOSITORY_PATH
  • AIRFLOW_UID

docker-compose.yaml

  • FLOWUI_PATH_DOCKER
  • VOLUME_MOUNT_PATH_DOCKER
  • AIRFLOW_HOME
  • AIRFLOW__CORE__EXECUTOR
  • AIRFLOW__CORE__SQL_ALCHEMY_CONN
  • AIRFLOW__CELERY__RESULT_BACKEND
  • AIRFLOW__CELERY__BROKER_URL
  • AIRFLOW__CORE__FERNET_KEY
  • AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION
  • AIRFLOW__CORE__LOAD_EXAMPLES
  • AIRFLOW__CORE__ENABLE_XCOM_PICKLING
  • AIRFLOW__API__AUTH_BACKEND
  • _PIP_ADDITIONAL_REQUIREMENTS
  • AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
