Skip to main content

Running dynamic DAG workflows

Project description

Adage - A DAG Executor

CI Code Health PyPI Coverage Status Documentation Status

This is a small experimental package to see how one could describe workflows that are not completely known at definition time. Tasks should be runnable both in a multiprocessing pool, or using a number of celery workers or a IPython cluster.

Example

example image

Problem

Workflows can be comfortably represented by directed acyclic graphs (DAGs). But sometimes the precise structure of the graph is not known before the processing starts. Instead often one only has partial information of what kind of edges are possible and depending on a certain result in a node the DAG might be appended with more nodes and edges.

For example, one node (call it "node A") could be downloading a list of files, which can be processed in parallel. The DAG would therefore have one node for each file-processing (let's call them "node_file_1" to "node_file_n") depending on "node A". Since the exact number of files is not known until run-time, we cannot map out the DAG beforehand. Also after this "map"-step one might want to have a "reduce"-step to merge the individual result. This can also only be scheduled after the number of "map"-nodes is known.

Another example is that one might have a whole set of nodes that run a certain kind of task (e.g. produce a PDF file). One could imagine wanting to have a "reduce"-type task which merges all these individual PDF files. While any given node does not know where else PDF-generating tasks are scheduled, one can wait until no edges to PDF-generating tasks are possible anymore to then append a PDF-merging node to the DAG.

Solution

Generically, we want individual nodes to have a limited set of operations they can do on the DAG that they are part of. Specifically we can only allow queries on the structure of the DAG as well as append operations, nodes must not be able to remove nodes. The way we implement this is that we have a append-only record of scheduled rules. A rule is a pair of functions (predicate,body) that operate on the DAG. The predicate is a query function that inspects the graph to decide whether the DAG has enough information to apply the body (e.g. are edges of a certain type still possible to append or not?). If the DAG does have enough information the body which is an append-only operation on the DAG is applied, i.e. nodes are added . Periodically the list of rules is iterated to extend the DAG where possible.

Rules for Rules

There are a couple of rule that the rules need to obey themselves in order to make

  • it is the responsibility of the predicate to signal that all necessary nodes for the body are present in the DAG. Examples are:

    • wait until no nodes of a particular type could possibly be added to the DAG. This requires us to know what kind of edges are valid on a global level.
      • wait until a certain number of nodes are present in the DAG (say )
    • select a certain set of nodes by their unique id (useful to attach a sub-DAG to a existing node from within that node)
  • the only valid edges that you can dynamically add are ones that point away from existing nodes to new nodes.. edges directed towards existing nodes would introduce new dependencies which were not present before and so that job might have already run, or be currently running

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

adage-0.10.2.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

adage-0.10.2-py3-none-any.whl (18.8 kB view details)

Uploaded Python 3

File details

Details for the file adage-0.10.2.tar.gz.

File metadata

  • Download URL: adage-0.10.2.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for adage-0.10.2.tar.gz
Algorithm Hash digest
SHA256 9327e256226070ccf69477e7689ac81a3c1a792eadad5ba45749a6a46fdfdac8
MD5 4982d09565b1c0075b55ec932f301f7b
BLAKE2b-256 2b0d5e71d852fcced791ca0b1f566fc9f3f38733851a034cbe8826a4e43ff672

See more details on using hashes here.

File details

Details for the file adage-0.10.2-py3-none-any.whl.

File metadata

  • Download URL: adage-0.10.2-py3-none-any.whl
  • Upload date:
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for adage-0.10.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f03a80773f1267463f9054cca8e53108b1eefe2f9056fba643a0070d41e429a4
MD5 af3ca6e0b197fac137fa65ff0c56a812
BLAKE2b-256 6190aa54deb56644bfc46dfe13a93080ca6f8f5b5f357a947c318ddfc76ff41e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page