Project description

OpenSAFELY job runner

A job runner is a service that encapsulates the tasks of:

- checking out an OpenSAFELY study repo;
- executing actions defined in its project.yaml configuration file when requested via a jobs queue; and
- storing the results in particular locations.

Quickrefs:

Playbooks

End users will find more information in the OpenSAFELY documentation.

Operating principles

In production, this software runs as a loop on a secure server within the infrastructure of the primary data provider. It polls an OpenSAFELY job server, looking for requests to run jobs.

Jobs belong to a workspace. A workspace specifies the git repo containing the OpenSAFELY-compliant project under execution, the git branch, and the kind of database to use. The workspace also acts as a namespace for partitioning the outputs of its jobs.

An OpenSAFELY-compliant repo must provide a project.yaml file which describes how a requested job should be converted into a command (and arguments) that can be run in a subprocess on the secure server. The file also expresses dependencies between actions, so an action that generates a chart might depend on an action that extracts data from the database for that chart. See the Actions reference for more information.

An action can define outputs; these are persisted on disk and made available to subsequent actions in the workspace, and to users who have permission to log into the server and view the raw files.

The runner takes care of executing dependencies in order. By default, it skips re-running a dependency whose previous run produced output that still exists in the production environment. The runner also reports status back to the job server, redacting possibly-sensitive information.

The runner is bundled as part of the opensafely-cli tool so users can test their actions locally.

Job structure

The job server serves jobs as JSON in the following format. First, a job must belong to a workspace:

{
    "workspace": {
        "name": "my workspace",
        "repo": "https://github.com/opensafely/job-integration-tests",
        "branch": "master",
        "db": "full"
    }
}

Possible values for "db" are "full", "slice", and "dummy".
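
As a minimal sketch of how a consumer might check this structure before acting on it, the following parses a job payload and validates its workspace. The function name, the set of required keys, and the error messages are illustrative assumptions; only the key names and the three db values come from the text above.

```python
import json

# The three db values come from the text above.
ALLOWED_DB_VALUES = {"full", "slice", "dummy"}
# Assumption: all four keys shown in the example are required.
REQUIRED_WORKSPACE_KEYS = {"name", "repo", "branch", "db"}


def parse_workspace(raw_json: str) -> dict:
    """Parse a job payload and return its workspace (hypothetical helper).

    Raises ValueError if the workspace is missing, incomplete,
    or names an unknown db value.
    """
    payload = json.loads(raw_json)
    workspace = payload.get("workspace")
    if not isinstance(workspace, dict):
        raise ValueError("job has no workspace object")
    missing = REQUIRED_WORKSPACE_KEYS - workspace.keys()
    if missing:
        raise ValueError(f"workspace is missing keys: {sorted(missing)}")
    if workspace["db"] not in ALLOWED_DB_VALUES:
        raise ValueError(f"unknown db value: {workspace['db']!r}")
    return workspace
```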

A workspace is a way of associating jobs related to a given combination of branch, repository and database. To enqueue a job, a client POSTs JSON like this:

{
    "backend": "tpp",
    "action_id": "do_thing",
    "workspace_id": 1
}
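
A client could build such a request with the Python standard library roughly as follows. This is a sketch only: the endpoint URL is a placeholder, the function name is hypothetical, and any authentication the job server requires is not described here and so is not shown.

```python
import json
import urllib.request


def build_enqueue_request(jobs_url: str, backend: str, action_id: str,
                          workspace_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that enqueues a job.

    jobs_url is an assumption; the real endpoint is not given in this document.
    """
    body = json.dumps(
        {"backend": backend, "action_id": action_id, "workspace_id": workspace_id}
    ).encode("utf-8")
    return urllib.request.Request(
        jobs_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; sending would be urllib.request.urlopen(request).
request = build_enqueue_request("https://jobs.example.invalid/jobs/", "tpp", "do_thing", 1)
```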

Consuming jobs

A job runner is a service installed on a machine that has access to a given backend. It receives jobs from the server and consumes those whose backend value matches the value of the current BACKEND environment variable.

The runner must also define three environment variables, each holding an RFC 1738 connection URL. These correspond to the db values that a job's workspace definition can request, and are named accordingly: FULL_DATABASE_URL, SLICE_DATABASE_URL, and DUMMY_DATABASE_URL.
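
The two rules above (match on backend, then pick the database URL named after the workspace's db value) might be sketched like this. Both function names are hypothetical, and deriving the variable name by upper-casing the db value is an assumption based on the three names listed.

```python
import os


def should_consume(job: dict, environ=os.environ) -> bool:
    """Return True if the job's backend matches this runner's BACKEND variable."""
    backend = environ.get("BACKEND")
    return backend is not None and job.get("backend") == backend


def database_url_for(job_db: str, environ=os.environ) -> str:
    """Return the connection URL for a workspace db value ("full", "slice" or "dummy").

    Assumption: the variable name is the upper-cased db value plus "_DATABASE_URL".
    """
    variable = f"{job_db.upper()}_DATABASE_URL"
    try:
        return environ[variable]
    except KeyError:
        raise RuntimeError(f"{variable} is not set") from None
```

Passing the environment as a parameter keeps both helpers easy to exercise without modifying the real process environment.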

When a job is found, the following happens:

The corresponding repo is fetched. Private repos are accessed using the PRIVATE_REPO_ACCESS_TOKEN supplied in the environment.
Its project.yaml is parsed:
- Individual actions are extracted from this file
- A dependency graph is calculated for the requested action; for example, an action might depend on three previous actions before it can be run
- Each action in the graph is checked to see if it needs to be run
  - An action does not need to be run if it (a) already has output generated from a previous run, (b) is currently running, or (c) failed on its last run
- If a dependency has failed, then the requested action fails
- If a dependency needs to be run, a new job is pushed to the queue for it, and the current job is postponed
- If an action has no dependencies needing to be run, then its docker run is executed
- On completion, a status code and message are reported back to the job server. On success, a list of output file locations is also posted. On failure, any potentially sensitive information is redacted from the message, and the message is associated with a unique string so that a user with the requisite permissions can log into the production environment and examine the docker logs for the full error.
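
The decision logic in the steps above can be sketched as a small function over an in-memory actions table. This is an illustration under stated assumptions, not the runner's actual code: the State enum, the function names, and the decision labels are all hypothetical, and treating a still-running dependency as "postpone without enqueueing" is an assumption the text does not spell out.

```python
from enum import Enum


class State(Enum):
    NOT_RUN = "not_run"        # no output from a previous run, not started
    RUNNING = "running"
    FAILED = "failed"
    HAS_OUTPUT = "has_output"  # output from a previous run still exists


def dependencies_of(actions: dict, action_id: str) -> list:
    """Return every transitive dependency of action_id, dependencies first."""
    ordered, seen = [], set()

    def visit(name, path):
        if name in path:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name in seen:
            return
        for needed in actions[name].get("needs", []):
            visit(needed, path | {name})
        seen.add(name)
        ordered.append(name)

    visit(action_id, frozenset())
    return ordered[:-1]  # drop the requested action itself


def decide(actions: dict, states: dict, action_id: str) -> tuple:
    """Return (decision, dependency_ids) for a requested action."""
    deps = dependencies_of(actions, action_id)
    failed = [d for d in deps if states.get(d, State.NOT_RUN) is State.FAILED]
    if failed:
        return ("fail", failed)
    to_run = [d for d in deps if states.get(d, State.NOT_RUN) is State.NOT_RUN]
    if to_run:
        # Each of these would be pushed to the queue as a new job.
        return ("postpone_and_enqueue", to_run)
    if any(states.get(d) is State.RUNNING for d in deps):
        return ("postpone", [])  # assumption: wait for running dependencies
    return ("run", [])
```

For example, with an action run_model that needs generate_study_population, an empty states table yields a postponement plus one job to enqueue, while a HAS_OUTPUT state for the dependency yields "run".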

Output locations

Every action defines a list of outputs which are persisted to a permanent storage location. The project author must categorise these outputs as either highly_sensitive or moderately_sensitive. Any pseudonymised data which may be highly disclosive (e.g. without low number redaction) should be classed as highly_sensitive; data which the author believes could be released following review should be classed as moderately_sensitive. This design allows tiered levels of permissions for collaborators to review data outputs. For example, the study author would usually have access to highly_sensitive material for debugging; but other collaborators could have access to moderately_sensitive data to prepare it for release (for which it is planned to add a minimally_sensitive category).

Outputs are therefore persisted to filesystem paths according to the following environment variables:

# A location where cohort CSVs (one row per patient) should be
# stored. This folder must exist.
HIGH_PRIVACY_STORAGE_BASE=/home/opensafely/high_security

# A location where script outputs (some for publication) should be
# stored
MEDIUM_PRIVACY_STORAGE_BASE=/tmp/outputs/medium_security
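
One way to picture how an output ends up under one of these base folders is the sketch below. It rests on two assumptions that this document does not confirm: that highly_sensitive maps to HIGH_PRIVACY_STORAGE_BASE and moderately_sensitive to MEDIUM_PRIVACY_STORAGE_BASE, and that outputs are grouped in a subfolder named after the workspace. The function name is hypothetical.

```python
import os
from pathlib import PurePosixPath

# Assumed mapping from sensitivity level to environment variable.
LEVEL_TO_VARIABLE = {
    "highly_sensitive": "HIGH_PRIVACY_STORAGE_BASE",
    "moderately_sensitive": "MEDIUM_PRIVACY_STORAGE_BASE",
}


def storage_path(level: str, workspace: str, relative_path: str,
                 environ=os.environ) -> PurePosixPath:
    """Return where an output file would be persisted (illustrative layout)."""
    base = PurePosixPath(environ[LEVEL_TO_VARIABLE[level]])
    relative = PurePosixPath(relative_path)
    # Refuse paths that could escape the storage base.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"output path must stay inside the workspace: {relative_path!r}")
    # Assumption: one subfolder per workspace under the base folder.
    return base / workspace / relative
```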

Project.yaml

A valid project file looks like this:

version: "3.0"

expectations:
  population_size: 1000

actions:

  generate_study_population:
    run: cohortextractor:latest generate_cohort --study-definition study_definition
    outputs:
      highly_sensitive:
        cohort: output/input.csv

  run_model:
    run: stata-mp:latest analysis/model.do
    needs: [generate_study_population]
    outputs:
      moderately_sensitive:
        model: models/cox-model.txt
        figure: figures/survival-plot.png

See the project pipeline documentation for a detailed description of the project.yaml setup.
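
To make the structure concrete, the sketch below mirrors the example project as a Python dict (as a YAML parser would produce), splits a run string into its first token and remaining arguments, and checks that every needs entry names a defined action. The helper names are hypothetical, and reading the first token as a docker image reference is an assumption based on the name:version form.

```python
import shlex

# The example project above, as the dict a YAML parser would yield.
project = {
    "version": "3.0",
    "expectations": {"population_size": 1000},
    "actions": {
        "generate_study_population": {
            "run": "cohortextractor:latest generate_cohort --study-definition study_definition",
            "outputs": {"highly_sensitive": {"cohort": "output/input.csv"}},
        },
        "run_model": {
            "run": "stata-mp:latest analysis/model.do",
            "needs": ["generate_study_population"],
            "outputs": {
                "moderately_sensitive": {
                    "model": "models/cox-model.txt",
                    "figure": "figures/survival-plot.png",
                }
            },
        },
    },
}


def split_run(run: str) -> tuple:
    """Split a run string into (image, arguments) using shell-style quoting."""
    image, *args = shlex.split(run)
    return image, args


def undefined_needs(project: dict) -> list:
    """Return (action, missing_dependency) pairs for needs that name no action."""
    actions = project["actions"]
    return [
        (name, needed)
        for name, action in actions.items()
        for needed in action.get("needs", [])
        if needed not in actions
    ]
```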

Local actions development

The cohortextractor command-line tool imports this library, and implements the action-parsing-and-running functionality as a series of synchronous docker commands, rather than asynchronously via the job queue.

For developers

Please see the additional information.

Project details

This version: 2.51.0 (May 19, 2022)
