# Invoke Databricks Wheel Tasks

Databricks Python Wheel dev tasks in a namespaced collection of tasks to enrich the Invoke CLI task runner.
## Getting Started

```sh
pip install invoke-databricks-wheel-tasks
```

This will also install `invoke` and `databricks-cli`.
### Databricks CLI Config

It is assumed you will follow the documentation provided to set up `databricks-cli`:
https://docs.databricks.com/dev-tools/cli/index.html
You'll need to set up a Personal Access Token. Then run the following command:

```sh
databricks configure --profile yourprofilename --token
Databricks Host (should begin with https://): https://myorganisation.cloud.databricks.com/
Token:
```
This will create a configuration file in your home directory at `~/.databrickscfg`, like:

```sh
cat ~/.databrickscfg
```

```ini
[yourprofilename]
host = https://myorganisation.cloud.databricks.com/
token = dapi0123456789abcdef0123456789abcdef
jobs-api-version = 2.1
```
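As a quick sanity check, the resulting file can be read back with Python's standard `configparser`, since `databricks-cli` uses plain INI format. This is just an illustration using the example profile above, not part of the tasks themselves:

```python
import configparser

# The same INI format databricks-cli writes to ~/.databrickscfg.
sample = """
[yourprofilename]
host = https://myorganisation.cloud.databricks.com/
token = dapi0123456789abcdef0123456789abcdef
jobs-api-version = 2.1
"""

config = configparser.ConfigParser()
config.read_string(sample)

profile = config["yourprofilename"]
print(profile["host"])  # https://myorganisation.cloud.databricks.com/
```

In real use you would point `config.read()` at `~/.databrickscfg` instead of a string.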
## Invoke Setup

`tasks.py`:

```python
from invoke import task
from invoke_databricks_wheel_tasks import *  # noqa


@task
def format(c):
    """Autoformat code for code style."""
    c.run("black .")
    c.run("isort .")


@task
def build(c):
    """Build wheel."""
    c.run("rm -rfv dist/")
    c.run("poetry build -f wheel")
```
Once your `tasks.py` is set up like this, `invoke` will have the extra commands:

```
λ invoke --list
Available tasks:

  build                  Build wheel.
  clean                  Clean wheel artifact from DBFS.
  dbfs-wheel-path        Generate the target path (including wheelname) this wheel should be uploaded to.
  dbfs-wheel-path-root   Generate the target path (excluding wheelname) this wheel should be uploaded to.
  define-job             Generate templated Job definition and upsert by Job Name in template.
  format                 Autoformat code for code style.
  poetry-wheel-name      Display the name of the wheel file poetry would build.
  reinstall              Reinstall version of wheel on cluster with a restart.
  runjob                 Trigger default job associated for this project.
  upload                 Upload wheel artifact to DBFS.
```
## The Tasks
### upload

This task will use `dbfs` to empty the upload path and then copy the built wheel from `dist/`.

This project assumes you're using `poetry`, or at least that your wheel build output is located in `dist/`. If you have other requirements, pull requests are welcome.
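As a rough sketch of what such an upload step involves, the snippet below locates the newest wheel in `dist/` and assembles the corresponding legacy `dbfs` CLI calls. The helper names and the artifact path are illustrative assumptions, not the task's actual implementation:

```python
from pathlib import Path


def find_wheel(dist_dir: str = "dist") -> Path:
    """Return the most recently built wheel in dist/ (assumes a poetry-style build output)."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"No wheel found in {dist_dir}/")
    return wheels[-1]


def upload_commands(wheel: Path, artifact_path: str) -> list[str]:
    """Build the dbfs CLI calls: clear the target path, then copy the wheel up."""
    return [
        f"dbfs rm -r {artifact_path}",
        f"dbfs cp --overwrite {wheel} {artifact_path}/{wheel.name}",
    ]
```

In a `tasks.py` context these command strings would be handed to `c.run(...)`.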
### clean

This task will clean up all items at the target `--artifact-path`.
### reinstall

After some trial and error: a job that creates a fresh job cluster on every run takes roughly 7 minutes. However, if you create an all-purpose cluster, you can instead:

- mark the old wheel for uninstall
- restart the cluster
- install the updated wheel from its DBFS location

This takes roughly 2 minutes, which is a much tighter development loop. These three steps are what `reinstall` performs.
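The three steps map naturally onto the legacy `databricks-cli` subcommands. Below is a hedged sketch of the commands such a task might issue; the cluster id, wheel path, and exact flags are assumptions and may differ from the task's real implementation:

```python
def reinstall_commands(cluster_id: str, wheel_dbfs_path: str, profile: str = "DEFAULT") -> list[str]:
    """Sketch the uninstall / restart / install cycle as databricks-cli calls."""
    return [
        # 1. Mark the old wheel for uninstall (takes effect after restart).
        f"databricks --profile {profile} libraries uninstall --cluster-id {cluster_id} --whl {wheel_dbfs_path}",
        # 2. Restart the all-purpose cluster.
        f"databricks --profile {profile} clusters restart --cluster-id {cluster_id}",
        # 3. Install the updated wheel from its DBFS location.
        f"databricks --profile {profile} libraries install --cluster-id {cluster_id} --whl {wheel_dbfs_path}",
    ]
```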
### runjob

Assuming you have defined a job that uses a pre-existing cluster with your latest wheel installed, this will manually trigger that job by `job-id`.

The trigger returns a `run-id`, which gets polled until it reaches an end state. Then `databricks runs get-output --run-id` is called to retrieve any `error`, `error_trace` and/or `logs`, which are emitted to the console.
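The polling behaviour can be sketched generically. Here `get_state` stands in for however the run's life cycle state is fetched (e.g. via `databricks runs get --run-id`); the terminal state names follow the Jobs API, but treat the loop itself as an assumption about how the task works:

```python
import time
from typing import Callable

# Terminal life cycle states per the Databricks Jobs API.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def poll_until_done(get_state: Callable[[], str], interval: float = 5.0, timeout: float = 3600.0) -> str:
    """Poll a run's life cycle state until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("Run did not reach a terminal state in time")


# Example with a faked state sequence standing in for real API calls:
states = iter(["PENDING", "RUNNING", "TERMINATED"])
print(poll_until_done(lambda: next(states), interval=0.0))  # TERMINATED
```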
### define-job

You may want to run `invoke --help define-job` to get the help documentation.

There are a few arguments that get abbreviated, which we will explain before discussing how they work together:

- `--jinja-template` or `-j`: a Jinja2 template that must resolve to a valid Databricks Jobs JSON payload spec.
- `--config-file` or `-c`: a JSON or YAML file defining the config that parametrises the above `jinja-template`. This `config-file` can itself be a Jinja template, to inject values that can only be known at runtime, such as the git feature branch you are currently on. By default it is treated as a plain config file and not a Jinja template, unless `environment-variable` flags are specified (see next).
- `--environment-variable` or `-e`: this flag can be repeated to specify multiple values. It takes a string in the `key=value` format, e.g. `-e branch=$(git branch --show-current) -e main_wheel=$MAIN -e utils_wheel=$UTILS`.
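Parsing the repeated `-e key=value` flags presumably amounts to something like the following sketch (not the task's actual code):

```python
def parse_env_flags(pairs: list[str]) -> dict[str, str]:
    """Turn repeated key=value strings into a dict for template rendering."""
    env = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {pair!r}")
        env[key] = value
    return env
```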
So an example command could look like:

```sh
invoke define-job \
  -j jobs/base-job-template.json.j2 \
  -c jobs/customer360-etl-job.yaml \
  -e branch=$(git branch --show-current) \
  -e main_whl=$(invoke dbfs-wheel-path) \
  -e utils_whl=$UTILS_DBFS_WHEEL_PATH
```

The `-e` values get templated into `customer360-etl-job.yaml`. Then that YAML file gets parsed and injected into `base-job-template.json.j2`.
This will then check the list of jobs in your workspace, see if a job with the same name already exists, and perform a create-or-replace job operation. This expects the `config-file` to have a key `name`, to be able to cross-check the list of existing jobs.
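The create-or-replace decision can be sketched as a lookup over the existing jobs list. The job shapes below loosely follow the Jobs 2.1 API (`jobs/list` entries with `job_id` and `settings.name`; `jobs/reset` overwriting an existing job's settings), but treat the exact payloads as assumptions:

```python
def upsert_action(existing_jobs: list[dict], new_settings: dict) -> tuple[str, dict]:
    """Decide whether to create a new job or reset an existing one, matched by name.

    existing_jobs: entries shaped like {"job_id": ..., "settings": {"name": ...}}
    new_settings:  the rendered job spec, which must contain a "name" key.
    """
    name = new_settings["name"]
    for job in existing_jobs:
        if job["settings"]["name"] == name:
            # Replace: the Jobs API "reset" endpoint overwrites job_id's settings.
            return "reset", {"job_id": job["job_id"], "new_settings": new_settings}
    return "create", new_settings
```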
The beauty is that the specifics of `config-file` and `jinja-template` are completely up to you. The `config-file` is the minimal data structure you need to configure the `jinja-template`, and you just use the Jinja control structures (`if`-`else`, `for` loops, etc.) to traverse it and populate the `jinja-template`.
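The two-stage rendering can be illustrated with the standard library. Here `string.Template` stands in for Jinja2 (so the sketch stays dependency-free) and a JSON config stands in for YAML; the file contents and key names are made up for illustration:

```python
import json
from string import Template

# Stage 1: runtime values (the -e flags) rendered into the config file.
config_text = Template('{"name": "customer360-etl-$branch", "wheel": "$main_whl"}').substitute(
    branch="feature-x", main_whl="dbfs:/wheels/app.whl"
)
config = json.loads(config_text)

# Stage 2: the parsed config rendered into the job template.
job_template = Template('{"name": "$name", "libraries": [{"whl": "$wheel"}]}')
job = json.loads(job_template.substitute(config))
print(job["name"])  # customer360-etl-feature-x
```

The real tasks use Jinja2, which additionally gives you `{% if %}` / `{% for %}` control structures over nested config data.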
## Contributing

At all times, you have the power to fork this project, make changes as you see fit, and then:

```sh
pip install https://github.com/user/repository/archive/branch.zip
```

Stack Overflow: pip install from github branch

That way you can run from your own custom fork in the interim, or even in-house your work and simply use this project as a starting point. That is totally OK.

However, if you would like to contribute your changes back, then open a Pull Request "across forks".

Once your changes are merged and published, you can revert to `pip install`ing the canonical version of this package.

If you're not sure how to make the changes, or whether you should sink in the time and effort, then open an Issue instead and we can have a chat to triage it.
## Resources

### Prior Art
## File details

Details for the file `invoke-databricks-wheel-tasks-0.8.0b1.tar.gz`.

### File metadata

- Download URL: invoke-databricks-wheel-tasks-0.8.0b1.tar.gz
- Upload date:
- Size: 12.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.0 CPython/3.10.6 Linux/5.15.0-1019-azure

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 209463fe67576b347ee0eadff7315d3056006ffb56842d039e5559c5164e0d29 |
| MD5 | 206ee25cda77bc6e44d5aafc24f9105d |
| BLAKE2b-256 | f3ac96bdb250a28030f841b142ac6956339f348e90fa3321e2d6df065bb9938c |
## File details

Details for the file `invoke_databricks_wheel_tasks-0.8.0b1-py3-none-any.whl`.

### File metadata

- Download URL: invoke_databricks_wheel_tasks-0.8.0b1-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.0 CPython/3.10.6 Linux/5.15.0-1019-azure

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ab6a0433deef42160efab4484f274c0d000a855340b70165e02c5c4619a5058a |
| MD5 | c4b8b0ed76291aba4f100cd9c1a19d84 |
| BLAKE2b-256 | 7397fcc7af03ba19347a78b11109765a3ca3a566148819c74e8c50fb7b518437 |