
Invoke Databricks Wheel Tasks

Databricks Python Wheel dev tasks in a namespaced collection of tasks to enrich the Invoke CLI task runner.

Getting Started

pip install invoke-databricks-wheel-tasks

This will also install invoke and databricks-cli.
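You can confirm both CLIs landed on your PATH:

invoke --version
databricks --version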

Databricks CLI Config

It is assumed you will follow the documentation provided to set up databricks-cli.

https://docs.databricks.com/dev-tools/cli/index.html

You'll need to set up a Personal Access Token. Then run the following command:

databricks configure --profile yourprofilename --token

Databricks Host (should begin with https://): https://myorganisation.cloud.databricks.com/
Token: 

This will create a configuration file in your home directory at ~/.databrickscfg, like:

cat ~/.databrickscfg

[yourprofilename]
host = https://myorganisation.cloud.databricks.com/
token = dapi0123456789abcdef0123456789abcdef
jobs-api-version = 2.1
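A quick way to confirm the profile works (and to find the cluster-id you'll need later) is to list your clusters:

databricks clusters list --profile yourprofilename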

Invoke Setup

tasks.py

from invoke import task, Collection, Task
import invoke_databricks_wheel_tasks as db

@task
def format(c):
    """Autoformat code for code style."""
    c.run("black .")
    c.run("isort .")

@task
def build(c):
    """Build wheel."""
    c.run("rm -rfv dist/")
    c.run("poetry build -f wheel")

# TODO: Find a neater way to capture root tasks as well as setting namespaces
ns = Collection(*[v for v in globals().values() if isinstance(v, Task)])
ns.add_collection(db, name="db")

Once your tasks.py is set up like this, invoke will have the extra commands:

λ invoke --list
Available tasks:

  format         Autoformat code for code style.
  build          Build wheel.
  db.runjob      Trigger default job associated for this project.
  db.reinstall   Reinstall version of wheel on cluster with a restart.
  db.upload      Upload wheel artifact to DBFS.
  db.clean       Clean wheel artifact from DBFS.

Invoke Configuration

Each of the tasks will require some combination of profile, cluster-id, job-id, etc. You can create an invoke.yaml file, which will get loaded into the invoke Context Configuration.

This will greatly simplify your typing by setting workspace-specific flags for your dev iteration loop.

# https://docs.pyinvoke.org/en/latest/concepts/configuration.html
databricks:
  profile: yourprofilename
  cluster-id: your-cluster-id-here
  job-id: 9999
  artifact-path: "dbfs:/FileStore/wheels/"
  wheel: "dbfs:/FileStore/wheels/projectname-0.1.0-py3-none-any.whl"
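Inside a task, these values are then available on the invoke Context. A minimal sketch (the task name is made up for illustration; the key names match the invoke.yaml above):

from invoke import task

@task
def showcfg(c):
    """Print the Databricks settings loaded from invoke.yaml."""
    cfg = c.config["databricks"]  # hyphenated keys need item access, not attribute access
    print(cfg["profile"], cfg["cluster-id"], cfg["job-id"])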

The Tasks

db.upload

This task will use dbfs to empty the upload path and then copy the built wheel from dist/. This project assumes you're using poetry, or at least that your wheel build output is located in dist/.

If you have other requirements then pull requests welcome.
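Under the hood this is roughly equivalent to the following dbfs CLI calls (paths taken from the invoke.yaml above; the exact flags the task passes are an assumption):

dbfs rm -r dbfs:/FileStore/wheels/ --profile yourprofilename
dbfs cp --overwrite --recursive dist/ dbfs:/FileStore/wheels/ --profile yourprofilename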

db.clean

This task will clean up all items on the target --artifact-path.
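With the invoke.yaml above in place this is simply invoke db.clean; the flag form below assumes the task exposes --artifact-path as suggested above:

λ invoke db.clean --artifact-path dbfs:/FileStore/wheels/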

db.reinstall

After some trial and error, it turns out a job that creates a new job cluster on every run takes roughly 7 minutes.

However, if you create an all-purpose cluster on which you:

  • mark the old wheel for uninstall
  • restart the cluster
  • install the updated wheel from its dbfs location

then the cycle takes roughly 2 minutes, which is a much tighter development loop. These three steps are exactly what db.reinstall performs.
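For reference, the equivalent databricks CLI calls look roughly like this (cluster-id and wheel path taken from the invoke.yaml above; a sketch, not the package's exact implementation):

databricks libraries uninstall --cluster-id your-cluster-id-here --whl dbfs:/FileStore/wheels/projectname-0.1.0-py3-none-any.whl --profile yourprofilename
databricks clusters restart --cluster-id your-cluster-id-here --profile yourprofilename
databricks libraries install --cluster-id your-cluster-id-here --whl dbfs:/FileStore/wheels/projectname-0.1.0-py3-none-any.whl --profile yourprofilename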

db.runjob

Assuming you have defined a job that uses a pre-existing cluster with your latest wheel installed, this task will manually trigger that job by job-id.

Triggering returns a run-id, and this run-id gets polled until the run reaches an end state.

Then a call to databricks runs get-output --run-id happens to retrieve any error, error_trace and/or logs to be emitted to the console.
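The polling loop is conceptually something like this sketch (subprocess-based for illustration; the package's actual implementation may differ, and the terminal states are those of the Databricks Jobs API):

import json
import subprocess
import time

def wait_for_run(run_id: int, profile: str = "yourprofilename") -> dict:
    """Poll `databricks runs get` until the run reaches a terminal state."""
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    while True:
        raw = subprocess.check_output(
            ["databricks", "runs", "get", "--run-id", str(run_id), "--profile", profile]
        )
        state = json.loads(raw)["state"]
        if state["life_cycle_state"] in terminal:
            return state
        time.sleep(10)  # wait between polls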

All Together

Assuming you have created your cluster and job definition, you may want to create a root level @task like:

@task(pre=[build, db.upload, db.reinstall, db.runjob], default=True)
def dev(c):
  """Default development loop."""
  ...

You will notice a few things here:

  1. The method has no implementation ...
  2. We are chaining a series of @tasks in the pre=[...] argument
  3. The default=True on this root task means we could run either invoke dev or simply invoke.
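So with the invoke.yaml in place, the whole build, upload, reinstall, run loop is a single command:

λ invoke dev
λ invoke      # same thing, since dev is the default task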

How cool is that?

Contributing

At all times, you have the power to fork this project, make changes as you see fit and then:

pip install https://github.com/user/repository/archive/branch.zip

Stackoverflow: pip install from github branch

That way you can run from your own custom fork in the interim or even in-house your work and simply use this project as a starting point. That is totally ok.

However if you would like to contribute your changes back, then open a Pull Request "across forks".

Once your changes are merged and published you can revert to the canonical version of pip installing this package.

If you're not sure how to make changes or if you should sink the time and effort, then open an Issue instead and we can have a chat to triage the issue.

Resources

Prior Art


Download files

Source Distribution

invoke-databricks-wheel-tasks-0.5.2.tar.gz (7.9 kB)

Hashes for invoke-databricks-wheel-tasks-0.5.2.tar.gz
Algorithm    Hash digest
SHA256       bfc29321681fed886781ce6f2d4cc4d320142962e51f312bce82ef46b9aed47c
MD5          230248ca1689e4af08d9011e3f048fce
BLAKE2b-256  a689251c5a6e2d9c0433aad3420051b45e5027f5f6979d9f04c293284ff1353a

Built Distribution

invoke_databricks_wheel_tasks-0.5.2-py3-none-any.whl

Hashes for invoke_databricks_wheel_tasks-0.5.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       3b8c061fe2aade62b6bfa58ef2bc70f4e8a53a2771e19cb8f86412953f6841b7
MD5          bf2728e0c69ce417f64ee378cd30b45b
BLAKE2b-256  6f3592a521f0eebe77c5c8b715ee5675bf641688a8af0e1e0de7b56d81300dae
