Databricks Python Wheel dev tasks in a namespaced collection of tasks to enrich the Invoke CLI task runner.
Invoke Databricks Wheel Tasks
Getting Started
pip install invoke-databricks-wheel-tasks
This will also install invoke and databricks-cli.
Databricks CLI Config
It is assumed you will follow the documentation provided to set up databricks-cli.
https://docs.databricks.com/dev-tools/cli/index.html
You'll need to set up a Personal Access Token. Then run the following command:
databricks configure --profile yourprofilename --token
Databricks Host (should begin with https://): https://myorganisation.cloud.databricks.com/
Token:
This will create a configuration file in your home directory at ~/.databrickscfg like:
cat ~/.databrickscfg
[yourprofilename]
host = https://myorganisation.cloud.databricks.com/
token = dapi0123456789abcdef0123456789abcdef
jobs-api-version = 2.1
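To check that a profile was written as expected, the file can be parsed with Python's standard configparser module. This is only a minimal sketch and not part of this package: the helper name read_profile is hypothetical, and the sample text reuses the placeholder values shown above.

```python
import configparser


def read_profile(cfg_text: str, profile: str) -> dict:
    """Return the settings of one profile from .databrickscfg content."""
    # interpolation=None so a '%' inside a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


# Placeholder content mirroring the example file above (not real credentials).
SAMPLE = """\
[yourprofilename]
host = https://myorganisation.cloud.databricks.com/
token = dapi0123456789abcdef0123456789abcdef
jobs-api-version = 2.1
"""

settings = read_profile(SAMPLE, "yourprofilename")
print(settings["host"])
```

To inspect the real file, pass the text of ~/.databrickscfg instead of SAMPLE.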
Invoke Setup
tasks.py
from invoke import Collection, Task, task
import invoke_databricks_wheel_tasks as db


@task
def format(c):
    """Autoformat code for code style."""
    c.run("black .")
    c.run("isort .")


@task
def build(c):
    """Build wheel."""
    c.run("rm -rfv dist/")
    c.run("poetry build -f wheel")


# TODO: Find a neater way to capture root tasks as well as setting namespaces
# Collect every Task object defined at module level as a root task.
ns = Collection(*[v for v in globals().values() if isinstance(v, Task)])
ns.add_collection(db, name="db")
Once your tasks.py is set up like this, invoke will have the extra commands:
λ invoke --list
Available tasks:

  format         Autoformat code for code style.
  build          Build wheel.
  db.go          Trigger default job associated for this project.
  db.reinstall   Reinstall version of wheel on cluster with a restart.
  db.upload      Upload wheel artifact to DBFS.
Invoke Configuration
Each of the tasks will require some combination of profile, cluster-id, job-id etc.
You can create an invoke.yaml file which will get loaded into the invoke Context Configuration.
This will greatly reduce your typing by setting workspace-specific flags once for your dev iteration loop.
# https://docs.pyinvoke.org/en/latest/concepts/configuration.html
databricks:
  profile: yourprofilename
  cluster-id: your-cluster-id-here
  job-id: 9999
  artifact-path: "dbfs:/FileStore/wheels/"
  wheel: "dbfs:/FileStore/wheels/projectname-0.1.0-py3-none-any.whl"
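How the tasks combine these values with command-line flags is not spelled out here, so the following is only a sketch of one plausible rule: an explicit flag wins, otherwise the value from the databricks section is used. The CONFIG dict stands in for the loaded invoke.yaml and resolve_option is a hypothetical helper, not a function of this package.

```python
from typing import Any, Optional

# Stand-in for the "databricks" section of the invoke.yaml shown above.
CONFIG = {
    "databricks": {
        "profile": "yourprofilename",
        "cluster-id": "your-cluster-id-here",
        "job-id": 9999,
    }
}


def resolve_option(config: dict, key: str, flag_value: Optional[Any] = None) -> Any:
    """Prefer an explicit command-line value, else fall back to the config."""
    if flag_value is not None:
        return flag_value
    try:
        return config["databricks"][key]
    except KeyError:
        raise ValueError(
            f"'{key}' was not passed and is not set in invoke.yaml"
        ) from None


print(resolve_option(CONFIG, "job-id"))        # value taken from the config
print(resolve_option(CONFIG, "job-id", 1234))  # explicit flag takes priority
```

Note that hyphenated keys such as cluster-id have to be read with dictionary-style access, since they are not valid Python attribute names.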
The Tasks
db.upload
This task uses dbfs to empty the upload path and then copy the built wheel from dist/.
This project assumes you're using poetry, or that your wheel build output is located in dist/.
If you have other requirements then pull requests are welcome.
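The two steps described above can be sketched as the dbfs commands a task might run. This is an illustration under assumptions, not the package's actual implementation: upload_commands is a hypothetical helper, and the exact flags the real task passes may differ. The commands are only built here, not executed.

```python
from pathlib import Path
from typing import List


def upload_commands(wheel: Path, artifact_path: str, profile: str) -> List[List[str]]:
    """Build the dbfs commands that empty the target folder and copy one wheel."""
    target = artifact_path.rstrip("/") + "/" + wheel.name
    return [
        # 1. Empty the upload path so stale wheels do not linger.
        ["dbfs", "rm", "-r", artifact_path, "--profile", profile],
        # 2. Copy the freshly built wheel from dist/ to DBFS.
        ["dbfs", "cp", "--overwrite", str(wheel), target, "--profile", profile],
    ]


commands = upload_commands(
    Path("dist") / "projectname-0.1.0-py3-none-any.whl",
    "dbfs:/FileStore/wheels/",
    "yourprofilename",
)
for cmd in commands:
    print(" ".join(cmd))
```

Inside an invoke task, each command could then be passed to c.run after joining it into a string.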
db.reinstall
After some trial and error, running a job that creates a new job cluster every time takes roughly 7 minutes.
However, you can instead create an all-purpose cluster and, for each new build:
- mark the old wheel for uninstall
- restart the cluster
- install the updated wheel from the DBFS location
This takes roughly 2 minutes, which is a much tighter development loop. These three steps are what db.reinstall performs.
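As a rough sketch, the three steps map onto databricks-cli commands like the ones built below. The helper reinstall_commands is hypothetical and the real task may pass different flags or wait for the restart to complete; that waiting logic is left out here.

```python
from typing import List


def reinstall_commands(cluster_id: str, wheel: str, profile: str) -> List[List[str]]:
    """Build the uninstall, restart and install commands for one wheel."""
    common = ["--cluster-id", cluster_id]
    auth = ["--profile", profile]
    return [
        # 1. Mark the old wheel for uninstall (takes effect on restart).
        ["databricks", "libraries", "uninstall", *common, "--whl", wheel, *auth],
        # 2. Restart the cluster so the uninstall is applied.
        ["databricks", "clusters", "restart", *common, *auth],
        # 3. Install the updated wheel from its DBFS location.
        ["databricks", "libraries", "install", *common, "--whl", wheel, *auth],
    ]


steps = reinstall_commands(
    "your-cluster-id-here",
    "dbfs:/FileStore/wheels/projectname-0.1.0-py3-none-any.whl",
    "yourprofilename",
)
for step in steps:
    print(" ".join(step))
```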
db.go
Assuming you have defined a job that uses a pre-existing cluster with your latest wheel installed, this will manually trigger your job using job-id.
The trigger returns a run-id, which is polled until the run reaches an end state.
Then a call to databricks runs get-output --run-id retrieves any error, error_trace and/or logs and emits them to the console.
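The polling step can be sketched as a small loop. This is an assumption-laden illustration rather than the package's code: wait_for_run and its get_run parameter are hypothetical, and the end states listed are the terminal life_cycle_state values documented for the Databricks Jobs API. In a real task, get_run would wrap a call such as databricks runs get --run-id and parse its JSON output; here a fake stands in so the sketch runs without a workspace.

```python
import time
from typing import Callable

# Life cycle states after which a run does not change any further.
END_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(
    get_run: Callable[[int], dict],
    run_id: int,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_run(run_id) until the run reaches an end state, then return it."""
    while True:
        run = get_run(run_id)
        if run["state"]["life_cycle_state"] in END_STATES:
            return run
        sleep(interval)


# Fake responses standing in for successive "runs get" calls.
responses = iter(
    [
        {"state": {"life_cycle_state": "PENDING"}},
        {"state": {"life_cycle_state": "RUNNING"}},
        {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
    ]
)
final = wait_for_run(lambda _run_id: next(responses), run_id=1, sleep=lambda _s: None)
print(final["state"]["result_state"])
```

Passing the fetch and sleep functions in as parameters keeps the loop easy to test without a workspace.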
All Together
Assuming you have created your cluster and job definition, you may want to create a root-level @task like:
@task(pre=[build, db.upload, db.reinstall, db.go], default=True)
def dev(c):
    """Default development loop."""
    ...
You will notice a few things here:
- The method has no implementation, only ...
- We are chaining a series of @tasks in the pre=[...] argument
- The default=True on this root task means we could run either invoke dev or simply invoke.
How cool is that?
Contributing
Open an issue and let's have a chat to triage needs or concerns before you sink too much effort into a PR.
Or if you're pretty confident your change is in line with the direction of this project then go ahead and open that PR.
Or feel free to fork this project and rename it to your own variant. It's cool, I don't mind.