
dask-databricks

Cluster tools for running Dask on Databricks multi-node clusters.

Quickstart

To launch a Dask cluster on Databricks, create an init script with the following contents and configure your multi-node cluster to use it.

#!/bin/bash

# Install Dask + Dask Databricks
# (the extras spec is quoted so the shell cannot treat the brackets as a glob pattern)
/databricks/python/bin/pip install --upgrade "dask[complete]" dask-databricks

# Start Dask cluster components
dask databricks run

Then from your Databricks Notebook you can quickly connect a Dask Client to the scheduler running on the Spark Driver Node.

import dask_databricks

client = dask_databricks.get_client()

Now you can submit work from your notebook to the multi-node Dask cluster.

def inc(x):
    return x + 1

x = client.submit(inc, 10)
x.result()
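
Fanning out several tasks follows the same submit-then-result pattern. The sketch below is a minimal illustration, assuming only that the client exposes a `submit` method returning a future with a `result` method, as in the snippet above. `fan_out` is a hypothetical helper, not part of dask-databricks, and a standard-library `ThreadPoolExecutor` stands in for the Dask client so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 1


def fan_out(client, func, values):
    """Submit one task per value, then block until every result is ready."""
    # `client` is anything with a submit(func, *args) method that returns
    # a future exposing .result().
    futures = [client.submit(func, v) for v in values]
    return [f.result() for f in futures]


# Local stand-in for the Dask client (assumption: for illustration only).
with ThreadPoolExecutor(max_workers=2) as executor:
    results = fan_out(executor, inc, [1, 2, 3])

print(results)
```

On Databricks, passing the client returned by `dask_databricks.get_client()` in place of `executor` should send the same calls to the Dask workers instead of local threads.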

Dashboard

You can access the Dask dashboard through the Databricks driver-node proxy. The link appears in the repr of the Client or DatabricksCluster object and is also available as client.dashboard_link.

>>> print(client.dashboard_link)
https://dbc-dp-xxxx.cloud.databricks.com/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status
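
If you need the individual pieces of that link (for example, to check which port the dashboard is proxied on), it can be split with the standard library. This is a rough sketch that assumes the path layout of the placeholder URL above. `split_dashboard_link` is a hypothetical helper, and the field names `org_id` and `cluster_id` are guesses at what the path segments mean, not documented names.

```python
from urllib.parse import urlsplit


def split_dashboard_link(link):
    """Split a driver-proxy dashboard URL into its path components."""
    parts = urlsplit(link)
    segments = [s for s in parts.path.split("/") if s]
    # Assumed layout: driver-proxy / o / <org> / <cluster> / <port> / <page>
    if len(segments) != 6 or segments[:2] != ["driver-proxy", "o"]:
        raise ValueError(f"unexpected dashboard link layout: {link!r}")
    return {
        "host": parts.netloc,
        "org_id": segments[2],      # assumed meaning
        "cluster_id": segments[3],  # assumed meaning
        "port": int(segments[4]),
        "page": segments[5],
    }


info = split_dashboard_link(
    "https://dbc-dp-xxxx.cloud.databricks.com/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status"
)
print(info["port"], info["page"])
```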

Releasing

Releases of this project are automated using GitHub Actions and the pypa/gh-action-pypi-publish action.

To create a new release, push a tag in the format x.x.x to the upstream repo. The package is then built and pushed to PyPI automatically, and conda-forge picks it up later.

# Make sure you have an upstream remote
git remote add upstream git@github.com:dask-contrib/dask-databricks.git

# Create a tag and push it upstream
git tag x.x.x && git push upstream main --tags
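
Because the release is triggered by the tag itself, it can be worth checking the tag string before pushing. The sketch below assumes that x.x.x means three dot-separated groups of digits with no prefix. The pattern and the `is_release_tag` helper are illustrative and not part of the project's tooling.

```python
import re

# Assumption: each "x" is one or more decimal digits, with no "v" prefix.
TAG_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_release_tag(tag):
    """Return True if the whole tag string matches the x.x.x format."""
    return TAG_PATTERN.fullmatch(tag) is not None


for candidate in ["0.3.1", "v0.3.1", "0.3"]:
    print(candidate, is_release_tag(candidate))
```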
