dask-databricks

Cluster tools for running Dask on Databricks multi-node clusters.

Quickstart

To launch a Dask cluster on Databricks, create an init script with the following contents and configure your multi-node cluster to use it.

#!/bin/bash

# Install Dask + Dask Databricks
/databricks/python/bin/pip install --upgrade "dask[complete]" dask-databricks

# Start Dask cluster components
dask databricks run

Then, from your Databricks notebook, you can connect a Dask Client to the scheduler running on the Spark driver node.

import dask_databricks

client = dask_databricks.get_client()

Now you can submit work from your notebook to the multi-node Dask cluster.

def inc(x):
    return x + 1

x = client.submit(inc, 10)
x.result()
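
The submit/result pattern above follows the same shape as the standard library's concurrent.futures interface, so it can be tried locally before a cluster is available. The sketch below is only a stand-in under that assumption: it uses a ThreadPoolExecutor rather than a Dask client, and the names single and many are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 1


# Local stand-in for the Dask client (an assumption for illustration):
# a thread pool exposes the same submit()/result() pattern.
with ThreadPoolExecutor(max_workers=2) as executor:
    # Submit one task and block until its result is ready.
    future = executor.submit(inc, 10)
    single = future.result()

    # Apply the same function to several inputs.
    many = list(executor.map(inc, range(5)))

print(single)
print(many)
```

One difference to keep in mind: with a Dask client, client.map returns a list of futures rather than results, and client.gather collects their values.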

Dashboard

You can access the Dask dashboard via the Databricks driver-node proxy. The link is shown in the repr of the Client or DatabricksCluster object, and is also available as client.dashboard_link.

>>> print(client.dashboard_link)
https://dbc-dp-xxxx.cloud.databricks.com/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status
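
If you need the individual pieces of the link, for example to log the workspace host or the proxied port, the standard library can split it. This is a minimal sketch: split_dashboard_link is a hypothetical helper, not part of dask-databricks, and it assumes the port is the second-to-last path segment, as in the example URL above.

```python
from urllib.parse import urlsplit


def split_dashboard_link(link):
    """Split a driver-proxy dashboard link into host, port and page.

    Hypothetical helper; assumes the path ends in .../<port>/<page>.
    """
    parts = urlsplit(link)
    # Drop empty strings produced by the leading slash.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "host": parts.netloc,
        "port": segments[-2],
        "page": segments[-1],
    }


example = (
    "https://dbc-dp-xxxx.cloud.databricks.com"
    "/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status"
)
info = split_dashboard_link(example)
print(info)
```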

Releasing

Releases of this project are automated using GitHub Actions and the pypa/gh-action-pypi-publish action.

To create a new release, push a tag to the upstream repo in the format x.x.x. The package is built and pushed to PyPI automatically, and is later picked up by conda-forge.

# Make sure you have an upstream remote
git remote add upstream git@github.com:dask-contrib/dask-databricks.git

# Create a tag and push it upstream
git tag x.x.x && git push upstream main --tags
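
Because the release is triggered by the tag itself, it can help to check the tag string before pushing. The sketch below is one way to do that, assuming that x.x.x means three dot-separated integers with no prefix; is_release_tag is a hypothetical helper and not part of the project's tooling.

```python
import re

# Assumption: "x.x.x" means three dot-separated integers, e.g. 1.2.3.
TAG_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_release_tag(tag):
    """Return True if the whole tag matches the assumed x.x.x format."""
    return TAG_PATTERN.fullmatch(tag) is not None


for candidate in ["1.2.3", "v1.2.3", "1.2"]:
    print(candidate, is_release_tag(candidate))
```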
