
databricks-cli
==============
.. image:: https://travis-ci.org/databricks/databricks-cli.svg?branch=master
   :target: https://travis-ci.org/databricks/databricks-cli
   :alt: Build Status

The Databricks Command Line Interface (CLI) is an open source tool which provides an easy-to-use interface to
the Databricks platform. The CLI is built on top of the Databricks REST APIs. Currently,
the CLI fully implements the DBFS API and the Workspace API.

**PLEASE NOTE**, this CLI is under active development and is released as
an experimental client. This means that interfaces are still subject to change.

If you're interested in contributing to the project, please reach out.
In addition, please file bug reports as issues on our `GitHub project <https://github.com/databricks/databricks-cli>`_.

Requirements
------------

- Python version > 2.7.9
- Python 3 is not supported

Installation
---------------

To install, simply run
``pip install --upgrade databricks-cli``

Then set up authentication using username/password or `authentication token <https://docs.databricks.com/api/latest/authentication.html#token-management>`_. Credentials are stored at ``~/.databrickscfg``.

- ``databricks configure`` (enter hostname/username/password at prompt)
- ``databricks configure --token`` (enter hostname/auth-token at prompt)
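
Once configured, the CLI stores a profile in ``~/.databrickscfg``. As a rough
sketch of what the token-based version of that file looks like (the exact keys
depend on the authentication method you chose, and the hostname below is a
placeholder):

.. code::

    [DEFAULT]
    host = https://<your-shard>.cloud.databricks.com
    token = <your-personal-access-token>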

Then you're all set to go! To test that your authentication information is working, try a quick test like
``databricks workspace ls``.

Known Issues
---------------
``AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'``

The Databricks web service requires that clients speak TLSv1.2. The version of
Python built into macOS does not support this version of TLS.

To use databricks-cli you should install a version of Python which has ``ssl.PROTOCOL_TLSv1_2``.
On macOS, the easiest way may be to install Python with `Homebrew <https://brew.sh/>`_.
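
As a quick sanity check, you can ask the Python on your ``PATH`` whether it
exposes the attribute from the error message; the command below exits silently
when the attribute is present and raises ``AttributeError`` when it is not:

.. code::

    python -c "import ssl; ssl.PROTOCOL_TLSv1_2"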

Workspace CLI Examples
-----------------------
The implemented commands for the Workspace CLI can be listed by running ``databricks workspace -h``.
Commands are run by appending them to ``databricks workspace``. To make it easier to use the workspace
CLI, feel free to alias ``databricks workspace`` to something shorter. For more information,
see `Aliasing Command Groups <#aliasing-command-groups>`_.

.. code::

    $ databricks workspace -h
    Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

      Utility to interact with the Databricks Workspace. Workspace paths must be
      absolute and be prefixed with `/`.

    Options:
      -v, --version
      -h, --help     Show this message and exit.

    Commands:
      delete      Deletes objects from the Databricks...
      export      Exports a file from the Databricks workspace...
      export_dir  Recursively exports a directory from the...
      import      Imports a file from local to the Databricks...
      import_dir  Recursively imports a directory from local to...
      list        List objects in the Databricks Workspace
      ls          List objects in the Databricks Workspace
      mkdirs      Make directories in the Databricks Workspace.
      rm          Deletes objects from the Databricks...

Listing Workspace Files
^^^^^^^^^^^^^^^^^^^^^^^^
.. code::

    $ databricks workspace ls /Users/example@databricks.com
    Usage Logs ETL
    Common Utilities
    guava-21.0

Importing a local directory of notebooks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``databricks workspace import_dir`` command recursively imports a directory
from the local filesystem into the Databricks workspace. Only directories and
files with the extensions ``.scala``, ``.py``, ``.sql``, ``.r``, ``.R`` are imported.
When imported, these extensions are stripped from the notebook name.

To overwrite existing notebooks at the target path, add the ``-o`` flag.

.. code::

    $ tree
    .
    ├── a.py
    ├── b.scala
    ├── c.sql
    ├── d.R
    └── e

.. code::

    $ databricks workspace import_dir . /Users/example@databricks.com/example
    ./a.py -> /Users/example@databricks.com/example/a
    ./b.scala -> /Users/example@databricks.com/example/b
    ./c.sql -> /Users/example@databricks.com/example/c
    ./d.R -> /Users/example@databricks.com/example/d

.. code::

    $ databricks workspace ls /Users/example@databricks.com/example -l
    NOTEBOOK   a  PYTHON
    NOTEBOOK   b  SCALA
    NOTEBOOK   c  SQL
    NOTEBOOK   d  R
    DIRECTORY  e
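
Re-running the same import against a now-populated target requires the
overwrite flag described above; a sketch of the re-run:

.. code::

    $ databricks workspace import_dir -o . /Users/example@databricks.com/example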

Exporting a workspace directory to the local filesystem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Similarly, it is possible to export a directory of notebooks from the Databricks workspace
to the local filesystem. To do this, the command is simply

.. code::

    $ databricks workspace export_dir /Users/example@databricks.com/example .

DBFS CLI Examples
-----------------------
The implemented commands for the DBFS CLI can be listed by running ``databricks fs -h``.
Commands are run by appending them to ``databricks fs``, and all DBFS paths should be prefixed with
``dbfs:/``. To make the command less verbose, we've
gone ahead and aliased ``dbfs`` to ``databricks fs``.

.. code::

    $ databricks fs -h
    Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

      Utility to interact with DBFS. DBFS paths are all prefixed
      with dbfs:/. Local paths can be absolute or relative.

    Options:
      -v, --version
      -h, --help     Show this message and exit.

    Commands:
      configure
      cp      Copy files to and from DBFS.
      ls      List files in DBFS.
      mkdirs  Make directories in DBFS.
      mv      Moves a file between two DBFS paths.
      rm      Remove files from dbfs.

Copying a file to DBFS
^^^^^^^^^^^^^^^^^^^^^^^^
.. code::

    dbfs cp test.txt dbfs:/test.txt
    # Or recursively
    dbfs cp -r test-dir dbfs:/test-dir

Copying a file from DBFS
^^^^^^^^^^^^^^^^^^^^^^^^
.. code::

    dbfs cp dbfs:/test.txt ./test.txt
    # Or recursively
    dbfs cp -r dbfs:/test-dir ./test-dir
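
The other commands from the help output above compose the same way; a short
sketch (the recursive ``-r`` flag on ``rm`` is an assumption mirrored from the
``cp`` examples):

.. code::

    dbfs ls dbfs:/
    dbfs mkdirs dbfs:/new-dir
    dbfs mv dbfs:/test.txt dbfs:/new-dir/test.txt
    dbfs rm -r dbfs:/test-dir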

Jobs CLI Examples
--------------------
The implemented commands for the jobs CLI can be listed by running ``databricks jobs -h``.
Commands for job runs are listed by running ``databricks runs -h``.

.. code::

    $ databricks jobs -h
    Usage: databricks jobs [OPTIONS] COMMAND [ARGS]...

      Utility to interact with jobs.

      This is a wrapper around the jobs API
      (https://docs.databricks.com/api/latest/jobs.html). Job runs are handled
      by ``databricks runs``.

    Options:
      -v, --version  [VERSION]
      -h, --help     Show this message and exit.

    Commands:
      create   Creates a job.
      delete   Deletes the specified job.
      get      Describes the metadata for a job.
      list     Lists the jobs in the Databricks Job Service.
      reset    Resets (edits) the definition of a job.
      run-now  Runs a job with optional per-run parameters.

.. code::

    $ databricks runs -h
    Usage: databricks runs [OPTIONS] COMMAND [ARGS]...

      Utility to interact with job runs.

    Options:
      -v, --version  [VERSION]
      -h, --help     Show this message and exit.

    Commands:
      cancel  Cancels the run specified.
      get     Gets the metadata about a run in json form.
      list    Lists job runs.
      submit  Submits a one-time run.

Listing and finding jobs
^^^^^^^^^^^^^^^^^^^^^^^^^
The ``databricks jobs list`` command has two output formats, ``JSON`` and ``TABLE``.
The ``TABLE`` format is output by default and returns a two-column table (job ID, job name).

To find a job by name

.. code::

    databricks jobs list | grep "JOB_NAME"
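
If you have ``jq`` installed (see the `jq <#jq>`_ section below), you can match
on the name field exactly instead of grepping the whole row; a sketch, reusing
the same ``.jobs[].settings.name`` layout as the examples below:

.. code::

    databricks jobs list --output json | jq '.jobs[] | select(.settings.name == "JOB_NAME") | .job_id'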

Copying a job
^^^^^^^^^^^^^^^^^^^^^^^^
This example requires the program `jq <#jq>`_.

.. code::

    SETTINGS_JSON=$(databricks jobs get --job-id 284907 | jq .settings)
    # JQ Explanation:
    # - peek into top level `settings` field.
    databricks jobs create --json "$SETTINGS_JSON"

Deleting "Untitled" Jobs
^^^^^^^^^^^^^^^^^^^^^^^^
.. code::

    databricks jobs list --output json | jq '.jobs[] | select(.settings.name == "Untitled") | .job_id' | xargs -n 1 databricks jobs delete --job-id
    # Explanation:
    # - List jobs in JSON.
    # - Peek into top level `jobs` field.
    # - Select only jobs with the name "Untitled".
    # - Print those job IDs out.
    # - Invoke `databricks jobs delete --job-id` once per row, with the job ID appended as an argument to the end of the command.

Clusters CLI Examples
-----------------------
The implemented commands for the clusters CLI can be listed by running ``databricks clusters -h``.

.. code::

    $ databricks clusters -h
    Usage: databricks clusters [OPTIONS] COMMAND [ARGS]...

      Utility to interact with Databricks clusters.

    Options:
      -v, --version  [VERSION]
      -h, --help     Show this message and exit.

    Commands:
      create           Creates a Databricks cluster.
      delete           Removes a Databricks cluster given its ID.
      get              Retrieves metadata about a cluster.
      list             Lists active and recently terminated clusters.
      list-node-types  Lists possible node types for a cluster.
      list-zones       Lists zones where clusters can be created.
      restart          Restarts a Databricks cluster given its ID.
      spark-versions   Lists possible Databricks Runtime versions...
      start            Starts a terminated Databricks cluster given its ID.

Listing runtime versions
^^^^^^^^^^^^^^^^^^^^^^^^^
.. code::

    databricks clusters spark-versions

Listing node types
^^^^^^^^^^^^^^^^^^^
.. code::

    databricks clusters list-node-types


.. _alias_databricks_cli:

Aliasing Command Groups
--------------------------
Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group. Writing
``databricks workspace ls`` can be quite verbose! To make the CLI easier to use, you can alias different
command groups to shorter commands. For example, to shorten ``databricks workspace ls`` to ``dw ls`` in the
Bourne Again Shell (bash), you can add ``alias dw="databricks workspace"`` to the appropriate bash profile. Typically,
this file is located at ``~/.bash_profile``.
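
As a concrete sketch of that setup:

.. code::

    echo 'alias dw="databricks workspace"' >> ~/.bash_profile
    source ~/.bash_profile
    dw ls /Users/example@databricks.com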

.. _jq:

jq
---
Some Databricks CLI commands will output the JSON response from the API endpoint. Sometimes it can be
useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job
definition, we must take the ``settings`` field of ``/api/2.0/jobs/get`` and use it as an argument
to the ``databricks jobs create`` command.

In these cases, we recommend using the utility ``jq``. macOS users can install ``jq`` through
Homebrew with ``brew install jq``.

For more information on ``jq``, see its `documentation <https://stedolan.github.io/jq/>`_.
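
As a minimal, self-contained illustration of pulling a field out of a JSON
document (the JSON here is made up for the example):

.. code::

    $ echo '{"settings": {"name": "My Job"}}' | jq .settings
    {
      "name": "My Job"
    }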

Using Docker
------------
.. code::

    # build image
    docker build -t databricks-cli .

    # run container
    docker run -it databricks-cli

    # run command in docker
    docker run -it databricks-cli fs --help
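
The container has no access to your host's ``~/.databrickscfg``. One way to
reuse your credentials, assuming the image runs as root so the home directory
is ``/root``, is to bind-mount the file:

.. code::

    docker run -it -v ~/.databrickscfg:/root/.databrickscfg databricks-cli fs ls dbfs:/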
