A command line interface for Databricks

The Databricks Command Line Interface (CLI) is an open source tool that provides an easy-to-use interface to the Databricks platform. The CLI is built on top of the Databricks REST APIs. Currently, it fully implements the DBFS API and the Workspace API.

PLEASE NOTE: this CLI is under active development and is released as an experimental client, which means that interfaces are still subject to change.

If you’re interested in contributing to the project, please reach out. In addition, please leave bug reports as issues on our GitHub project.

Requirements

  • Python version > 2.7.9

  • Python 3 is not supported

Installation

To install, simply run pip install databricks-cli

To upgrade your databricks-cli installation, run pip install --upgrade databricks-cli
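
If you prefer to keep the CLI isolated from your system Python, a minimal sketch using virtualenv (assuming virtualenv is already installed) looks like this:

$ virtualenv databricks-env
$ source databricks-env/bin/activate
$ pip install databricks-cli
$ databricks -h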

Known Issues

AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'

For compliance reasons, our webapp requires the client to speak TLSv1.2. The version of Python bundled with macOS does not have this version of TLS built in.

To use databricks-cli, you should install a version of Python that has ssl.PROTOCOL_TLSv1_2. For macOS, the easiest way may be to install Python with Homebrew.
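
To check whether your interpreter is affected before reinstalling, you can test for the attribute directly; a compatible interpreter prints True:

$ python -c "import ssl; print(hasattr(ssl, 'PROTOCOL_TLSv1_2'))"
True
$ brew install python  # only needed if the check above prints False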

Setting Up Authentication

There are two ways to authenticate to Databricks. The first is to use your username and password pair. To do this, run databricks configure and follow the prompts. The second, and recommended, way is to use an access token generated from Databricks. To configure the CLI to use the access token, run databricks configure --token. After following the prompts, your access credentials will be stored in the file ~/.databrickscfg.

Read Token Management for more information about Databricks Access Tokens.
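
An illustrative token-based session might look like the following (the host URL is a placeholder, and the exact prompt wording may vary between CLI versions):

$ databricks configure --token
Databricks Host (should begin with https://): https://example.cloud.databricks.com
Token: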

Workspace CLI Examples

The commands implemented by the Workspace CLI can be listed by running databricks workspace -h. Commands are run by appending them to databricks workspace. To make the workspace CLI easier to use, feel free to alias databricks workspace to something shorter; for more information, see Aliasing Command Groups.

$ databricks workspace -h
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks Workspace. Workspace paths must be
  absolute and be prefixed with `/`.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks...
  export      Exports a file from the Databricks workspace...
  export_dir  Recursively exports a directory from the...
  import      Imports a file from local to the Databricks...
  import_dir  Recursively imports a directory from local to...
  list        List objects in the Databricks Workspace
  ls          List objects in the Databricks Workspace
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks...

Listing Workspace Files

$ databricks workspace ls /Users/example@databricks.com
Usage Logs ETL
Common Utilities
guava-21.0

Importing a local directory of notebooks

The databricks workspace import_dir command recursively imports a directory from the local filesystem into the Databricks workspace. Only directories and files with the extensions .scala, .py, .sql, .r, .R are imported; on import, these extensions are stripped from the notebook name.

To overwrite existing notebooks at the target path, add the -o flag (see the example after the listing below).

$ tree
.
├── a.py
├── b.scala
├── c.sql
├── d.R
└── e
$ databricks workspace import_dir . /Users/example@databricks.com/example
./a.py -> /Users/example@databricks.com/example/a
./b.scala -> /Users/example@databricks.com/example/b
./c.sql -> /Users/example@databricks.com/example/c
./d.R -> /Users/example@databricks.com/example/d
$ databricks workspace ls /Users/example@databricks.com/example -l
NOTEBOOK   a  PYTHON
NOTEBOOK   b  SCALA
NOTEBOOK   c  SQL
NOTEBOOK   d  R
DIRECTORY  e
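
To re-import over the notebooks that now exist at the target, the same command can be run with the overwrite flag:

$ databricks workspace import_dir -o . /Users/example@databricks.com/example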

Exporting a workspace directory to the local filesystem

Similarly, it is possible to export a directory of notebooks from the Databricks workspace to the local filesystem. To do this, the command is simply:

$ databricks workspace export_dir /Users/example@databricks.com/example .
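
Single notebooks can be exported with the export command as well; as a sketch (the local target filename is a placeholder):

$ databricks workspace export /Users/example@databricks.com/example/a ./a.py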

DBFS CLI Examples

The commands implemented by the DBFS CLI can be listed by running databricks fs -h. Commands are run by appending them to databricks fs, and all DBFS paths should be prefixed with dbfs:/. To make the command less verbose, we’ve aliased dbfs to databricks fs.

$ databricks fs -h
Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with DBFS. DBFS paths are all prefixed
  with dbfs:/. Local paths can be absolute or relative.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  configure
  cp         Copy files to and from DBFS.
  ls         List files in DBFS.
  mkdirs     Make directories in DBFS.
  mv         Moves a file between two DBFS paths.
  rm         Remove files from dbfs.

Copying a file to DBFS

dbfs cp test.txt dbfs:/test.txt
# Or recursively
dbfs cp -r test-dir dbfs:/test-dir

Copying a file from DBFS

dbfs cp dbfs:/test.txt ./test.txt
# Or recursively
dbfs cp -r dbfs:/test-dir ./test-dir
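
The remaining DBFS commands follow the same shape; a brief sketch with placeholder paths:

dbfs mkdirs dbfs:/new-dir
dbfs mv dbfs:/test.txt dbfs:/new-dir/test.txt
dbfs rm dbfs:/new-dir/test.txt
# Or recursively
dbfs rm -r dbfs:/new-dir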

Aliasing Command Groups

Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group. Writing databricks workspace ls can be quite verbose! To make the CLI easier to use, you can alias different command groups to shorter commands. For example, to shorten databricks workspace ls to dw ls in the Bourne Again shell, you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.
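
For example (a sketch; the exact profile file depends on your shell setup):

$ echo 'alias dw="databricks workspace"' >> ~/.bash_profile
$ source ~/.bash_profile
$ dw ls /Users/example@databricks.com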

Using Docker

# build image
docker build -t databricks-cli .

# run container
docker run -it databricks-cli

# run command in docker
docker run -it databricks-cli fs --help

