
# jupyter-spark

[![Build Status](https://travis-ci.org/mozilla/jupyter-spark.svg?branch=master)](https://travis-ci.org/mozilla/jupyter-spark)

[![codecov](https://codecov.io/gh/mozilla/jupyter-spark/branch/master/graph/badge.svg)](https://codecov.io/gh/mozilla/jupyter-spark)

Jupyter Notebook extension for Apache Spark integration.

Includes a progress indicator for the current Notebook cell if it invokes a Spark job. Queries the Spark UI service on the backend to get the required Spark job information.

![Spark progress bar](/screenshots/ProgressBar.png?raw=true "Spark progress bar")

To view all currently running jobs, click the “show running Spark jobs” button, or press `Alt+S`.

![Show running Spark jobs button](/screenshots/SparkButton.png?raw=true "show running Spark jobs button")

![Spark dialog](/screenshots/Dialog.png?raw=true "Spark dialog")

A proxied version of the Spark UI can be accessed at http://localhost:8888/spark.
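For illustration, the job information that the backend fetches comes from Spark's REST API (`/api/v1/applications/<app-id>/jobs`). The following sketch is not the extension's actual code, but shows how per-job progress can be derived from that JSON; the `numTasks` and `numCompletedTasks` fields are part of Spark's documented job schema:

```python
import json

def job_progress(job):
    """Return the fraction of completed tasks for one Spark job record."""
    total = job.get("numTasks", 0)
    done = job.get("numCompletedTasks", 0)
    return done / total if total else 0.0

# A trimmed-down example of one entry from Spark's /jobs response.
sample = json.loads(
    '{"jobId": 0, "status": "RUNNING", "numTasks": 8, "numCompletedTasks": 2}'
)
print(job_progress(sample))  # 0.25
```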

## Installation

To install, simply run:

```
pip install jupyter-spark
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
```

You may also have to enable the `widgetsnbextension` extension if it hasn't been enabled before (check by running `jupyter nbextension list`):

```
jupyter nbextension enable --py widgetsnbextension
```

To double-check if the extension was correctly installed run:

```
jupyter nbextension list
jupyter serverextension list
```

Please feel free to install [lxml](http://lxml.de/) as well to improve performance of the server-side communication with Spark, using your favorite package manager, e.g.:

```
pip install lxml
```

For development and testing, clone the project and run from a shell in the project’s root directory:

```
pip install -e .
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
```

To uninstall the extension run:

```
jupyter serverextension disable --py jupyter_spark
jupyter nbextension disable --py jupyter_spark
jupyter nbextension uninstall --py jupyter_spark
pip uninstall jupyter-spark
```

## Configuration

To change the URL of the Spark API that job metadata is fetched from, override the `Spark.url` config value, e.g. on the command line:

```
jupyter notebook --Spark.url="http://localhost:4040"
```
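Since `Spark.url` is a standard Jupyter configurable, the same override should also work from a config file. A minimal sketch, assuming the default `jupyter_notebook_config.py` in your `.jupyter` directory:

```python
# jupyter_notebook_config.py -- point jupyter-spark at a non-default Spark UI
c.Spark.url = "http://localhost:4040"
```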

## Changelog

### 0.3.0 (2016-07-04)

  • Rewrote proxy to use an async Tornado handler and HTTP client to fetch responses from Spark.

  • Simplified proxy processing to take Amazon EMR proxying into account

  • Extended test suite to cover proxy handler, too.

  • Removed requests as a dependency.

### 0.2.0 (2016-06-30)

  • Refactored to fix a bunch of Python packaging and code quality issues

  • Added test suite for Python code

  • Set up continuous integration: https://travis-ci.org/mozilla/jupyter-spark

  • Set up code coverage reports: https://codecov.io/gh/mozilla/jupyter-spark

  • Added ability to override Spark API URL via command line option

  • IMPORTANT Requires a manual step to enable after running `pip install` (see installation docs)!

    To update:

    1. Run `pip uninstall jupyter-spark`

    2. Delete `spark.js` from your `nbextensions` folder.

    3. Delete any references to `jupyter_spark.spark` in `jupyter_notebook_config.json` (in your `.jupyter` directory)

    4. Delete any references to `spark` in `notebook.json` (in `.jupyter/nbconfig`)

    5. Follow the installation instructions to reinstall

### 0.1.1 (2016-05-03)

  • Initial release with a working prototype
