Run code on a dask worker via a context manager

Afar


One man's magic is another man's engineering
Robert A. Heinlein


To install: pip install afar

afar allows you to run code on a remote Dask cluster using context managers and IPython magics. For example:

import afar
from dask.distributed import Client
client = Client()

with afar.run, remotely:
    import dask_cudf
    df = dask_cudf.read_parquet("s3://...")
    result = df.sum().compute()

Outside the context, result is a Dask Future whose data resides on a worker. Call result.result() to copy the data locally.
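If Futures are unfamiliar, the standard library offers a close local analogy (no cluster needed; Dask Futures expose a similar .result() method):

```python
from concurrent.futures import ThreadPoolExecutor

# Local stdlib analogy: a Future is a handle to a value computed elsewhere;
# calling .result() blocks until it is ready and copies the value back.
with ThreadPoolExecutor() as pool:
    future = pool.submit(sum, range(10))
    value = future.result()

print(value)  # 45
```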

By default, only the last assignment is saved. One can specify which variables to save:

with afar.run("one", "two"), remotely:
    one = 1
    two = one + 1

one and two are now both Futures. They can be used directly in other afar.run contexts:

with afar.run as data, remotely:
    three = one + two

assert three.result() == 3
assert data["three"].result() == 3

data above is a dictionary mapping variable names to Futures. At times it may be necessary to get data from it directly. Alternatively, you may pass a mapping to afar.run to use as the data:

run = afar.run(data={"four": 4})
with run, remotely:
    seven = three + four
assert run.data["seven"].result() == 7

If you want to automatically gather the data locally (to avoid calling .result()), use afar.get instead of afar.run:

with afar.get, remotely:
    five = two + three
assert five == 5

Interactivity in Jupyter

There are several enhancements when using afar in Jupyter Notebook, Qt console, JupyterLab, or any IPython-based frontend that supports rich display.

The rich repr of the final expression will be displayed if it's not an assignment:

with afar.run, remotely:
    three + seven
# displays 10!

Printing is captured and displayed locally:

with afar.run, remotely:
    import sys
    print(three)
    print(seven, file=sys.stderr)
# 3
# 7

These are done asynchronously using ipywidgets.

Magic!

First, load the afar magic extension:

%load_ext afar

Now you can use afar as line or cell magic. %%afar is like with afar.run, remotely:. It can optionally accept a list of variable names to save:

%%afar x, y
x = 1
y = x + 1

and

z = %afar x + y

Is this a good idea?

I don't know, but it sure is a joy to use 😃 !

For motivation, see https://github.com/dask/distributed/issues/4003

It's natural to be skeptical of unconventional syntax. Often, it isn't obvious whether new syntax will be nice to use; you really just need to try it out and see.

We're still exploring the usability of afar. If you try it out, please share what you think, and ask yourself questions such as:

  • can we spell anything better?
  • does this offer opportunities?
  • what is surprising?
  • what is lacking?

Here's an example of an opportunity:

on_gpus = afar.remotely(resources={"GPU": 1})

with afar.run, on_gpus:
    ...

This now works! Keyword arguments to remotely will be passed to client.submit.

I don't know about you, but I think this is starting to look and feel kinda nice, and it could probably be even better :)

Caveats and Gotchas

Repeatedly copying data

afar automatically gets the data it needs (and only the data it needs) from the outer scope and sends it to the Dask cluster to compute on. Since we don't know whether local data has been modified between calls to afar, we serialize and send local variables every time we use run or get. This is generally fine: it works, it's safe, and it's usually fast enough. However, if you do this frequently with large-ish data, performance can suffer, and you may be using more memory on your local machine than necessary.
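A rough, cluster-free illustration of that cost, using pickle as a stand-in (Dask has its own serialization, but the cost scales the same way):

```python
import pickle

# Illustrative only: the whole object is re-encoded and re-sent on every run.
big = list(range(10**6))
payload = pickle.dumps(big)
print(f"roughly {len(payload) / 1e6:.1f} MB re-sent per run")
```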

With Dask, a common pattern is to send data to the cluster with scatter and get a Future back. This works:

import numpy as np

A = np.arange(10**7)
A = client.scatter(A)
with afar.run, remotely:
    B = A + 1
# A and B are now both Futures; their data is on the cluster

Another option is to pass data to run:

run = afar.run(data={"A": np.arange(10**7)})
with run, remotely:
    B = A + 1
# run.data["A"] and B are now both Futures; their data is on the cluster

Here's a nifty trick to use if you're in an IPython notebook: use data=globals()!

run = afar.run(data=globals())
A = np.arange(10**7)
with run, remotely:
    B = A + 1
# A and B are now both Futures; their data is on the cluster

Mutating remote data

As with any Dask workload, one should be careful to not modify remote data that may be reused.
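A local NumPy sketch of the hazard (no cluster needed): when two names share one buffer, much as two tasks can share one remote object, an in-place update through either is visible through both.

```python
import numpy as np

# Two names, one buffer -- like two tasks sharing one remote object.
A = np.arange(5)
B = A
B += 1                # in-place update is visible through A as well
print(A)              # [1 2 3 4 5]

# The safe pattern: create new objects instead of mutating shared ones.
C = np.arange(5)
D = C + 1             # allocates a new array; C is untouched
print(C)              # [0 1 2 3 4]
```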

Mutating local data

Similarly, code run remotely isn't able to mutate local variables: the remote block operates on serialized copies, so in-place changes don't propagate back. For example:

d = {}
with afar.run, remotely:
    d['key'] = 'value'
# d == {}

✨ This code is highly experimental and magical! ✨
