A library for recording and reading data in Jupyter and nteract Notebooks
scrapbook
scrapbook is a library for recording a notebook’s data values (scraps) and generated visual content (snaps). These recorded scraps and snaps can be read at a future time.
Two new names for information are introduced in scrapbook:
- scraps: serializable data values such as strings, lists of objects, pandas dataframes, or data table references.
- snaps: named displays of information such as a generated image, plot, or UI message which encapsulate information but do not store the underlying data.
Use Case
Notebook users may wish to record data produced during a notebook execution. This recorded data can then be read to be used at a later time or be passed to another notebook as input.
Namely, scrapbook lets you:
- persist data (scraps) in a notebook
- sketch named displays (snaps) in notebooks
- recall any persisted scrap of data or displayed snap
- summarize collections of notebooks
API Calls
Scrapbook adds a few basic API commands that enable saving and retrieving data.
glue to persist scraps
Records a scrap (data value) in the given notebook cell. The scrap (recorded value) can be retrieved during later inspection of the output notebook.
sb.glue("hello", "world")
sb.glue("number", 123)
sb.glue("some_list", [1, 3, 5])
sb.glue("some_dict", {"a": 1, "b": 2})
sb.glue("non_json", df, 'arrow')
The scrapbook library can be used later to recover scraps (recorded values) from the output notebook:
nb = sb.read_notebook('notebook.ipynb')
nb.scraps
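Individual values can then be pulled out of the returned collection. The short sketch below assumes nb.scraps behaves like a plain name-to-value mapping; the exact return type may differ between versions:

```python
nb = sb.read_notebook('notebook.ipynb')

# look up scraps glued in the examples above
# (assumes a dict-like mapping of name -> recorded value)
print(nb.scraps["number"])     # 123
print(nb.scraps["some_dict"])  # {"a": 1, "b": 2}
```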
scrapbook infers the storage format from the value type against the registered data translators. Alternatively, the inferred storage format can be overridden by setting the storage argument to the registered name (e.g. "json") of a particular translator.
This data is persisted by generating a display output with a special media type identifying the content storage format and data. These outputs are not visible in notebook rendering but still exist in the document. Scrapbook can then rehydrate the data associated with the notebook in the future by reading these cell outputs.
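Because scraps live in ordinary cell outputs, the raw notebook document can also be inspected without scrapbook at all. The sketch below reads the saved notebook with nbformat and lists outputs whose media type mentions scrapbook; the exact media type string is an implementation detail and may differ between versions:

```python
import nbformat

# read the saved notebook document directly
nb_node = nbformat.read('notebook.ipynb', as_version=4)

for cell in nb_node.cells:
    for output in cell.get('outputs', []):
        for media_type, payload in output.get('data', {}).items():
            # scrapbook marks its persisted scraps with a special media type;
            # matching on a substring avoids hard-coding the exact name
            if 'scrapbook' in media_type:
                print(media_type, payload)
```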
sketch to save display output
Display a named snap (visible display output) in a retrievable manner. Unlike glue, sketch is intended to generate a visible display output for notebook interfaces to render.
# record an image highlight
sb.sketch("sharable_png", IPython.display.Image(filename=get_fixture_path("sharable.png")))
# record a UI message highlight
sb.sketch("hello", "Hello World")
Like scraps, these can be retrieved at a later time. Unlike scraps, snaps do not carry any actual underlying data; they keep just the display result of some object.
nb = sb.read_notebook('notebook.ipynb')
# Returns the dict of name -> snap pairs saved in `nb`
nb.snaps
More usefully, you can copy snaps from earlier notebook executions to re-display the object in the current notebook.
nb = sb.read_notebook('notebook.ipynb')
nb.copy_highlight("sharable_png")
read_notebook reads one notebook
Reads a Notebook object loaded from the location specified at path.
You've already seen how this function is used in the above API call examples, but essentially this provides a thin wrapper over an nbformat notebook object with the ability to extract scrapbook scraps and snaps.
nb = sb.read_notebook('notebook.ipynb')
The abstraction makes saved content available as a dataframe referencing each key and source. More of these methods will be made available in later versions.
# Produces a data frame with ["name", "value", "type", "filename"] as columns
nb.scrap_dataframe
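Since the result is a regular pandas DataFrame, the usual pandas operations apply. A small illustrative filter, using the column names listed in the comment above:

```python
df = nb.scrap_dataframe

# keep only the scraps whose name starts with "some_"
subset = df[df["name"].str.startswith("some_")]
print(subset[["name", "value"]])
```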
The Notebook object also has a few legacy functions for backwards compatibility with papermill's Notebook object model. As a result, it can be used to read papermill execution statistics as well as scrapbook abstractions:
nb.cell_timing # List of cell execution timings in cell order
nb.execution_counts # List of cell execution counts in cell order
nb.papermill_metrics # Dataframe of cell execution counts and times
nb.parameter_dataframe # Dataframe of notebook parameters
nb.papermill_dataframe # Dataframe of notebook parameters and cell scraps
The notebook reader relies on papermill's registered iorw to enable access to a variety of sources such as -- but not limited to -- S3, Azure, and Google Cloud.
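For example, a notebook stored remotely can be read with the same call, provided the matching papermill handler and credentials are available; the bucket path below is purely illustrative:

```python
# read a notebook directly from S3 (path is an example, not a real bucket)
nb = sb.read_notebook('s3://my-bucket/outputs/notebook.ipynb')
```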
read_notebooks reads many notebooks
Reads all notebooks located in a given path into a Scrapbook object.
# create a scrapbook named `book`
book = sb.read_notebooks('path/to/notebook/collection/')
# get the underlying notebooks as a list
book.sorted_notebooks
The Scrapbook (book in this example) can be used to recall all scraps across the collection of notebooks:
book.scraps # Map of {notebook -> {name -> scrap}}
book.flat_scraps # Map of {name -> scrap}
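For instance, a single named scrap can be gathered across the whole collection. This sketch assumes the maps above behave like standard nested dictionaries:

```python
# collect the "number" scrap from every notebook that recorded one
numbers = {
    notebook: scraps["number"]
    for notebook, scraps in book.scraps.items()
    if "number" in scraps
}
```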
Or to collect snaps:
book.snaps # Map of {notebook -> {name -> snap}}
book.flat_highlights # Map of {name -> snap}
The Scrapbook collection can be used to display all the snaps from the collection as a markdown structured output as well.
book.display()
This display can filter on snap names and keys, as well as enable or disable an overall header for the display.
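As a rough illustration of that filtering, something along these lines should be possible; the keyword names here are hypothetical and not taken from the library's documented signature:

```python
# hypothetical call: show only the "sharable_png" snap and suppress the header
book.display(snaps=["sharable_png"], header=False)
```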
Finally, the scrapbook has two backwards-compatible features for deprecated papermill capabilities:
book.papermill_dataframe
book.papermill_metrics
These functions also rely on papermill's registered iorw to list and read files from various sources.
Storage Formats
Storage formats are accessible by key names to Translator objects registered against the translators.registry object. To register new data translators / loaders, simply call:
# add translator to the registry
registry.register("custom_store_name", MyCustomTranslator())
The translator class must implement two methods, translate and load:
class MyCustomTranslator(object):
    def translate(self, scrap):
        # convert the scrap's data into a string for storage
        pass  # TODO: Implement

    def load(self, scrap):
        # rebuild the original data object from its stored string
        pass  # TODO: Implement
These methods transform scraps into a string representing their contents or location, and load those strings back into the original data objects.
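As a concrete sketch, the translator below stores arbitrary Python objects as base64-encoded pickle strings. How the scrap's payload is passed to translate, and the import path for registry, are assumptions rather than documented behavior:

```python
import base64
import pickle

# the registry import path is an assumption; the text above only shows
# the `registry.register(...)` call itself
from scrapbook.translators import registry


class PickleTranslator(object):
    def translate(self, scrap):
        # serialize the payload into a storable string
        # (treating `scrap` itself as the raw value is an assumption)
        return base64.b64encode(pickle.dumps(scrap)).decode("ascii")

    def load(self, scrap):
        # rebuild the original Python object from the stored string
        return pickle.loads(base64.b64decode(scrap.encode("ascii")))


# make the translator available to glue under a custom storage name
registry.register("pickle", PickleTranslator())
```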
unicode
A basic string storage format that saves data as Python strings.
sb.glue("hello", "world", "unicode")
json
sb.glue("foo_json", {"foo": "bar", "baz": 1}, "json")
arrow
Implementation Pending!
papermill's deprecated record feature
scrapbook provides a robust and flexible recording schema. This library is intended to replace papermill's existing record functionality.
See the papermill documentation for record for full details. In brief:
pm.record(name, value): enabled users to record values to be saved with the notebook.
pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})
pm.read_notebook(notebook): pandas could be used later to recover recorded values by reading the output notebook into a dataframe.
nb = pm.read_notebook('notebook.ipynb')
nb.dataframe
Limitations and challenges
- The record function didn't follow papermill's pattern of linear execution of a notebook. (It was awkward to describe record as an additional feature of papermill; it really felt like describing a second, less developed library.)
- Recording / Reading required data translation to JSON for everything. This is a tedious, painful process for dataframes.
- Reading recorded values into a dataframe would result in unintuitive dataframe shapes.
- Less modularity and flexibility than other papermill components where custom operators can be registered.
Download files
File details
Details for the file scrapbook-beta-0.1.0.tar.gz.
File metadata
- Download URL: scrapbook-beta-0.1.0.tar.gz
- Upload date:
- Size: 35.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11768595e437483c5f71b2abc3bbfb903fb60e6bf243d30927f81fe09c072905
MD5 | b13b8f9624c28682336b822cc2cb0f94
BLAKE2b-256 | e78feaf6b9b525943a4598e05029ac4a1abfa3b4ef0b915fdde30f1a573478bd
File details
Details for the file scrapbook_beta-0.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: scrapbook_beta-0.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1ac0950680981a1017cd030099a9d76548a1a8e3b150ac417094fbe0ecda7fa1
MD5 | 8be17e9b7d90de852b100b631c1efbfe
BLAKE2b-256 | aea31e9212e7660654d1d29372342574fdd821d60f0ad1ef2214754f01c11b28