hypernotes

hypernotes is a lightweight Python package for taking notes on your machine learning experiments. It provides a simple way to store hyperparameters, their corresponding evaluation metrics, and additional information, and to retrieve them later for analysis. It is written in pure Python and requires no additional dependencies.

Installation

pip install hypernotes

Only Python 3.6+ is supported.

Basic Usage

hypernotes implements a Note and a Store class. A Note is a small wrapper around Python dictionaries. This means that you can do everything with it that you could do with a normal dictionary. In addition, it stores:

  • the path to your Python executable,
  • information about the current state of your Git repository (if there is one) such as the last commit, current branch, etc.,
  • the start datetime (set upon initialization) and the end datetime (set when you call note.end() or add the note to a store)

and it provides:

  • a useful default dictionary structure
  • access to all initial dictionary keys as attributes for better auto-completion support and readability (for example note.parameters, note.features)
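
The idea of a dictionary whose keys are also reachable as attributes can be sketched in a few lines. This is only an illustration of the concept, not the Note implementation: the class name AttrDict is hypothetical, and unlike Note it exposes every key as an attribute, not just the initial ones.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys can also be read and set as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment writes straight into the dictionary.
        self[name] = value


content = AttrDict(parameters={}, features={"binary": []})
content.parameters["num_estimators"] = 100  # attribute access to an existing key
content.model = "randomforest"              # attribute assignment creates a key
```

Both access styles then refer to the same data, so content["model"] and content.model return the same value.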

If you print a note, you can see what's inside. A note right after initialization looks like this:

Note(content={'text': '',
 'model': None,
 'parameters': {},
 'features': {'identifier': [],
              'binary': [],
              'categorical': [],
              'numerical': []},
 'target': None,
 'metrics': {},
 'info': {},
 'start_datetime': datetime.datetime(2019, 5, 21, 11, 3, 20),
 'end_datetime': None,
 'identifier': '3228fe02-d1c8-4251-8b35-bb8ae3d5f227',
 'python_path': 'C:/example_path/python.exe',
 'git': {'repo_name': 'C:/path_to_your_repo',
         'branch': 'master',
         'commit': '6bbdf31'}}

The notes are then saved with a Store instance, which uses a JSON file. For this reason, you should only add JSON-serializable objects and datetime.datetime instances to a Note.

A note is uniquely identifiable by its identifier attribute.
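
The standard json module cannot encode datetime.datetime objects on its own, so a JSON-backed store needs some conversion step. The following is a minimal sketch of one possible approach using only the standard library. It is not taken from the hypernotes source: the helper names encode_datetime and decode_datetimes and the choice of ISO 8601 strings are assumptions made for illustration.

```python
import json
from datetime import datetime

# Hypothetical helpers for illustration; hypernotes may handle this differently.
DATETIME_KEYS = ("start_datetime", "end_datetime")


def encode_datetime(obj):
    """Fallback for json.dumps: turn datetime objects into ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_datetimes(content):
    """Convert the known datetime fields back after json.loads."""
    for key in DATETIME_KEYS:
        if content.get(key) is not None:
            # datetime.fromisoformat requires Python 3.7 or newer
            content[key] = datetime.fromisoformat(content[key])
    return content


content = {
    "text": "Useful description",
    "start_datetime": datetime(2019, 5, 21, 11, 3, 20),
    "end_datetime": None,
}
serialized = json.dumps(content, default=encode_datetime)
restored = decode_datetimes(json.loads(serialized))
```

Anything that json.dumps rejects, such as a fitted model object or a NumPy array, would still raise a TypeError here, which is why such objects should not be placed in a note.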

Create a note and add to a store

from hypernotes import Note, Store

note = Note("Some descriptive text about your experiment")

# Add name of used algorithm
note.model = "randomforest"

# Add hyperparameters about model training, preprocessing, etc.
note.parameters["num_estimators"] = 100
note.parameters["impute_missings"] = True

# Add the names of the features and of the target variable
note.features["identifier"] = ["id"]
note.features["binary"] = ["bool1"]
note.features["categorical"] = ["cat1", "cat2"]
note.features["numerical"] = ["num1"]
note.target = "target"

# Some additional information
note.info["important_stuff"] = "something noteworthy"

# ... Rest of your code ...
# train_recall, train_precision, test_recall, test_precision = train_and_evaluate_model(
#                                              parameters=note.parameters,
#                                              feature_names=note.features,
#                                              target_name=note.target)
# ...

# Add your calculated evaluation metrics
note.metrics["train"] = {"recall": train_recall, "precision": train_precision}
note.metrics["test"] = {"recall": test_recall, "precision": test_precision}

store = Store("hyperstore.json")
store.add(note)

Load notes

A Store instance provides the load method, which can be used to retrieve the whole store. By default it returns a sorted list (most recent note first).

notes = store.load()
most_recent_note = notes[0]
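
Because a Note behaves like a dictionary, the loaded list can be searched with ordinary Python. Here is a minimal sketch that picks the note with the best value for a metric. Plain dictionaries stand in for loaded notes so that it runs without a store, the helper name best_note is hypothetical, and the metric values are illustrative.

```python
def best_note(notes, split="test", metric="recall"):
    """Return the note with the highest value for the metric, or None.

    Notes that do not contain the requested metric are skipped.
    """
    scored = [n for n in notes if metric in n.get("metrics", {}).get(split, {})]
    if not scored:
        return None
    return max(scored, key=lambda n: n["metrics"][split][metric])


# Stand-ins for the result of store.load()
notes = [
    {"text": "Another useful description",
     "metrics": {"test": {"precision": 0.29, "recall": 0.29}}},
    {"text": "Useful description",
     "metrics": {"test": {"precision": 0.82, "recall": 0.29}}},
    {"text": "No metrics yet", "metrics": {}},
]

best = best_note(notes, metric="precision")
```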

If you have pandas installed, you can use the return_dataframe argument to return a pandas dataframe.

notes_df = store.load(return_dataframe=True)
notes_df.head()

Example of a returned pandas dataframe:

| | start_datetime | end_datetime | text | model | metrics.test.precision | metrics.test.recall | metrics.train.precision | metrics.train.recall | parameters.min_sample_split | parameters.num_estimators | parameters.sample_weight | features.binary | features.categorical | features.identifier | features.numerical | target | git.branch | git.commit | git.repo_name | identifier | info.important_stuff | python_path |
| 0 | 2019-05-21 16:44:48 | 2019-05-21 17:05:21 | Another useful description | randomforest | 0.29 | 0.29 | 0.40 | 0.50 | 7 | 150 | None | [bool1] | [cat1, cat2] | [id] | [num1] | target | master | 5e098ab | C:/path_to_your_repo | 0f84217d-e01b-466d-9a73-001827c60584 | something noteworthy | C:/example_path/python.exe |
| 1 | 2019-05-21 16:12:53 | 2019-05-21 16:30:16 | Useful description | randomforest | 0.82 | 0.29 | 0.91 | 0.98 | 7 | 100 | balanced | [bool1] | [cat1, cat2] | [id] | [num1] | target | master | 5e098ab | C:/path_to_your_repo | dd8bbc32-ff8f-433d-9eec-a24a7859622f | something noteworthy | C:/example_path/python.exe |
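
The column names in the dataframe, such as metrics.test.precision, are the nested dictionary keys of a note joined with dots. How hypernotes builds the dataframe internally is not shown here. The sketch below only illustrates the naming scheme with a hypothetical flatten helper on a plain dictionary.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dictionaries
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists, scalars and empty dictionaries are kept as values
            flat[full_key] = value
    return flat


content = {
    "model": "randomforest",
    "metrics": {"test": {"precision": 0.82, "recall": 0.29}},
    "parameters": {"num_estimators": 100},
    "features": {"binary": ["bool1"]},
}
row = flatten(content)
```

Each flattened dictionary corresponds to one row of the table, and lists such as the feature names stay intact inside a single cell.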

Update notes

If you want to update notes, you can do this either directly in the json file containing the notes, or load the notes as described above, change the relevant ones, and pass them to the update method.

notes = store.load()
updated_notes = []
for note in notes[:2]:
    note.info["something_new"] = "..."
    updated_notes.append(note)

store.update(updated_notes)

Remove notes

If you want to remove notes, you can do this either directly in the json file containing the notes, or load the notes as described above, and pass the ones which you want to remove to the remove method.

notes = store.load()
notes_to_remove = notes[:2]
store.remove(notes_to_remove)
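
The notes to remove do not have to be a slice. Since notes behave like dictionaries, a condition on their content works as well. Below is a minimal sketch with plain dictionaries standing in for loaded notes. The short identifiers and the cutoff rule are placeholders chosen for the example.

```python
from datetime import datetime

# Stand-ins for the result of store.load(), most recent first
notes = [
    {"identifier": "a", "start_datetime": datetime(2019, 5, 21, 16, 44, 48)},
    {"identifier": "b", "start_datetime": datetime(2019, 5, 20, 9, 15, 0)},
]

# Select every note that was started before the cutoff
cutoff = datetime(2019, 5, 21)
notes_to_remove = [n for n in notes if n["start_datetime"] < cutoff]

# With real notes, the selection would then be passed on:
# store.remove(notes_to_remove)
```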

Create note from another one

When evaluating multiple model parameters (e.g. in a grid search setup), you might find it useful to create a new note for each parameter set. To do this, you can use the from_note method to create a new note from an existing one. This takes over all existing content, but also sets a new start datetime and identifier. After creation, the notes are independent, i.e. modifying one will not affect the other.

original_note = Note("Original")
new_note = Note.from_note(original_note)
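
The behaviour described above (copy everything, assign a new identifier and start datetime, keep the copies independent) can be illustrated on plain dictionaries in a small grid search loop. This is a sketch of the semantics only, not the implementation of from_note. The derive helper and the parameter grid are hypothetical.

```python
import copy
import itertools
import uuid
from datetime import datetime


def derive(content):
    """Hypothetical stand-in for Note.from_note, operating on a plain dict."""
    new_content = copy.deepcopy(content)  # deep copy keeps nested dicts independent
    new_content["identifier"] = str(uuid.uuid4())
    new_content["start_datetime"] = datetime.now()
    return new_content


original = {
    "text": "Grid search",
    "model": "randomforest",
    "parameters": {"impute_missings": True},
    "identifier": str(uuid.uuid4()),
    "start_datetime": datetime.now(),
}

# Example grid; one derived note per parameter combination
grid = {"num_estimators": [100, 150], "min_sample_split": [2, 7]}
derived = []
for values in itertools.product(*grid.values()):
    content = derive(original)
    content["parameters"].update(dict(zip(grid.keys(), values)))
    derived.append(content)
```

With real notes, the loop body would call Note.from_note(original_note), set the parameters and metrics, and add each new note to the store.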

View content of a store

Directly in your browser (no additional dependencies)

To get a quick glance into a store, you can use the package from the command line. It will start an http server and automatically open the relevant page in your web browser. The page contains an interactive table which shows the most relevant information of all notes in the store such as metrics and parameters. The table is similar in style to the one shown in the Load notes section.

$ python -m hypernotes hyperstore.json

This only requires a modern web browser as well as an internet connection to load a few JavaScript libraries and CSS files.

To see all available options, pass the --help argument.

pandas and Qgrid

Another useful option might be to load the store as a pandas dataframe (see Load notes) and then use Qgrid in a Jupyter notebook.

Bonus: Store additional objects in separate experiment folders

If you want to store larger artifacts of your experiment, such as a trained model, you could create a separate folder and use the identifier of a note as part of the name.

experiment_folder = f"experiment_{note.identifier}"

You can then store any additional objects in this folder, and it will be very easy later on to link them again to the hyperparameters and metrics stored using hypernotes.
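
A minimal sketch of this pattern with the standard library follows. The UUID stands in for note.identifier, the temporary base directory, the file name model.pkl, and the use of pickle are assumptions for the example, and the dictionary is only a placeholder for a trained model.

```python
import pickle
import tempfile
import uuid
from pathlib import Path

identifier = str(uuid.uuid4())       # stand-in for note.identifier
base_dir = Path(tempfile.mkdtemp())  # assumption: any writable directory works

# Create the experiment folder named after the identifier
experiment_folder = base_dir / f"experiment_{identifier}"
experiment_folder.mkdir(parents=True, exist_ok=True)

# Store an artifact in the folder
model = {"weights": [0.1, 0.2, 0.3]}  # placeholder for a trained model object
model_path = experiment_folder / "model.pkl"
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Later: recover the identifier from the folder name and reload the artifact
recovered_identifier = experiment_folder.name[len("experiment_"):]
with model_path.open("rb") as f:
    reloaded = pickle.load(f)
```

The recovered identifier can then be compared against the identifier of each loaded note to find the matching hyperparameters and metrics.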

Other tools

Check out tools such as MLflow, Sacred, or DVC if you need better multi-user capabilities, more advanced reproducibility features, dataset versioning, and so on.

Development

Feel free to open a GitHub issue or, even better, submit a pull request if you find a bug or miss a feature.

The requirements for developing the package can be installed with

pip install -r requirements_dev.txt

Code is required to be formatted with Black.