
noko

[api reference]

Log first, ask questions later.

noko is a numerical logging library designed for logging as many variables at once on as many timescales as possible.

The prototypical use case for noko is the following:

loss.backward()
noko.log_row("grad_step", locals(), level=noko.TRACE)
optimizer.step()

This produces a log file "grad_step.csv", with information on all local variables at every gradient step.

State of the Library

noko is quite useful already, but I'm continuing to refine the API, add new features as I find need of them, and fix bugs as I encounter them.

If you use the library at this time, you should expect API churn, so depend on a specific version.

The current recommended install method is:

pip install "noko[recommended]==0.3.0"

You can also install directly from github:

pip install "git+https://github.com/krzentner/noko.git@v0.3.0#egg=noko[recommended]"

The recommended extra adds packages that can be easily installed and support useful but optional features:

  • GitPython is used for the create_git_checkpoint option and the noko.noko_git.checkpoint_repo() function.

  • pyarrow is used by the noko.arrow_output.ArrowOutputEngine to log to parquet files. Note that this backend is not added to the logger by default.

  • tensorboardX is used by noko.tb_output.TensorBoardOutput to log to tensorboard. noko can also use the tensorboard SummaryWriter from torch.utils.tensorboard and tf.summary. By default this backend is added to the logger, and an attempt is made to use the torch API first.

Key Idea

Most numerical logging libraries expect you to log one key at a time. noko instead expects you to log one "row" at a time, into a named "table". In the above example, the table is named "grad_step", and the row is all local variables.

noko will then go through all values in the row, and determine how to summarize them. For example, it will summarize a pytorch neural network using the min, max, mean, and std of each parameter, as well as those values for the gradient of each parameter (if available).
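To give a feel for the kind of summary this produces, here is a minimal sketch in plain Python (not noko's actual implementation) that reduces a sequence of numbers to min/max/mean/std scalars under a dotted key prefix:

```python
import statistics

def summarize_array(values, prefix, dst):
    """Reduce a sequence of numbers to a few scalar summaries,
    written into dst under dotted keys (mimicking noko's style)."""
    dst[f"{prefix}.min"] = min(values)
    dst[f"{prefix}.max"] = max(values)
    dst[f"{prefix}.mean"] = statistics.fmean(values)
    dst[f"{prefix}.std"] = statistics.pstdev(values)

row = {}
summarize_array([1.0, 2.0, 3.0, 4.0], "layer1.weight", row)
# row now holds four scalars instead of the full array
```

For a real network, noko applies this kind of reduction per parameter tensor (and per gradient), so a row stays a flat dict of scalars no matter how large the model is.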

Once summarized, noko outputs that row to all noko backends that have a log level at least as sensitive as the provided level.
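The dispatch-by-sensitivity rule can be pictured with this toy sketch (the class, function names, and numeric level values here are illustrative assumptions, not noko's API):

```python
# Hypothetical numeric levels; noko's real level values may differ.
TRACE, INFO = 10, 20

class DictBackend:
    """Toy backend that records rows at or above its threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.rows = []

    def maybe_log(self, table, row, level):
        if level >= self.threshold:
            self.rows.append((table, row))

backends = [DictBackend(TRACE), DictBackend(INFO)]

def log_row(table, row, level):
    # Dispatch the summarized row to every backend sensitive enough
    # to record it.
    for backend in backends:
        backend.maybe_log(table, row, level)

log_row("grad_step", {"loss": 0.5}, level=TRACE)
# Only the TRACE backend records the row.
```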

Backends

Always available:

  • newline delimited json (.ndjson)
  • comma separated values (.csv)
  • pprint (usually only used for stdout)

Requires optional dependencies:

  • tensorboard (will use torch.utils.tensorboard or tensorboardX or tf.summary)
  • parquet (requires pyarrow)

Row Summarization

noko runs a preprocessing step on each provided row before logging it. This preprocessing is typically lossy, summarizing large input arrays into a few scalars. That loss is necessary to be able to log all local variables, and is an unavoidable trade-off of noko's design. If you want to be able to exactly recover a value passed to noko, format it to a str or bytes first, and use the csv, ndjson, or parquet backends.

For built-in datatypes (dictionaries, lists, tuples), preprocessing runs recursively. For commonly used numerical libraries (numpy, pytorch), there are adapter modules to summarize common datatypes.
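The recursive preprocessing of built-in containers can be pictured roughly like this (a simplified sketch, not noko's actual code):

```python
def flatten(value, prefix, dst):
    """Recursively flatten dicts, lists, and tuples into a flat
    dict of dotted keys; scalar leaves are stored directly."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}", dst)
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            flatten(child, f"{prefix}.{index}", dst)
    else:
        dst[prefix] = value

dst = {}
flatten({"loss": 0.5, "lr": [0.1, 0.01]}, "row", dst)
# dst == {"row.loss": 0.5, "row.lr.0": 0.1, "row.lr.1": 0.01}
```

In real noko, the leaf case is where the adapter modules kick in: a numpy array or pytorch tensor at a leaf is summarized rather than stored whole.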

If you want your own type to be summarized, you can define a custom summarizer. The standard way to do this is declare_summarizer, which does not require modifying the source of the type being summarized.

Example of summarizing a custom type "MyPoint":

from noko import declare_summarizer, summarize, ScalarTypes

@declare_summarizer(MyPoint)
def summarize_mypoint(point, prefix: str, dst: dict[str, ScalarTypes]):
    # Include x and y fields directly
    dst[f"{prefix}.x"] = point.x
    dst[f"{prefix}.y"] = point.y
    # Recursively summarize metadata
    summarize(point.metadata, f"{prefix}.metadata", dst)

Sometimes you may have local variables that don't make sense to log in noko (e.g. long lists). In that case the recommended pattern is:

row = locals()
del row["my_long_list"]
noko.log_row("my_locals", row)

Extra Utilities

noko has some other useful tricks for reproducibility, which can be accessed by calling noko.init_extra().

These include:

  • wandb sync and config logging (if wandb is installed).
  • automatically setting the global seed values for most libraries if present in the config
  • automatically creating a git commit of the current code on a separate branch noko-checkpoints, and adding diffs using that checkpoint to the log directory (requires GitPython to be installed).
