Synbols: Probing Learning Algorithms with Synthetic Datasets

Progress in the field of machine learning has been fueled by the introduction of benchmark datasets that push the limits of existing algorithms. Enabling the design of datasets that test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. To this end, we introduce Synbols (Synthetic Symbols), a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws of standard learning algorithms in various learning setups, including supervised learning, active learning, out-of-distribution generalization, unsupervised representation learning, and object counting.

[paper]

Installation

The easiest way to install Synbols is via PyPI. Simply run the following command:

pip install synbols

Software dependencies

Synbols relies on fonts and system packages. To ensure reproducibility, we provide a Docker image with everything preinstalled. Thus, the only dependency is Docker (see the Docker documentation for installation instructions).

Usage

Using predefined generators

$ synbols-datasets --help
$ synbols-datasets --dataset=some-large-occlusion --n_samples=1000 --seed=42

Generating some-large-occlusion dataset. Info: With probability 20%, add a large occlusion over the existing symbol.
Preview generated.
 35%|############################2                                                   | 353/1000 [00:05<00:10, 63.38it/s]
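
The info line above describes a per-sample random decision controlled by the seed. As a rough illustration only (this is not the Synbols implementation; the function name `occlusion_flags` and its parameters are hypothetical), the following sketch shows what "with probability 20%" and a fixed seed imply: about one sample in five is flagged, and the same seed reproduces the same flags.

```python
import random


def occlusion_flags(n_samples, seed, probability=0.2):
    """Return one boolean per sample: True where an occlusion would be added.

    Hypothetical helper for illustration; Synbols itself may sample differently.
    """
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    return [rng.random() < probability for _ in range(n_samples)]


if __name__ == "__main__":
    flags = occlusion_flags(n_samples=1000, seed=42)
    # Fraction of samples flagged for occlusion; near 0.2 but not exactly 0.2.
    print(sum(flags) / len(flags))
```

Running the sketch twice with the same seed yields identical flags, which is the property the --seed option is expected to provide for generated datasets.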

Defining your own generator

Examples of how to create new datasets can be found in the examples directory.

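# Assumed import path; see the examples directory for the exact module.
from synbols.generate import basic_attribute_sampler, generate_and_write_dataset

dataset_path = "./my_dataset"  # placeholder output path
n_samples = 1000  # placeholder number of samples


def translation(rng):
    """Generates translations uniformly from (-2, 2), going outside of the box."""
    return tuple(rng.uniform(low=-2, high=2, size=2))


# Modify the default attribute sampler: fix the scale to a constant and
# draw the (x, y) translation from the new distribution.
attr_sampler = basic_attribute_sampler(scale=0.5, translation=translation)

generate_and_write_dataset(dataset_path, attr_sampler, n_samples)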

To generate your dataset, you need to run your code in the Synbols runtime environment. This is done using the synbols command as follows:

synbols mydataset.py --foo bar
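
Before launching a full generation run, it can help to check a custom sampler such as `translation` on its own. The sketch below is a minimal, standard-library-only check and makes two assumptions that are not stated in this README: `StandInRng` is a hypothetical stand-in for the NumPy-style random generator passed to the sampler, and the visible box is assumed to span [-1, 1] on each axis.

```python
import random


class StandInRng:
    """Hypothetical stand-in exposing a NumPy-style uniform(low, high, size)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def uniform(self, low, high, size):
        return [self._rng.uniform(low, high) for _ in range(size)]


def translation(rng):
    """Sample an (x, y) translation uniformly from (-2, 2)."""
    return tuple(rng.uniform(low=-2, high=2, size=2))


def fraction_outside_box(n_draws, seed=0, limit=1.0):
    """Estimate how often a sampled translation leaves [-limit, limit]^2.

    The default limit of 1.0 is an assumption about the box size.
    """
    rng = StandInRng(seed)
    outside = 0
    for _ in range(n_draws):
        x, y = translation(rng)
        if abs(x) > limit or abs(y) > limit:
            outside += 1
    return outside / n_draws


if __name__ == "__main__":
    print(fraction_outside_box(10_000))
```

Under these assumptions, each coordinate stays within the box with probability 1/2, so about 75% of sampled translations fall outside it. If that is more than intended, narrowing the `low` and `high` bounds in `translation` is the natural adjustment.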

Launch the example notebook

We provide an example Jupyter notebook in the examples directory. To run this notebook, first download it locally and run the following command at the notebook's location:

synbols-jupyter

This will launch Jupyter Notebook in the Synbols runtime environment and allow you to access it via your browser.

Contact

For bug reports or feature requests, please create an issue.
