🔌 Open-source plugins with practical features for Argilla using listeners.

Argilla Plugins

🔌 Open-source plugins for extra features and workflows

Why? The design of Argilla is intentionally programmable (i.e., developers can build complex workflows for reading and updating datasets). However, certain workflows and features are shared across different use cases and could be simplified from a developer-experience perspective. To facilitate the reuse of key workflows and empower the community, Argilla Plugins provides a collection of extensions to superpower your Argilla use cases. Some of these pluggable methods could eventually be integrated into the core of Argilla.

Quickstart

pip install argilla-plugins

from argilla_plugins.datasets import end_of_life

plugin = end_of_life(
    name="plugin-test",
    end_of_life_in_seconds=100,
    execution_interval_in_seconds=5,
    discard_only=False
)
plugin.start()

How to develop a plugin

  1. Pick a cool plugin from the list of topics or our issue overview.
  2. Think about an abstraction for the plugin as shown below.
  3. Refer to the solution in the issue.
    1. Fork the repo.
    2. Commit your code.
    3. Open a PR.
  4. Keep it simple.
  5. Have fun.

Development requirements

Function

We want to keep the plugins as abstract as possible, so each one should be usable within 3 lines of code.

from argilla_plugins.topic import plugin
my_plugin = plugin(name="dataset_name", ws="workspace", query="query", interval=1.0)
my_plugin.start()

Variables

The variables name, ws, and query should be re-used as much as possible across all plugins. Some functions might need adaptations such as name_from or query_from; even then, stick to the shared variable names wherever you can.

Oh, and don't forget to have fun! 🤓

Topics

Reporting

What is it? Create interactive reports about dataset activity, dataset features, annotation tasks, model predictions, and more.

Plugins:

  • automated reporting plugin using datapane. issue
  • automated reporting plugin for great-expectations. issue

Datasets

What is it? Everything that involves operations on a dataset level, like dividing work, syncing datasets, and deduplicating records.

Plugins:

  • sync data between datasets.
    • directional A->B. issue
    • bi-directional A <-> B. issue
  • remove duplicate records. issue
  • create train test splits. issue
  • set limits to records in datasets

End of Life

Automatically delete or discard records after x seconds.

from argilla_plugins.datasets import end_of_life

plugin = end_of_life(
    name="plugin-test",
    end_of_life_in_seconds=100,
    execution_interval_in_seconds=5,
    discard_only=False
)
plugin.start()
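The expiry rule behind this plugin can be illustrated without Argilla at all. The sketch below is not the plugin's implementation: it assumes records are plain dictionaries with a hypothetical `created_at` timestamp and `status` field, and it only shows one plausible way `end_of_life_in_seconds` and `discard_only` could interact.

```python
from datetime import datetime, timedelta, timezone


def apply_end_of_life(records, end_of_life_in_seconds, discard_only=False, now=None):
    """Return the records that remain after an end-of-life pass.

    Hypothetical record shape: {"id": ..., "created_at": datetime, "status": str}.
    Expired records are dropped, or flagged "Discarded" when discard_only is True.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=end_of_life_in_seconds)
    remaining = []
    for record in records:
        expired = record["created_at"] < cutoff
        if not expired:
            remaining.append(record)
        elif discard_only:
            # Keep the record but flag it instead of deleting it.
            remaining.append({**record, "status": "Discarded"})
        # Otherwise the expired record is left out, i.e. deleted.
    return remaining


# Fixed reference time so the example is deterministic.
now = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": now - timedelta(seconds=200), "status": "Default"},
    {"id": 2, "created_at": now - timedelta(seconds=10), "status": "Default"},
]
deleted = apply_end_of_life(records, end_of_life_in_seconds=100, now=now)
discarded = apply_end_of_life(
    records, end_of_life_in_seconds=100, discard_only=True, now=now
)
```

With `discard_only=False` the expired record disappears from the result; with `discard_only=True` it stays but carries a different status.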

Programmatic Labelling

What is it? Automatically update the annotation and prediction labels of records based on heuristics.

Plugins:

  • annotated spans as gazetteer for labelling. issue
  • vector search queries and similarity threshold. issue
  • use gazetteer for labelling. issue
  • materialize annotations/predictions from rules using Snorkel or a MajorityVoter. issue

Token Copycat

If we annotate spans for texts like NER, we are relatively certain that these spans should be annotated the same throughout the entire dataset. We could use this assumption to already start annotating or predicting previously unseen data.

from argilla_plugins import token_copycat

plugin = token_copycat(
    name="plugin-test",
    query=None,
    copy_predictions=True,
    word_dict_kb_predictions={"key": {"label": "label", "score": 0}},
    copy_annotations=True,
    word_dict_kb_annotations={"key": {"label": "label", "score": 0}},
    included_labels=["label"],
    case_sensitive=True,
    execution_interval_in_seconds=1,
)
plugin.start()
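The idea of copying known spans onto unseen text can be sketched in plain Python. This is an illustration under stated assumptions, not the code inside `token_copycat`: the `(text, spans)` input shape and the helper names `build_gazetteer` and `copy_spans` are hypothetical, and matching is done with simple whole-word regular expressions.

```python
import re


def build_gazetteer(annotated, included_labels=None, case_sensitive=True):
    """Collect a span-text -> label mapping from annotated examples.

    `annotated` is a list of (text, [(start, end, label), ...]) pairs,
    a hypothetical stand-in for token classification records.
    """
    gazetteer = {}
    for text, spans in annotated:
        for start, end, label in spans:
            if included_labels is not None and label not in included_labels:
                continue
            key = text[start:end]
            if not case_sensitive:
                key = key.lower()
            gazetteer[key] = label
    return gazetteer


def copy_spans(text, gazetteer, case_sensitive=True):
    """Find every whole-word occurrence of a gazetteer entry in `text`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    found = []
    for key, label in gazetteer.items():
        pattern = r"\b" + re.escape(key) + r"\b"
        for match in re.finditer(pattern, text, flags):
            found.append((match.start(), match.end(), label))
    return sorted(found)


annotated = [("Paris is in France", [(0, 5, "LOC"), (12, 18, "LOC")])]
gazetteer = build_gazetteer(annotated, included_labels=["LOC"])
predicted = copy_spans("I flew from France to Paris", gazetteer)
```

Here the two spans annotated in the first sentence are found again in the second one, which is the assumption the plugin description relies on.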

Active learning

What is it? A process during which a learning algorithm can interactively query a user (or some other information source) to label new data points.

Plugins:

  • active learning for TextClassification.
  • active learning for TokenClassification. issue

from argilla_plugins import classy_learner

plugin = classy_learner(
    name="plugin-test",
    query=None,
    model="all-MiniLM-L6-v2",
    classy_config=None,
    certainty_threshold=0,
    overwrite_predictions=True,
    sample_strategy="fifo",
    min_n_samples=6,
    max_n_samples=20,
    batch_size=1000,
    execution_interval_in_seconds=5,
)
plugin.start()
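To make the querying step of active learning concrete, here is a minimal uncertainty-sampling sketch. It does not describe how `classy_learner` works internally; the `predictions` shape and the function `select_for_annotation` are hypothetical, and the meaning given to `certainty_threshold` and `max_n_samples` here is an assumption made for illustration.

```python
def select_for_annotation(predictions, certainty_threshold=0.8, max_n_samples=20):
    """Pick the records the model is least sure about.

    `predictions` maps a record id to {label: probability} (hypothetical shape).
    Records whose top probability is below the threshold are returned,
    least certain first, capped at max_n_samples.
    """
    scored = []
    for record_id, probs in predictions.items():
        top_label = max(probs, key=probs.get)
        scored.append((probs[top_label], record_id, top_label))
    uncertain = sorted(item for item in scored if item[0] < certainty_threshold)
    return [
        (record_id, label, score)
        for score, record_id, label in uncertain[:max_n_samples]
    ]


predictions = {
    "a": {"pos": 0.95, "neg": 0.05},
    "b": {"pos": 0.55, "neg": 0.45},
    "c": {"pos": 0.30, "neg": 0.70},
}
queue = select_for_annotation(predictions, certainty_threshold=0.8, max_n_samples=2)
```

Record "a" is skipped because the model is already confident about it, while "b" and "c" are queued for a human to label, the least certain one first.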

Inference endpoints

What is it? Automatically add predictions to records as they are logged into Argilla. This makes it easy to pre-annotate a dataset with an existing model or service.

  • inference with un-authenticated endpoint. issue
  • embed incoming records in the background. issue

Training endpoints

What is it? Automatically train a model based on dataset annotations.

  • TBD

Suggestions

Do you have any suggestions? Please open an issue 🤓
