Few-Shot Named Entity Recognition using Span Markers

Project description

SpanMarker for Named Entity Recognition

SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. Built directly on top of the 🤗 Transformers library, SpanMarker can reuse its functionality, such as loading models with from_pretrained and configuring training with TrainingArguments.

Based on the PL-Marker paper, SpanMarker breaks the mold through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders such as bert-base-cased and roberta-large, and automatically works with datasets using the IOB, IOB2, BIOES, BILOU or no label annotation scheme.
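
To make the annotation schemes above concrete, here is a minimal sketch of how IOB2 tags map onto entity spans. It is an illustration only and not SpanMarker's internal conversion code; the function name `iob2_to_spans` and the example sentence are assumptions made for this sketch.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB2 tags into (start, end, label) spans; end is exclusive.

    Hypothetical helper for illustration, not part of the span_marker API.
    """
    spans: List[Tuple[int, int, str]] = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        # The open entity continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # "B-" always opens an entity; a stray "I-" is treated leniently as an opening tag.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    # Close an entity that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["Amelia", "Earhart", "flew", "to", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
for start, end, label in iob2_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
```

With SpanMarker itself, this conversion should not be needed, since datasets in these schemes are handled automatically.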

Documentation

Feel free to have a look at the documentation.

Installation

You may install the span_marker Python module via pip like so:

pip install span_marker
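
To confirm that the installation succeeded, a small check along these lines may help. It is a sketch that relies only on the Python standard library (Python 3.8 or newer); the helper name `installed_version` is an assumption made for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("span_marker")
if version is None:
    print("span_marker is not installed")
else:
    print(f"span_marker version: {version}")
```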

Quick Start

Please have a look at our Getting Started notebook for details on how SpanMarker is commonly used. It explains the following snippet in more detail.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

def main():
    dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
    labels = dataset["train"].features["ner_tags"].feature.names

    model_name = "bert-base-cased"
    model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

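    args = TrainingArguments(
        output_dir="my_span_marker_model",
        learning_rate=5e-5,
        # Effective batch size per device: 2 accumulation steps * 4 samples = 8.
        gradient_accumulation_steps=2,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        save_strategy="steps",
        # Note: eval_steps only takes effect when the evaluation strategy is also set to "steps".
        eval_steps=200,
        logging_steps=50,
        # fp16 mixed precision needs a supported GPU; set to False when training on CPU.
        fp16=True,
        warmup_ratio=0.1,
        dataloader_num_workers=2,
    )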

    trainer = Trainer(
        model=model,
        args=args,
        # Use only the first 8000 training and 2000 validation examples.
        train_dataset=dataset["train"].select(range(8000)),
        eval_dataset=dataset["validation"].select(range(2000)),
    )

    trainer.train()
    trainer.save_model("my_span_marker_model/checkpoint-final")

    metrics = trainer.evaluate()
    print(metrics)

if __name__ == "__main__":
    main()
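
Once training has finished, the saved model can be loaded again for inference. The sketch below assumes the checkpoint path from the snippet above and assumes that `SpanMarkerModel.predict` returns a list of dictionaries with `span`, `label` and `score` keys; please check the documentation for the exact output of your installed version. The helper names `format_entities` and `predict_entities` are made up for this example.

```python
from typing import Dict, List


def format_entities(entities: List[Dict]) -> List[str]:
    """Render predicted entities as readable lines.

    Assumes each entity dict has "span", "label" and "score" keys.
    """
    return [f"{e['span']} -> {e['label']} ({e['score']:.2f})" for e in entities]


def predict_entities(model_path: str, text: str) -> List[str]:
    """Load a trained SpanMarker model and run it on a single sentence."""
    # Imported here so that format_entities can be used without span_marker installed.
    from span_marker import SpanMarkerModel

    model = SpanMarkerModel.from_pretrained(model_path)
    return format_entities(model.predict(text))
```

For example, `predict_entities("my_span_marker_model/checkpoint-final", "Amelia Earhart flew to Paris.")` would return one line per predicted entity. The labels depend on the dataset the model was trained on.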

Pretrained Models

Context

I have developed this library as a part of my thesis work at Argilla. Feel free to ⭐ star or watch the SpanMarker repository to get notified when my thesis is published.

Changelog

See CHANGELOG.md for news on all SpanMarker versions.

License

See LICENSE for the current license.
