Few-Shot Named Entity Recognition using Span Markers

SpanMarker for Named Entity Recognition

SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. Tightly implemented on top of the 🤗 Transformers library, SpanMarker can take advantage of its valuable functionality.

Based on the PL-Marker paper, SpanMarker breaks the mold through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders such as bert-base-cased and roberta-large, and it automatically handles datasets annotated with the IOB, IOB2, BIOES or BILOU scheme, as well as datasets with no annotation scheme at all.
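
To make the annotation schemes above more concrete, here is a minimal sketch of how IOB2 tags map to labeled spans. The helper name `iob2_to_spans` is hypothetical and is not part of the span_marker API; SpanMarker's own preprocessing may work differently.

```python
from typing import List, Optional, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (label, start, end) spans, with end exclusive.

    Hypothetical helper for illustration only. It is lenient: an "I-" tag
    without a preceding "B-" of the same label simply opens a new span.
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            # The current span (if any) ends before this token.
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
            # "B-" or a stray "I-" opens a new span; "O" opens nothing.
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
    # Close a span that runs until the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(iob2_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 4)]
```

Each span is a label plus token offsets, which is the unit a span-based model classifies instead of individual token tags.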

Installation

You may install the span_marker Python module via pip like so:

pip install span_marker
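
To confirm that the installation succeeded, one option is to query the installed version from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if span_marker is installed, otherwise None.
print(installed_version("span_marker"))
```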

Quick Start

Please have a look at our Getting Started Jupyter notebook for details on how SpanMarker is commonly used. That notebook explains the following snippet in more detail.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

# Load the "supervised" configuration of the few-nerd dataset and read its label names
dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
labels = dataset["train"].features["ner_tags"].feature.names

# Initialize a SpanMarker model on top of a pretrained encoder
model_name = "bert-base-cased"
model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

# Regular 🤗 Transformers training arguments
args = TrainingArguments(
    output_dir="my_span_marker_model",
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    save_strategy="steps",
    eval_steps=200,  # only used when step-based evaluation is enabled
    logging_steps=50,
    bf16=True,  # requires hardware with bfloat16 support
    warmup_ratio=0.1,
)

# Train and evaluate on subsets of the data to keep the run short
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].select(range(8000)),
    eval_dataset=dataset["validation"].select(range(2000)),
)

trainer.train()
trainer.save_model("my_span_marker_model/checkpoint-final")

# Compute and print the metrics on the evaluation subset
metrics = trainer.evaluate()
print(metrics)
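
The batch-related arguments in the snippet above interact with each other. As a rough sketch, the arithmetic looks like this. The helper names are hypothetical, a single device is assumed, and the total of 1000 optimizer steps is an illustrative figure only, since the real number depends on how many training samples SpanMarker's preprocessing produces.

```python
import math


def effective_batch_size(
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,
) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def warmup_steps(total_optimizer_steps: int, warmup_ratio: float) -> int:
    """Number of optimizer steps spent warming up the learning rate."""
    return math.ceil(total_optimizer_steps * warmup_ratio)


# per_device_train_batch_size=4 with gradient_accumulation_steps=2 on one device
print(effective_batch_size(4, 2))  # 8

# Illustrative only: warmup length for a hypothetical run of 1000 optimizer steps
print(warmup_steps(1000, 0.1))
```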

Because this work is based on PL-Marker, you may expect results similar to those on its Papers with Code leaderboard. Tests, documentation and further information on expected performance will come soon.
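
Named Entity Recognition results are commonly reported as span-level precision, recall and F1, where a predicted entity counts as correct only if both its label and its boundaries match the gold annotation. The following is a minimal sketch of that computation, not necessarily how SpanMarker's trainer computes its metrics; the function name `span_prf` is hypothetical.

```python
from typing import Dict, Set, Tuple

# A span is (label, start, end), with end exclusive.
Span = Tuple[str, int, int]


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match span-level precision, recall and F1."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {("PER", 0, 2), ("LOC", 3, 4)}
predicted = {("PER", 0, 2), ("ORG", 3, 4)}
# One of two predictions is correct, and one of two gold spans is found.
print(span_prf(gold, predicted))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```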
