
Few-Shot Named Entity Recognition using Span Markers


SpanMarker for Named Entity Recognition

SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. Because it is tightly implemented on top of the 🤗 Transformers library, SpanMarker can take advantage of that library's functionality, such as its TrainingArguments.

Based on the PL-Marker paper, SpanMarker stands out through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders, such as bert-base-cased and roberta-large, and automatically handles datasets that use the IOB, IOB2, BIOES or BILOU labeling scheme, or no scheme at all.
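The span-based idea behind PL-Marker and SpanMarker can be illustrated without the library: whichever scheme a dataset uses, its tags decode to the same (start, end, label) entity spans, and a span-based model then classifies every candidate span up to some maximum width. The sketch below is a hypothetical illustration of that framing (the function names and the `max_width` parameter are made up here), not SpanMarker's internal code.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity label)
PREFIXES = "BIELSU"  # B/I from IOB(2), E(nd)/S(ingle) from BIOES, L(ast)/U(nit) from BILOU


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Decode IOB, IOB2, BIOES, BILOU or prefix-free tags into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None

    def close(end: int) -> None:
        # Finish the currently open entity (if any) at token index `end`.
        nonlocal start, label
        if start is not None:
            spans.append((start, end, label))
        start, label = None, None

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
            continue
        if len(tag) > 2 and tag[1] == "-" and tag[0] in PREFIXES:
            prefix, ent = tag[0], tag[2:]
        else:
            prefix, ent = "I", tag  # no scheme: a run of equal labels is one entity
        if prefix in "SU":
            # Single-token entity.
            close(i)
            spans.append((i, i + 1, ent))
            continue
        if prefix == "B" or ent != label:
            # Explicit begin, or an I/E/L tag whose type differs from the open entity.
            close(i)
            start, label = i, ent
        if prefix in "EL":
            # End/Last tag closes the entity after this token.
            close(i + 1)
    close(len(tags))
    return spans


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Every candidate span (start, end exclusive) of at most max_width tokens."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start + 1, min(start + max_width, n_tokens) + 1)
    ]


def label_candidates(tags: List[str], max_width: int) -> List[Span]:
    """Pair each candidate span with its gold label, or "O" if it is not an entity."""
    gold = {(s, e): lab for s, e, lab in tags_to_spans(tags)}
    return [(s, e, gold.get((s, e), "O")) for s, e in enumerate_spans(len(tags), max_width)]


# Three annotations of "Ada Lovelace lived in London" describing the same entities.
iob2 = ["B-PER", "I-PER", "O", "O", "B-LOC"]
bioes = ["B-PER", "E-PER", "O", "O", "S-LOC"]
no_scheme = ["PER", "PER", "O", "O", "LOC"]
for tags in (iob2, bioes, no_scheme):
    print(tags_to_spans(tags))  # [(0, 2, 'PER'), (4, 5, 'LOC')] each time
```

The number of candidate spans grows roughly with the number of tokens times the maximum width, which is why span-based models cap the entity length they consider.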

Documentation

Feel free to have a look at the documentation.

Installation

You may install the span_marker Python module via pip like so:

pip install span_marker

Quick Start

Please have a look at our Getting Started notebook for details on how SpanMarker is commonly used. It explains the following snippet in more detail.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

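# Few-NERD in its supervised setting; ner_tags holds the coarse-grained entity types
dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
# ner_tags is a sequence of ClassLabel values, so the label names can be read directly
labels = dataset["train"].features["ner_tags"].feature.names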

model_name = "bert-base-cased"
model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

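args = TrainingArguments(
    output_dir="my_span_marker_model",
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    # Evaluate and save periodically; without evaluation_strategy="steps",
    # eval_steps is ignored. (Newer transformers versions rename this
    # argument to eval_strategy.)
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=200,
    logging_steps=50,
    fp16=True,  # mixed-precision training; requires a CUDA GPU
    warmup_ratio=0.1,
)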

trainer = Trainer(
    model=model,
    args=args,
    # Subsample both splits to keep this example quick to run
    train_dataset=dataset["train"].select(range(8000)),
    eval_dataset=dataset["validation"].select(range(2000)),
)

trainer.train()
trainer.save_model("my_span_marker_model/checkpoint-final")

metrics = trainer.evaluate()
print(metrics)
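The TrainingArguments above determine how long training runs. The sketch below works through that arithmetic under two assumptions the snippet does not state: a single device, and exactly one training sample per selected sentence (SpanMarker may split long sentences into several samples, which would increase these numbers).

```python
import math

# Values copied from the TrainingArguments above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_train_epochs = 1
warmup_ratio = 0.1
eval_steps = 200

# Assumptions (not from the snippet): one device, one sample per sentence.
num_devices = 1
num_train_samples = 8000

# Samples consumed per optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
# Linear warmup covers the first 10% of optimizer steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)
# Evaluations triggered every eval_steps optimizer steps.
num_evaluations = total_steps // eval_steps

print(effective_batch_size, total_steps, warmup_steps, num_evaluations)  # 8 1000 100 5
```

With step-based evaluation enabled, eval_steps=200 therefore amounts to roughly five evaluations during this single epoch.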

Because this work is based on PL-Marker, you may expect results similar to those on its Papers with Code leaderboard. Tests, documentation and further information on expected performance will follow soon.
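NER leaderboard results are usually reported as entity-level F1, where a predicted entity counts as correct only if both its boundaries and its type exactly match a gold entity. The following is a minimal sketch of that micro-averaged metric; SpanMarker's own evaluation code may differ in details, and the example spans are made up for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence id, start, end exclusive, label)


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Micro-averaged, exact-match span precision, recall and F1."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    # A true positive requires identical sentence, boundaries and label.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [(0, 0, 2, "PER"), (0, 4, 5, "LOC"), (1, 1, 3, "ORG")]
# One correct PER, one wrong type (ORG instead of LOC), one correct ORG, one spurious LOC.
pred = [(0, 0, 2, "PER"), (0, 4, 5, "ORG"), (1, 1, 3, "ORG"), (1, 5, 6, "LOC")]
print(span_f1(gold, pred))
```

Here 2 of 4 predictions are correct (precision 0.5) and 2 of 3 gold entities are found (recall 2/3), giving F1 = 4/7, about 0.571.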

Pretrained Models

Changelog

See CHANGELOG.md for news on all SpanMarker versions.



Download files

Download the file for your platform.

Source Distribution

span_marker-0.2.2.tar.gz (25.2 kB)

Uploaded Source

Built Distribution

span_marker-0.2.2-py3-none-any.whl (24.6 kB)

Uploaded Python 3

File details

Details for the file span_marker-0.2.2.tar.gz.

File metadata

  • Download URL: span_marker-0.2.2.tar.gz
  • Upload date:
  • Size: 25.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for span_marker-0.2.2.tar.gz
Algorithm Hash digest
SHA256 9879e95d36c6ee3480c9900db895f2f4d955bf5ddb3bf3eb4c915c7f1c0da9e6
MD5 95e1867129431f1531a00dee8844e27a
BLAKE2b-256 6bac66a5c011e8f736b10dbc510c264f4274c336426b7e01625a2b32632410c5


File details

Details for the file span_marker-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: span_marker-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 24.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for span_marker-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6a731563dbca862548aaf7e57eac2f0ec9d83d6eaae65abcc115b3715e4f2999
MD5 3d7145a8f7fff28a80d955dcd854f9c5
BLAKE2b-256 ebdd9e95b5ffe3da7f2cbc8e939eb84ac11e85827b561d97c64a0891182145d0

