fine-tune transformer-based language models for named entity recognition

Project description

A Python package to fine-tune transformer-based language models for named entity recognition (NER).

Installation

pip install nerblackbox

About

https://raw.githubusercontent.com/flxst/nerblackbox/master/docs/docs/images/nerblackbox_sources.png

Take a dataset from one of many available sources. Then train, evaluate and apply a language model in a few simple steps.

1. Data

  • Choose a dataset from HuggingFace (HF), the Local Filesystem (LF), an Annotation Tool (AT) server, or a Built-in (BI) dataset

from nerblackbox import Dataset, Experiment, Model  # classes used in the steps below

dataset = Dataset("conll2003",  source="HF")  # HuggingFace
dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
dataset = Dataset("swe_nerc",   source="BI")  # Built-in

  • Set up the dataset

dataset.set_up()

2. Training

  • Define a fine-tuning experiment by choosing a pretrained model and a dataset

experiment = Experiment("my_experiment", model="bert-base-cased", dataset="conll2003")

  • Run the experiment and get the performance of the fine-tuned model

experiment.run()
experiment.get_result(metric="f1", level="entity", phase="test")
# 0.9045
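
To make the level="entity" metric above concrete, here is a minimal sketch of how an entity-level F1 score can be computed from BIO-tagged sequences. It is an independent illustration, not nerblackbox's implementation: the helper names extract_entities and entity_f1 are hypothetical, and it assumes that an entity counts as correct only if both its span and its type match exactly.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start_index, end_index_exclusive, entity_type)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (hypothetical helper).

    Stray I- tags without a preceding B- tag are ignored.
    """
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(true_tags: List[str], pred_tags: List[str]) -> float:
    """Entity-level F1: harmonic mean of precision and recall over exact matches."""
    true_entities = extract_entities(true_tags)
    pred_entities = extract_entities(pred_tags)
    n_correct = len(true_entities & pred_entities)
    if n_correct == 0:
        return 0.0
    precision = n_correct / len(pred_entities)
    recall = n_correct / len(true_entities)
    return 2 * precision * recall / (precision + recall)


# The prediction finds the ORG entity but misses the LOC entity.
true_tags = ["O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O"]
pred_tags = ["O", "B-ORG", "I-ORG", "O", "O", "O", "O", "O"]
print(round(entity_f1(true_tags, pred_tags), 4))
```

Here precision is 1.0 (the one predicted entity is correct) and recall is 0.5 (one of two true entities is found), so the F1 score is 2/3. A token-level metric would instead score each tag separately and would credit the prediction for every correct "O".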

3. Evaluation

  • Load the model

model = Model.from_experiment("my_experiment")

  • Evaluate the model

evaluation_dict = model.evaluate_on_dataset("ehealth_kd", "jsonl", phase="test")
evaluation_dict["micro"]["entity"]["f1"]
# 0.9045

4. Inference

  • Load the model

model = Model.from_experiment("my_experiment")

  • Let the model predict

model.predict("The United Nations has never recognised Jakarta's move.")
# [[
#  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
#  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
# ]]
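
The character offsets in a prediction can be used to map the entities back onto the input text. The sketch below hard-codes the output shown above instead of calling the model. It assumes, as that output suggests, that char_start and char_end are returned as strings and that char_end is exclusive. The helper highlight is hypothetical and not part of nerblackbox.

```python
from typing import Dict, List

text = "The United Nations has never recognised Jakarta's move."

# Prediction for one input text, copied from the example output above.
prediction: List[Dict[str, str]] = [
    {"char_start": "4", "char_end": "18", "token": "United Nations", "tag": "ORG"},
    {"char_start": "40", "char_end": "47", "token": "Jakarta", "tag": "LOC"},
]


def highlight(text: str, entities: List[Dict[str, str]]) -> str:
    """Wrap each predicted entity as [entity]/TAG (hypothetical helper)."""
    result = text
    # Process entities from right to left so that earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: int(e["char_start"]), reverse=True):
        start, end = int(entity["char_start"]), int(entity["char_end"])
        result = f"{result[:start]}[{result[start:end]}]/{entity['tag']}{result[end:]}"
    return result


# Slicing the input text with the offsets recovers each predicted token.
for entity in prediction:
    start, end = int(entity["char_start"]), int(entity["char_end"])
    assert text[start:end] == entity["token"]

print(highlight(text, prediction))
# The [United Nations]/ORG has never recognised [Jakarta]/LOC's move.
```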

There is much more to it than that! See the documentation to get started.

Features

Data

  • Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ...)

  • Support for Multiple Dataset Types (Standard, Pretokenized)

  • Support for Multiple Annotation Schemes (IO, BIO, BILOU)

  • Text Encoding
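
To illustrate how the annotation schemes listed above relate, the following sketch converts a BIO-tagged sequence to BILOU (B = begin, I = inside, L = last, O = outside, U = unit, i.e. a single-token entity). It is an independent illustration and not nerblackbox's own conversion code. The name bio_to_bilou is hypothetical, and well-formed BIO input is assumed.

```python
from typing import List


def bio_to_bilou(tags: List[str]) -> List[str]:
    """Convert well-formed BIO tags to BILOU tags (hypothetical helper)."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = next_tag == f"I-{label}"
        if prefix == "B":
            bilou.append(f"B-{label}" if entity_continues else f"U-{label}")
        else:  # prefix == "I"
            bilou.append(f"I-{label}" if entity_continues else f"L-{label}")
    return bilou


bio = ["O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_bilou(bio))
# ['O', 'B-ORG', 'L-ORG', 'O', 'U-LOC']
```

BILOU marks entity ends and single-token entities explicitly, whereas BIO only marks entity starts.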

Training

  • Adaptive Fine-tuning

  • Hyperparameter Search

  • Multiple Runs with Different Random Seeds

  • Detailed Analysis of Training Results
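
Running the same experiment with several random seeds gives one score per run, and these are commonly summarized by their mean and standard deviation. The sketch below shows that summary using only the standard library. The scores are made-up illustrative numbers, not nerblackbox results, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical test F1 scores from three runs, keyed by random seed.
f1_per_seed = {1: 0.90, 2: 0.91, 3: 0.92}

scores = list(f1_per_seed.values())
f1_mean = mean(scores)
f1_std = stdev(scores)  # sample standard deviation (n - 1 in the denominator)

print(f"f1 = {f1_mean:.3f} +- {f1_std:.3f} over {len(scores)} seeds")
# f1 = 0.910 +- 0.010 over 3 seeds
```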

Evaluation

  • Evaluation of Any Model on Any Dataset

Inference

  • Versatile Model Inference (Entity/Word Level, Probabilities, ...)

Other

  • Full Compatibility with HuggingFace

  • GPU Support

  • Language Agnosticism

See the documentation for details.

Citation

@misc{nerblackbox,
  author = {Stollenwerk, Felix},
  title  = {nerblackbox: a python package to fine-tune transformer-based language models for named entity recognition},
  year   = {2021},
  url    = {https://github.com/flxst/nerblackbox},
}
