A High-level Library for Named Entity Recognition in Python.

Installation

pip install nerblackbox

About

[Figure: overview of the supported dataset sources, https://raw.githubusercontent.com/flxst/nerblackbox/master/docs/docs/images/nerblackbox_sources.png]

Take a dataset from one of many available sources. Then train, evaluate and apply a language model in a few simple steps.

1. Data

  • Choose a dataset from HuggingFace (HF), the Local Filesystem (LF), an Annotation Tool (AT) server, or a Built-in (BI) dataset

from nerblackbox import Dataset

# pick one of the following sources
dataset = Dataset("conll2003",  source="HF")  # HuggingFace
dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
dataset = Dataset("swe_nerc",   source="BI")  # Built-in

  • Set up the dataset

dataset.set_up()

2. Training

  • Define the training by choosing a pretrained model and a dataset

from nerblackbox import Training

training = Training("my_training", model="bert-base-cased", dataset="conll2003")

  • Run the training and get the performance of the fine-tuned model

training.run()
training.get_result(metric="f1", level="entity", phase="test")
# 0.9045
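With `level="entity"`, a prediction typically only counts as correct if the full entity span and its type match the gold annotation. The sketch below illustrates what such an entity-level F1 measures; it is a generic illustration, not nerblackbox's internal implementation, and the function names and toy tag sequences are hypothetical. It extracts entity spans from BIO tag sequences and scores them by exact match, pooling counts over all sentences.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype = ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # Close the open entity on "O", on a new "B-", or on an "I-" of another type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((start, i, etype))
            start = None
        # Open a new entity on "B-", or leniently on an "I-" with no open entity.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 with exact span-and-type matching, counts pooled over sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # spans found with the correct boundaries and type
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans that were missed or truncated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-ORG", "I-ORG", "O", "B-LOC"], ["B-PER", "O"]]
pred = [["B-ORG", "O", "O", "B-LOC"], ["B-PER", "O"]]
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

The truncated ORG prediction counts as both a false positive and a false negative, so the entity-level F1 is 2/3 even though most individual tags are correct.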

3. Evaluation

  • Load the model

from nerblackbox import Model

model = Model.from_training("my_training")

  • Evaluate the model

results = model.evaluate_on_dataset("ehealth_kd", phase="test")
results["micro"]["entity"]["f1"]
# 0.9045
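The `"micro"` key in the result refers to micro averaging, which pools true positives, false positives and false negatives over all entity types before computing a single F1. Macro averaging, by contrast, computes F1 per type and then takes the unweighted mean, so rare types weigh as much as frequent ones. The sketch below contrasts the two; the per-type counts are made up for illustration, and the dictionary layout is an assumption rather than nerblackbox's result format.

```python
from typing import Dict

# Hypothetical per-type counts: true positives, false positives, false negatives.
COUNTS: Dict[str, Dict[str, int]] = {
    "PER": {"tp": 90, "fp": 5, "fn": 5},
    "LOC": {"tp": 80, "fp": 10, "fn": 10},
    "MISC": {"tp": 5, "fp": 5, "fn": 15},
}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Pool counts over all entity types, then compute a single F1."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Compute F1 per entity type, then take the unweighted mean."""
    scores = [f1(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    return sum(scores) / len(scores)


print(f"micro F1: {micro_f1(COUNTS):.4f}")
print(f"macro F1: {macro_f1(COUNTS):.4f}")
```

With these counts, micro F1 is 0.8750 while macro F1 drops to about 0.7232, because the poorly recognized MISC type counts as much as PER and LOC.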

4. Inference

  • Load the model

model = Model.from_training("my_training")

  • Let the model predict

model.predict("The United Nations has never recognised Jakarta's move.")
# [[
#  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
#  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
# ]]
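The entity-level output above carries character offsets (as strings), so it can be post-processed without the model. As an illustration, the sketch below maps those spans onto word-level BIO tags using a simple regex tokenizer. It is a hypothetical helper, not nerblackbox's built-in word-level inference mode; words that only partially overlap an entity span are tagged `O`.

```python
import re
from typing import Dict, List, Tuple

# Entity-level predictions copied from the output above
# (char offsets are strings there, so they are cast to int below).
TEXT = "The United Nations has never recognised Jakarta's move."
PREDICTIONS = [
    {"char_start": "4", "char_end": "18", "token": "United Nations", "tag": "ORG"},
    {"char_start": "40", "char_end": "47", "token": "Jakarta", "tag": "LOC"},
]


def words_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split text into word and punctuation tokens with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_word_level_bio(text: str, predictions: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Assign a BIO tag to every word that lies fully inside a predicted entity span."""
    tagged = []
    for word, start, end in words_with_offsets(text):
        tag = "O"
        for p in predictions:
            p_start, p_end = int(p["char_start"]), int(p["char_end"])
            if start >= p_start and end <= p_end:
                prefix = "B" if start == p_start else "I"
                tag = f"{prefix}-{p['tag']}"
                break
        tagged.append((word, tag))
    return tagged


for word, tag in to_word_level_bio(TEXT, PREDICTIONS):
    print(f"{word}\t{tag}")
```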

There is much more to it than that! See the documentation to get started.

Features

Data

  • Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ...)

  • Support for Multiple Dataset Types (Standard, Pretokenized)

  • Support for Multiple Annotation Schemes (IO, BIO, BILOU)

  • Text Encoding
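The three annotation schemes differ only in how entity boundaries are marked: IO tags every entity token with `I-`, BIO marks the first token with `B-`, and BILOU additionally marks the last token (`L-`) and single-token entities (`U-`). A minimal conversion sketch, assuming well-formed BIO input in which every entity starts with `B-`; it is an illustration, not nerblackbox's internal converter.

```python
from typing import List


def bio_to_io(tags: List[str]) -> List[str]:
    """IO: every entity token becomes I-<type>; entity boundaries are not marked."""
    return ["O" if t == "O" else "I-" + t[2:] for t in tags]


def bio_to_bilou(tags: List[str]) -> List[str]:
    """BILOU: Begin, Inside, Last, Outside, Unit (single-token entity)."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, etype = tag[:2], tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + etype  # the entity extends to the next token
        if prefix == "B-":
            out.append(("B-" if continues else "U-") + etype)
        else:  # "I-"
            out.append(("I-" if continues else "L-") + etype)
    return out


bio = ["O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_io(bio))     # ['O', 'I-ORG', 'I-ORG', 'O', 'I-LOC', 'O']
print(bio_to_bilou(bio))  # ['O', 'B-ORG', 'L-ORG', 'O', 'U-LOC', 'O']
```

One consequence of the simpler IO scheme is that two adjacent entities of the same type can no longer be told apart.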

Training

  • Adaptive Fine-tuning

  • Hyperparameter Search

  • Multiple Runs with Different Random Seeds

  • Detailed Analysis of Training Results
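How nerblackbox itself configures hyperparameter search and multiple seeds is described in its documentation. The snippet below is only a generic sketch of the idea: each configuration in a grid is trained once per random seed, configurations are compared by mean score, and the standard deviation indicates run-to-run variance. `train_and_score` is a hypothetical callback standing in for a real fine-tuning run, and the toy scorer is invented for illustration.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def search(
    train_and_score: Callable[[Config, int], float],
    grid: Dict[str, List[float]],
    seeds: List[int],
) -> Tuple[Config, float, float]:
    """Grid search where each configuration is trained once per seed.

    Returns the configuration with the highest mean score, together with
    the mean and standard deviation of its scores across seeds.
    """
    best: Tuple[Config, float, float] = ({}, float("-inf"), 0.0)
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        scores = [train_and_score(config, seed) for seed in seeds]
        mean = statistics.mean(scores)
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        if mean > best[1]:  # ties keep the first configuration found
            best = (config, mean, std)
    return best


def toy_train_and_score(config: Config, seed: int) -> float:
    # Hypothetical stand-in for fine-tuning: peaks at lr=3e-5 and adds a
    # small seed-dependent offset to mimic run-to-run variance.
    return 0.9 - abs(config["lr"] - 3e-5) * 1000 + 0.001 * (seed % 3)


best_config, mean, std = search(
    toy_train_and_score, {"lr": [1e-5, 3e-5, 5e-5], "epochs": [2, 3]}, seeds=[0, 1, 2]
)
print(best_config, round(mean, 4), round(std, 4))
```

Selecting by mean rather than by the single best run avoids rewarding a configuration that merely got a lucky seed.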

Evaluation

  • Evaluation of Any Model on Any Dataset

Inference

  • Versatile Model Inference (Entity/Word Level, Probabilities, ...)
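Token classification models such as BERT typically output one logit per tag for each token, and tag probabilities are usually obtained by applying a softmax to those logits. A minimal sketch with a hypothetical label set and hypothetical logits for a single token; it does not reproduce nerblackbox's actual probability output format.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tag_probabilities(logits: List[float], labels: List[str]) -> Dict[str, float]:
    """Map each label to its probability for a single token."""
    return dict(zip(labels, softmax(logits)))


LABELS = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # hypothetical label set
token_logits = [0.1, 2.5, 0.3, 0.2, -1.0]  # hypothetical classifier outputs for one token
probs = tag_probabilities(token_logits, LABELS)
best = max(probs, key=probs.get)  # the predicted tag is the most probable one
print(best, round(probs[best], 3))
```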

Other

  • Full Compatibility with HuggingFace

  • GPU Support

  • Language Agnosticism

See the documentation for details.

Citation

@misc{nerblackbox,
  author = {Stollenwerk, Felix},
  title  = {nerblackbox: a high-level library for named entity recognition in python},
  year   = {2021},
  url    = {https://github.com/flxst/nerblackbox},
}
