Natural language structuring library
NLStruct
Natural language structuring library. It currently implements only a NER (named entity recognition) model, but other algorithms will follow.
Features
- processes large documents seamlessly: it automatically handles tokenization and sentence splitting.
- do not train twice: an automatic caching mechanism detects when an experiment has already been run
- stop & resume with checkpoints
- easy import and export of data
- handles nested or overlapping entities
- pretty logging with rich_logger
- heavily customizable, without config files (see train_ner.py)
- built on top of transformers and pytorch_lightning
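The "do not train twice" feature can be pictured with a small sketch. This is not NLStruct's actual implementation, only an illustration of one common approach: hash the experiment configuration and skip the run when a result for that hash already exists. The names `config_fingerprint`, `run_once` and `fake_train` are hypothetical, and the metrics are placeholders.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Return a short, stable hash of a JSON-serializable config."""
    # sort_keys makes the hash independent of key insertion order
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def run_once(config: dict, cache_dir: Path, train_fn):
    """Call train_fn(config) unless a result for this config is already cached.

    Returns (result, cache_hit).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    result_path = cache_dir / f"{config_fingerprint(config)}.json"
    if result_path.exists():
        return json.loads(result_path.read_text()), True
    result = train_fn(config)
    result_path.write_text(json.dumps(result))
    return result, False


calls = []  # records every real "training" run


def fake_train(config: dict) -> dict:
    """Hypothetical stand-in for an expensive training run."""
    calls.append(config["seed"])
    return {"f1": 0.0, "seed": config["seed"]}  # placeholder, not a real score


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"seed": 42, "bert_name": "camembert/camembert-base"}
    first, hit_first = run_once(cfg, Path(tmp), fake_train)
    # Same settings in a different key order: still recognized as the same run
    reordered = dict(reversed(list(cfg.items())))
    second, hit_second = run_once(reordered, Path(tmp), fake_train)
```

Here the second call returns the stored result without calling `fake_train` again. Changing any value in the configuration, such as the seed, produces a different fingerprint and therefore a fresh run.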
How to train a NER model
from nlstruct.recipes import train_ner
model = train_ner(
    dataset={
        "train": "path to your train brat/standoff data",
        "val": 0.05,  # or path to your validation data
        # "test": # and optional path to your test data
    },
    finetune_bert=False,
    seed=42,
    bert_name="camembert/camembert-base",
    fasttext_file="",
    gpus=0,
    xp_name="my-xp",
)
model.save_pretrained("ner.pt")
How to use it
from nlstruct import load_pretrained
from nlstruct.datasets import load_from_brat, export_to_brat
ner = load_pretrained("ner.pt")
docs = load_from_brat("path/to/brat/test")
export_to_brat(ner.predict(docs), filename_prefix="path/to/exported_brat")
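For readers unfamiliar with the brat standoff format used above, here is a rough, library-independent sketch of what an `.ann` file contains and how nested entities show up in it. It does not use NLStruct and does not reflect how `load_from_brat` represents documents internally. The `Entity` type, the helper names and the `ORG`/`LOC` labels are assumptions made for illustration. Only text-bound annotations (lines whose ID starts with `T`) are read.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    fragments: Tuple[Tuple[int, int], ...]  # character offsets, end exclusive
    text: str


def parse_ann(ann: str) -> List[Entity]:
    """Parse the text-bound ("T") lines of a brat .ann file."""
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, attributes and notes
        ann_id, type_and_span, text = line.split("\t", 2)
        label, offsets = type_and_span.split(" ", 1)
        # Discontinuous entities list several fragments separated by ";"
        fragments = tuple(
            tuple(int(x) for x in frag.split()) for frag in offsets.split(";")
        )
        entities.append(Entity(ann_id, label, fragments, text))
    return entities


def overlaps(a: Entity, b: Entity) -> bool:
    """True when the overall extents of two entities share a character."""
    a_start, a_end = a.fragments[0][0], a.fragments[-1][1]
    b_start, b_end = b.fragments[0][0], b.fragments[-1][1]
    return a_start < b_end and b_start < a_end


doc = "Paris University Hospital"
ann = (
    "T1\tORG 0 25\tParis University Hospital\n"
    "T2\tLOC 0 5\tParis\n"
)

entities = parse_ann(ann)
nested = overlaps(entities[0], entities[1])  # T2 lies inside T1
```

Because each annotation carries its own offsets, one entity can sit inside or partly overlap another without any conflict in the file. A flat scheme with one tag per token cannot express this directly.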
Install
This project is still under development and is subject to change.
pip install nlstruct