NLStruct
Natural language structuring library. Currently, it implements only a NER model, but other algorithms will follow.
Features
- processes large documents seamlessly: tokenization and sentence splitting are handled automatically
- never train twice: an automatic caching mechanism detects when an experiment has already been run
- stop & resume training with checkpoints
- easy import and export of data
- handles nested or overlapping entities
- pretty logging with rich_logger
- heavily customizable, without configuration files (see train_ner.py)
- built on top of transformers and pytorch_lightning
How to train a NER model
from nlstruct.recipes import train_ner

model = train_ner(
    dataset={
        "train": "path to your train brat/standoff data",
        "val": 0.05,  # or path to your validation data
        # "test":  # and optional path to your test data
    },
    finetune_bert=False,
    seed=42,
    bert_name="camembert/camembert-base",
    fasttext_file="",
    gpus=0,
    xp_name="my-xp",
)
model.save_pretrained("ner.pt")
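The recipe above expects training data in BRAT standoff format: each document is a plain-text `.txt` file paired with a `.ann` file listing its annotations. A minimal sketch of writing such a pair with only the standard library (the directory name, document id, and `Drug` label are illustrative, not part of nlstruct):

```python
from pathlib import Path

# Create a tiny BRAT/standoff dataset: one .txt file plus its .ann file.
data_dir = Path("brat_train")
data_dir.mkdir(exist_ok=True)

text = "Aspirin was prescribed for the headache."
(data_dir / "doc-001.txt").write_text(text, encoding="utf-8")

# Each entity line in the .ann file: ID <TAB> TYPE START END <TAB> SURFACE_TEXT
start = text.index("Aspirin")
end = start + len("Aspirin")
ann_line = f"T1\tDrug {start} {end}\t{text[start:end]}\n"
(data_dir / "doc-001.ann").write_text(ann_line, encoding="utf-8")
```

Point the `"train"` entry of the recipe at a directory laid out like this one.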
How to use it
from nlstruct import load_pretrained
from nlstruct.datasets import load_from_brat, export_to_brat

ner = load_pretrained("ner.pt")
docs = ner.predict(load_from_brat("path/to/brat/test"))
export_to_brat(docs, filename_prefix="path/to/exported_brat")
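Since `export_to_brat` writes predictions back out as standoff `.ann` files, they can be inspected with any BRAT-aware tool, or with a few lines of plain Python. A hedged sketch of reading entity lines back into tuples (this helper is not part of nlstruct, and it handles only simple contiguous spans, not BRAT's `;`-separated discontinuous fragments):

```python
def parse_ann_entities(ann_text):
    """Parse BRAT entity ('T') lines into (id, label, start, end, text) tuples."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes, etc.
        ent_id, type_span, surface = line.split("\t")
        label, start, end = type_span.split(" ")  # contiguous spans only
        entities.append((ent_id, label, int(start), int(end), surface))
    return entities

print(parse_ann_entities("T1\tDrug 0 7\tAspirin"))
# [('T1', 'Drug', 0, 7, 'Aspirin')]
```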
Status
This project is still under development and subject to change.