Natural language structuring library
NLStruct
NLStruct is a natural language structuring library. It currently implements only a named entity recognition (NER) model; other algorithms will follow.
Features
- processes large documents seamlessly: it automatically handles tokenization and sentence splitting.
- do not train twice: an automatic caching mechanism detects when an experiment has already been run
- stop & resume with checkpoints
- easy import and export of data
- handles nested or overlapping entities
- pretty logging with rich_logger
- heavily customizable, without config files (see train_ner.py)
- built on top of transformers and pytorch_lightning
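The "do not train twice" feature can be pictured with a small sketch. This is not NLStruct's actual implementation, only one common way to build such a cache, assuming an experiment is fully described by a dictionary of parameters. The names `experiment_key`, `run_cached`, and the `cache_dir` layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def experiment_key(params: dict) -> str:
    """Return a short, stable hash of the experiment parameters (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict ordering
    payload = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def run_cached(params: dict, train_fn, cache_dir: str = "checkpoints"):
    """Run train_fn(params) unless a result for the same params is already stored.

    Returns (result, cache_hit). The result must be JSON-serializable in this sketch.
    """
    cache_path = Path(cache_dir) / f"{experiment_key(params)}.json"
    if cache_path.exists():
        # Same parameters seen before: reuse the stored result instead of retraining
        return json.loads(cache_path.read_text(encoding="utf-8")), True
    result = train_fn(params)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result), encoding="utf-8")
    return result, False
```

Hashing the sorted parameters means two runs with the same settings map to the same cache file regardless of argument order, while changing any value (for example the seed) yields a new key and triggers a fresh run.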
How to train a NER model
from nlstruct.recipes import train_ner

model = train_ner(
    dataset={
        "train": "path to your train brat/standoff data",
        "val": 0.05,  # or path to your validation data
        # "test": # and optional path to your test data
    },
    finetune_bert=False,
    seed=42,
    bert_name="camembert/camembert-base",
    fasttext_file="",
    gpus=0,
    xp_name="my-xp",
)
model.save_pretrained("ner.pt")
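The training data above is expected in brat standoff format, where each document has a text file and an `.ann` file listing annotations. As an illustration of what such a file contains, here is a minimal standalone sketch that parses the text-bound ("T") lines and flags nested or overlapping entities. It does not use NLStruct (whose own loader is `load_from_brat`); the `Entity` type and both function names are hypothetical, and the sample annotation is made up.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    fragments: List[Tuple[int, ...]]  # (begin, end) character offsets
    text: str


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse text-bound ("T") lines of a brat .ann file; other lines are skipped."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations, events, attributes, notes, ...
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous entities list several "begin end" pairs separated by ";"
        fragments = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, fragments, text))
    return entities


def overlapping_pairs(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Return ids of entity pairs whose overall character ranges overlap or nest."""
    pairs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            a_begin, a_end = a.fragments[0][0], a.fragments[-1][1]
            b_begin, b_end = b.fragments[0][0], b.fragments[-1][1]
            if a_begin < b_end and b_begin < a_end:
                pairs.append((a.ann_id, b.ann_id))
    return pairs


# Made-up annotations for the text "University of Paris hospital"
sample_ann = (
    "T1\tORG 0 19\tUniversity of Paris\n"
    "T2\tLOC 14 19\tParis\n"
    "R1\tLocatedIn Arg1:T1 Arg2:T2\n"
)
entities = parse_brat_entities(sample_ann)
print(overlapping_pairs(entities))
```

Here `T2` ("Paris") is nested inside `T1` ("University of Paris"), which is the kind of annotation a flat tagging scheme cannot represent.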
How to use it
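from nlstruct import load_pretrained
from nlstruct.datasets import load_from_brat, export_to_brat

# Load the model saved during training
ner = load_pretrained("ner.pt")

# Read brat documents, predict entities, and write the predictions back in brat format
docs = load_from_brat("path/to/brat/test")
export_to_brat(ner.predict(docs), filename_prefix="path/to/exported_brat")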
Status
This project is still under development and subject to change.