
Natural language structuring library

Project description

NLStruct

Natural language structuring library. Currently it implements only a named entity recognition (NER) model, but other algorithms will follow.

Features

  • processes large documents seamlessly: it automatically handles tokenization and sentence splitting.
  • do not train twice: an automatic caching mechanism detects when an experiment has already been run
  • stop & resume with checkpoints
  • easy import and export of data
  • handles nested or overlapping entities
  • pretty logging with rich_logger
  • heavily customizable, without config files (see train_ner.py)
  • built on top of transformers and pytorch_lightning
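The "do not train twice" feature can be pictured with a minimal sketch, assuming the cache key is a hash of the experiment's parameters. nlstruct's actual caching mechanism is not described here and may work differently; `experiment_key`, `run_cached` and the `.cache` directory are hypothetical names used only for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def experiment_key(params: dict) -> str:
    """Hash experiment parameters into a stable identifier (hypothetical helper)."""
    # sort_keys makes the key independent of the order in which params were given
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def run_cached(params: dict, train_fn, cache_dir: Path):
    """Call train_fn only if no stored result exists for these exact params (hypothetical)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    result_file = cache_dir / f"{experiment_key(params)}.json"
    if result_file.exists():
        # Same parameters were already run: reuse the stored (JSON-serializable) result
        return json.loads(result_file.read_text())["result"]
    result = train_fn(params)
    result_file.write_text(json.dumps({"params": params, "result": result}))
    return result


if __name__ == "__main__":
    calls = []

    def fake_train(params):
        calls.append(params)
        return f"trained with seed {params['seed']}"

    cache = Path(tempfile.mkdtemp()) / ".cache"
    params = {"seed": 42, "bert_name": "camembert/camembert-base"}
    print(run_cached(params, fake_train, cache))
    # Same parameters in a different order map to the same key, so training is skipped
    print(run_cached({"bert_name": "camembert/camembert-base", "seed": 42}, fake_train, cache))
    print(len(calls))
```

Hashing a canonical (key-sorted) serialization is what lets two runs with identical settings be recognized as the same experiment; changing any value, such as the seed, produces a new key and triggers a new run.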

How to train a NER model

```python
from nlstruct.recipes import train_ner

model = train_ner(
    dataset={
        "train": "path to your train brat/standoff data",
        "val": 0.05,  # or path to your validation data
        # "test": "path to your test data",  # optional
    },
    finetune_bert=False,
    seed=42,
    bert_name="camembert/camembert-base",
    fasttext_file="",
    gpus=0,
    xp_name="my-xp",
)
model.save_pretrained("ner.pt")
```
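`train_ner` expects a path to brat/standoff data. As a sketch of what such a directory can contain, the snippet below writes one toy document: a `.txt` file with the raw text and an `.ann` file whose `T` lines give an entity id, a label, character offsets and the covered text. The helper `write_brat_doc`, the labels and the file names are made up for illustration; the two entities are nested, which the features list says nlstruct handles. A real training set needs many such documents, and which other brat constructs (relations, attributes, discontinuous spans) nlstruct reads is not stated here.

```python
import tempfile
from pathlib import Path


def write_brat_doc(directory: Path, doc_id: str, text: str, entities: list) -> None:
    """Write one brat document: <doc_id>.txt (raw text) and <doc_id>.ann (annotations).

    entities is a list of (label, begin, end) character offsets into text.
    """
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{doc_id}.txt").write_text(text, encoding="utf-8")
    lines = []
    for i, (label, begin, end) in enumerate(entities, start=1):
        # brat text-bound annotation: ID <TAB> label begin end <TAB> covered text
        lines.append(f"T{i}\t{label} {begin} {end}\t{text[begin:end]}")
    (directory / f"{doc_id}.ann").write_text("\n".join(lines) + "\n", encoding="utf-8")


text = "Le patient souffre d'un cancer du poumon."
disease_start = text.index("cancer du poumon")
organ_start = text.index("poumon")
entities = [
    ("Disease", disease_start, disease_start + len("cancer du poumon")),  # outer entity
    ("Anatomy", organ_start, organ_start + len("poumon")),  # nested inside the outer one
]
train_dir = Path(tempfile.mkdtemp()) / "train"
write_brat_doc(train_dir, "doc-1", text, entities)
print((train_dir / "doc-1.ann").read_text(encoding="utf-8"))
```

The resulting directory path (`str(train_dir)`) is the kind of value the `"train"` entry of `dataset` points to.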

How to use it

```python
from nlstruct import load_pretrained
from nlstruct.datasets import load_from_brat, export_to_brat

ner = load_pretrained("ner.pt")
# Predict entities on a brat corpus and write the predictions back in brat format
export_to_brat(
    ner.predict(load_from_brat("path/to/brat/test")),
    filename_prefix="path/to/exported_brat",
)
```
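To inspect the predictions written by `export_to_brat`, a small standard-library reader for the text-bound (`T`) lines of `.ann` files is enough. This is a sketch that is not part of nlstruct: `read_ann` and `count_labels` are hypothetical helpers, and it assumes the export produces standard brat `.ann` files in a directory you point it at. Discontinuous spans (`begin end;begin end`) are kept as lists of fragments, and other line types (relations, attributes, notes) are skipped.

```python
from collections import Counter
from pathlib import Path


def read_ann(path: Path) -> list:
    """Parse the text-bound ("T") annotations of a brat .ann file into dicts."""
    entities = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), attributes (A), notes (#), ...
        ann_id, type_and_spans, covered = line.split("\t", 2)
        label, spans = type_and_spans.split(" ", 1)
        # Discontinuous entities list several "begin end" fragments separated by ";"
        fragments = [tuple(int(x) for x in frag.split()) for frag in spans.split(";")]
        entities.append({"id": ann_id, "label": label, "fragments": fragments, "text": covered})
    return entities


def count_labels(directory: Path) -> Counter:
    """Count entities per label over every .ann file in a directory."""
    counts = Counter()
    for ann_file in sorted(directory.glob("*.ann")):
        counts.update(entity["label"] for entity in read_ann(ann_file))
    return counts
```

For example, `count_labels(Path("path/to/exported_brat"))` gives a quick per-label summary, assuming the exported `.ann` files end up in that directory.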

Status

This project is still under development and subject to change.



Download files


Source Distribution

  • nlstruct-0.0.3.tar.gz (71.5 kB)

Built Distributions

  • nlstruct-0.0.3-py3-none-any.whl (81.0 kB, Python 3)
  • nlstruct-0.0.3-1-py3-none-any.whl (81.0 kB, Python 3)

File details

Details for the file nlstruct-0.0.3.tar.gz.

File metadata

  • Download URL: nlstruct-0.0.3.tar.gz
  • Upload date:
  • Size: 71.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.4

File hashes

Hashes for nlstruct-0.0.3.tar.gz
  • SHA256: 1c8d474dcad8f1ec4287d475db643e79ff5948bec4f9c542bca1b3b7536752d3
  • MD5: 51a9d4cb1093739883afccaea188f255
  • BLAKE2b-256: a64539e17a6821f434ce2f28d460707d495a6be215013fef8cfcaf77111db585


File details

Details for the file nlstruct-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: nlstruct-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 81.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.4

File hashes

Hashes for nlstruct-0.0.3-py3-none-any.whl
  • SHA256: 2c968c5877eff49621cb7d047f64001d46a1e3a655fd8494b77c6f46b689c241
  • MD5: b8aa1be1e19f8eb3145a1acdcd113cbd
  • BLAKE2b-256: 488ec7310c159d88d39d25669604a4a025fe91c0b0021f08c54baf8b346c2c73


File details

Details for the file nlstruct-0.0.3-1-py3-none-any.whl.

File metadata

  • Download URL: nlstruct-0.0.3-1-py3-none-any.whl
  • Upload date:
  • Size: 81.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.4

File hashes

Hashes for nlstruct-0.0.3-1-py3-none-any.whl
  • SHA256: 847f48f38bb318409d6be529cf544b1d8a1a275d9b842800b70628e20c643ff8
  • MD5: 36522f8efb088cf297a519a87eb4a803
  • BLAKE2b-256: c90fd0466d5a3d66b44b03638557921d274bd3ec9f9d37c316899fb67887ce80

