Annotator combining different NLP pipelines
Automated annotation of natural languages using selected toolchains
This project just had its first version release and is still under development.
Description
The nlpannotator package serves as a modular toolchain that combines different natural language processing (NLP) tools to annotate texts (sentencizing, tokenization, part-of-speech (POS) tagging, and lemmatization).
Options
All input options are provided in an input dictionary. Two pre-set toolchains can be used, along with a manual mode:
- fast: uses spaCy for all annotations;
- accurate: uses SoMaJo for sentencizing and tokenization, and stanza for POS and lemma;
- manual: any combination of spaCy, stanza, SoMaJo, Flair, and TreeTagger can be used, provided the tool supports the selected annotation and language.
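To make the three modes concrete, here is a minimal sketch of how such an input dictionary might be resolved into one tool per annotation step. It does not import nlpannotator, and the key names (`processing_option`, `tools`), the step names, and the helper `resolve_toolchain` are all hypothetical choices made for illustration; consult the DemoNotebook for the package's actual option names.

```python
# Illustrative only: key names and this helper are assumptions,
# not nlpannotator's actual API.
STEPS = ("sentencize", "tokenize", "pos", "lemma")
KNOWN_TOOLS = {"spacy", "stanza", "somajo", "flair", "treetagger"}

# The two pre-set toolchains as described in the text above.
PRESETS = {
    "fast": {step: "spacy" for step in STEPS},
    "accurate": {
        "sentencize": "somajo",
        "tokenize": "somajo",
        "pos": "stanza",
        "lemma": "stanza",
    },
}


def resolve_toolchain(options):
    """Return a {step: tool} mapping for a hypothetical input dictionary."""
    mode = options.get("processing_option", "manual")
    if mode in PRESETS:
        # Presets ignore any per-step tool selection.
        return dict(PRESETS[mode])
    if mode != "manual":
        raise ValueError(f"unknown processing option: {mode!r}")

    tools = options.get("tools", {})
    unknown_steps = set(tools) - set(STEPS)
    unknown_tools = set(tools.values()) - KNOWN_TOOLS
    if unknown_steps or unknown_tools:
        raise ValueError(
            f"unsupported steps {sorted(unknown_steps)} "
            f"or tools {sorted(unknown_tools)}"
        )
    # Keep the steps in pipeline order, whatever order the user gave.
    return {step: tools[step] for step in STEPS if step in tools}


fast = resolve_toolchain({"processing_option": "fast"})
custom = resolve_toolchain(
    {"processing_option": "manual", "tools": {"pos": "flair", "tokenize": "somajo"}}
)
```

The sketch only checks that tool and step names are known. Whether a given tool actually supports a given annotation and language is a separate check that the real package has to perform.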
Installation
Install the project and its dependencies from PyPI:
pip install nlpannotator
The language models need to be installed separately. You can make use of the convenience script here which installs all language models for all languages that have been implemented for spaCy and stanza.
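If you prefer to install individual models yourself instead of using the convenience script, the sketch below shows one possible way to do so from Python. It relies on spaCy's `python -m spacy download` command and on `stanza.download`. The model name `en_core_web_sm`, the language code `en`, and the helper names are examples chosen here, not necessarily the models nlpannotator expects.

```python
import subprocess
import sys


def model_install_commands(spacy_model="en_core_web_sm", stanza_language="en"):
    """Build the commands that download one spaCy and one stanza model.

    The default model and language are illustrative assumptions.
    """
    python = sys.executable  # use the interpreter of the active environment
    return [
        [python, "-m", "spacy", "download", spacy_model],
        [python, "-c", f"import stanza; stanza.download({stanza_language!r})"],
    ]


def install_models(**kwargs):
    """Run the download commands; requires spaCy, stanza and network access."""
    for command in model_install_commands(**kwargs):
        subprocess.run(command, check=True)
```

Calling `install_models()` performs the downloads; `model_install_commands()` on its own only returns the commands, so they can be inspected first.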
Usage
Take a look at the DemoNotebook or run it on Binder.
File details
Details for the file nlpannotator-1.0.1.tar.gz.
File metadata
- Download URL: nlpannotator-1.0.1.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2ef4c01558e125a8c251eb60b75308cfe4575d0c7ba8b8255d39de37764f046
MD5 | b8164533b11cf7c6a7c87d2e102d66a4
BLAKE2b-256 | 23eb9f6b28a1267c863f3e7defa74d3814d687869234fb015862188f484cb96c
File details
Details for the file nlpannotator-1.0.1-py3-none-any.whl.
File metadata
- Download URL: nlpannotator-1.0.1-py3-none-any.whl
- Upload date:
- Size: 26.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8fdb308180953ce764749537942763d312f1b9383bd32dd1751df363f563f4dd
MD5 | fdc745496575155efa52a6e1627bee4c
BLAKE2b-256 | be77876be1e4ecf3363c216503b97b8c7b6c575b6069db37487f938e94fde37c
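The published SHA256 digests can be used to check a manually downloaded file. The following is a minimal sketch using only the standard library; the helper names are illustrative, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed for the 1.0.1 release files.
EXPECTED_SHA256 = {
    "nlpannotator-1.0.1.tar.gz":
        "f2ef4c01558e125a8c251eb60b75308cfe4575d0c7ba8b8255d39de37764f046",
    "nlpannotator-1.0.1-py3-none-any.whl":
        "8fdb308180953ce764749537942763d312f1b9383bd32dd1751df363f563f4dd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's digest equals the one listed for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest known for {path.name!r}")
    return sha256_of(path) == expected
```

For example, `matches_published_hash("nlpannotator-1.0.1.tar.gz")` returns `True` only if the downloaded archive is byte-for-byte identical to the uploaded one.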