NLP pipeline software using Common Workflow Language

nlppln is a Python package for creating NLP pipelines using Common Workflow Language (CWL). It provides steps for (generic) NLP functionality, such as tokenization, lemmatization, and part-of-speech tagging, and helps users construct workflows from these steps.

A text processing step consists of a (Python) command line tool and a CWL specification for using this tool. Most tools provided by nlppln wrap existing NLP functionality. The command line tools are made with Click, a Python package for creating command line interfaces.
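
As an illustration of the command line half of such a step, here is a minimal sketch of a Click tool. It is not one of the tools shipped with nlppln: the lowercasing function, the `in_files` argument, and the `--out_dir` option are hypothetical names chosen for this example, and the accompanying CWL specification is not shown.

```python
import os

import click


def lowercase_text(text):
    """Return the text in lower case (a stand-in for real NLP functionality)."""
    return text.lower()


# Hypothetical tool: the argument and option names are assumptions made for
# this sketch, not names taken from nlppln itself.
@click.command()
@click.argument('in_files', nargs=-1, type=click.Path(exists=True))
@click.option('--out_dir', '-o', default=os.getcwd(), type=click.Path(),
              help='Directory to write the processed files to.')
def command(in_files, out_dir):
    """Lowercase each input text file and write the result to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for in_file in in_files:
        with open(in_file, encoding='utf-8') as f:
            text = f.read()
        # The output file keeps the base name of the input file.
        out_file = os.path.join(out_dir, os.path.basename(in_file))
        with open(out_file, 'w', encoding='utf-8') as f:
            f.write(lowercase_text(text))


if __name__ == '__main__':
    command()
```

Keeping the text processing in a plain function (`lowercase_text`) and the file handling in the Click command makes the processing logic testable without going through the command line.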

To create a workflow, you have to write a Python script:

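from nlppln import WorkflowGenerator

with WorkflowGenerator() as wf:
    # Workflow input: a directory containing the text files to process
    txt_dir = wf.add_input(txt_dir='Directory')

    # Each step consumes the output of an earlier step
    frogout = wf.frog_dir(in_dir=txt_dir)
    saf = wf.frog_to_saf(in_files=frogout)
    ner_stats = wf.save_ner_data(in_files=saf)
    new_saf = wf.replace_ner(metadata=ner_stats, in_files=saf)
    txt = wf.saf_to_txt(in_files=new_saf)

    # Expose the named entity data and the processed text as workflow outputs
    wf.add_outputs(ner_stats=ner_stats, txt=txt)

    # Write the workflow to a CWL file
    wf.save('anonymize.cwl')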

The resulting workflow can be run using a CWL runner, such as cwltool:

cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/
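
If the workflow needs to be started from a Python script instead of a shell, the same command can be assembled and run with the standard library. This is a minimal sketch, not part of nlppln: `build_cwltool_command` and `run_workflow` are hypothetical helper names, and the sketch assumes cwltool is installed and available on the PATH.

```python
import shutil
import subprocess


def build_cwltool_command(workflow, **inputs):
    """Build the argument list for running a CWL workflow with cwltool.

    Each keyword argument becomes a `--name value` pair on the command line.
    """
    cmd = ['cwltool', workflow]
    for name, value in inputs.items():
        cmd.extend(['--{}'.format(name), str(value)])
    return cmd


def run_workflow(workflow, **inputs):
    """Run the workflow; raises CalledProcessError if cwltool exits non-zero."""
    if shutil.which('cwltool') is None:
        raise RuntimeError('cwltool was not found on the PATH')
    return subprocess.run(build_cwltool_command(workflow, **inputs), check=True)


if __name__ == '__main__':
    cmd = build_cwltool_command(
        'anonymize.cwl', txt_dir='/path/to/directory/with/txt/files/')
    # Prints: cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/
    print(' '.join(cmd))
```

Passing the command as a list (rather than a single shell string) avoids quoting problems when directory names contain spaces.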

To create new (e.g., project-specific) NLP functionality, you can use nlppln-gen to generate boilerplate (i.e., empty) command line tools and CWL specifications.

The full documentation can be found on Read the Docs.

Installation

Install nlppln using pip:

pip install nlppln

Please check the installation guidelines for additional required software.

License

Copyright (c) 2016-2018, Netherlands eScience Center, University of Twente

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
