Skip to main content

A small Python library for the NLP Interchange Format (NIF)

Project description

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. It offers a standard representation of annotated texts for tasks such as Named Entity Recognition or Entity Linking. It is used by GERBIL to run reproducible evaluations of annotators.

This Python library can be used to serialize and deserialized annotated corpora in NIF.

Documentation

NIF Documentation

Supported NIF versions

NIF 2.1, serialized in any of the formats supported by rdflib

Overview

This library is revolves around three core classes: * a NIFContext is a document (a string); * a NIFPhrase is the annotation of a snippet of text (usually a phrase) in a document; * a NIFCollection is a set of documents, which constitutes a collection. In NIF, each of these objects is identified by a URI, and their attributes and relations are encoded by RDF triples between these URIs. This library abstracts away the encoding by letting you manipulate collections, contexts and phrases as plain Python objects.

Quickstart

  1. Import and create a collection

from pynif import NIFCollection

collection = NIFCollection(uri="http://freme-project.eu")
  1. Create a context

context = collection.add_context(
    uri="http://freme-project.eu/doc32",
    mention="Diego Maradona is from Argentina.")
  1. Create entries for the entities

context.add_phrase(
    beginIndex=0,
    endIndex=14,
    taClassRef=['http://dbpedia.org/ontology/SportsManager', 'http://dbpedia.org/ontology/Person', 'http://nerd.eurecom.fr/ontology#Person'],
    score=0.9869992701528016,
    annotator='http://freme-project.eu/tools/freme-ner',
    taIdentRef='http://dbpedia.org/resource/Diego_Maradona',
    taMsClassRef='http://dbpedia.org/ontology/SoccerManager')

context.add_phrase(
    beginIndex=23,
    endIndex=32,
    taClassRef=['http://dbpedia.org/ontology/PopulatedPlace', 'http://nerd.eurecom.fr/ontology#Location',
    'http://dbpedia.org/ontology/Place'],
    score=0.9804963628413852,
    annotator='http://freme-project.eu/tools/freme-ner',
    taMsClassRef='http://dbpedia.org/resource/Argentina')
  1. Finally, get the output with the format that you need

generated_nif = collection.dumps(format='turtle')
print(generated_nif)

You will obtain the NIF representation as a string:

<http://freme-project.eu> a nif:ContextCollection ;
    nif:hasContext <http://freme-project.eu/doc32> ;
    ns1:conformsTo <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .

<http://freme-project.eu/doc32> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "33"^^xsd:nonNegativeInteger ;
    nif:isString "Diego Maradona is from Argentina." .

<http://freme-project.eu/doc32#offset_0_14> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Diego Maradona" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "14"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://freme-project.eu/doc32> ;
    nif:taMsClassRef <http://dbpedia.org/ontology/SoccerManager> ;
    itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Person>,
        <http://dbpedia.org/ontology/SportsManager>,
        <http://nerd.eurecom.fr/ontology#Person> ;
    itsrdf:taConfidence 9.869993e-01 ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Diego_Maradona> .

<http://freme-project.eu/doc32#offset_23_32> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Argentina" ;
    nif:beginIndex "23"^^xsd:nonNegativeInteger ;
    nif:endIndex "32"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://freme-project.eu/doc32> ;
    nif:taMsClassRef <http://dbpedia.org/resource/Argentina> ;
    itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Place>,
        <http://dbpedia.org/ontology/PopulatedPlace>,
        <http://nerd.eurecom.fr/ontology#Location> ;
    itsrdf:taConfidence 9.804964e-01 .
  1. You can then parse it back:

parsed_collection = NIFCollection.loads(generated_nif, format='turtle')

for context in parsed_collection.contexts:
   for phrase in context.phrases:
       print(phrase)

Issues

If you have any problems with or questions about this library, please contact us through a GitHub issue.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pynif-0.3.0.linux-x86_64.tar.gz (20.0 kB view details)

Uploaded Source

Built Distribution

pynif-0.3.0-py2.py3-none-any.whl (19.2 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file pynif-0.3.0.linux-x86_64.tar.gz.

File metadata

  • Download URL: pynif-0.3.0.linux-x86_64.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for pynif-0.3.0.linux-x86_64.tar.gz
Algorithm Hash digest
SHA256 fa4c7753a03d529f54eb222029aae0819d5fd721c6053de860793afbab9aa139
MD5 b03da9ed37a87433002ca6b2c14f4830
BLAKE2b-256 2df8d65cca2c9e01153d9472ceafc63e74f5e40ee1f8426e14633e9a68049733

See more details on using hashes here.

File details

Details for the file pynif-0.3.0-py2.py3-none-any.whl.

File metadata

  • Download URL: pynif-0.3.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for pynif-0.3.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 bbc8b801d2ee3aa93a50740daea5dc8bc8f1735882f7e6cd23103cea5808c576
MD5 1e58c6ebda31b61cad2f36e4eb47d457
BLAKE2b-256 5e1704aeec2d58e400d5373ddf2c583954cc6e98af888ec605df18c7d1fefcea

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page