A small Python library for the NLP Interchange Format (NIF)
Project description
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. It offers a standard representation of annotated texts for tasks such as Named Entity Recognition or Entity Linking. It is used by GERBIL to run reproducible evaluations of annotators.
This Python library can be used to serialize and deserialize annotated corpora in NIF.
Documentation
Supported NIF versions
NIF 2.1, serialized in any of the formats supported by rdflib
Overview
This library revolves around three core classes:
* a NIFContext is a document (a string);
* a NIFPhrase is the annotation of a snippet of text (usually a phrase) in a document;
* a NIFCollection is a set of documents, which constitutes a collection.

In NIF, each of these objects is identified by a URI, and their attributes and relations are encoded by RDF triples between these URIs. This library abstracts away the encoding by letting you manipulate collections, contexts and phrases as plain Python objects.
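The nesting of the three concepts can be pictured with a minimal plain-Python sketch. The classes `Collection`, `Context` and `Phrase`, their snake_case fields and the `anchor_of` helper are hypothetical illustrations, not pynif's actual classes or API; they leave RDF out entirely and only model the containment (collection holds contexts, context holds phrases) and the way a phrase's text is recovered by slicing its context string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified stand-ins for NIFCollection, NIFContext and
# NIFPhrase; they only illustrate how the three concepts nest.
@dataclass
class Phrase:
    begin_index: int
    end_index: int
    ta_ident_ref: Optional[str] = None  # entity URI, e.g. a DBpedia resource


@dataclass
class Context:
    uri: str
    mention: str  # the full document text
    phrases: List[Phrase] = field(default_factory=list)

    def add_phrase(self, begin_index: int, end_index: int,
                   ta_ident_ref: Optional[str] = None) -> Phrase:
        # A phrase only makes sense if its offsets fall inside the document.
        if not 0 <= begin_index <= end_index <= len(self.mention):
            raise ValueError("offsets outside the context string")
        phrase = Phrase(begin_index, end_index, ta_ident_ref)
        self.phrases.append(phrase)
        return phrase

    def anchor_of(self, phrase: Phrase) -> str:
        # The annotated snippet is the slice of the context string.
        return self.mention[phrase.begin_index:phrase.end_index]


@dataclass
class Collection:
    uri: str
    contexts: List[Context] = field(default_factory=list)

    def add_context(self, uri: str, mention: str) -> Context:
        context = Context(uri, mention)
        self.contexts.append(context)
        return context


collection = Collection("http://freme-project.eu")
doc = collection.add_context("http://freme-project.eu/doc32",
                             "Diego Maradona is from Argentina.")
maradona = doc.add_phrase(0, 14, "http://dbpedia.org/resource/Diego_Maradona")
print(doc.anchor_of(maradona))  # Diego Maradona
```

In pynif itself each of these objects additionally carries a URI and is turned into RDF triples on serialization; the sketch keeps only the object structure.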
Quickstart
Import and create a collection
from pynif import NIFCollection
collection = NIFCollection(uri="http://freme-project.eu")
Create a context
context = collection.add_context(
    uri="http://freme-project.eu/doc32",
    mention="Diego Maradona is from Argentina.")
Create entries for the entities
context.add_phrase(
    beginIndex=0,
    endIndex=14,
    taClassRef=['http://dbpedia.org/ontology/SportsManager', 'http://dbpedia.org/ontology/Person', 'http://nerd.eurecom.fr/ontology#Person'],
    score=0.9869992701528016,
    annotator='http://freme-project.eu/tools/freme-ner',
    taIdentRef='http://dbpedia.org/resource/Diego_Maradona',
    taMsClassRef='http://dbpedia.org/ontology/SoccerManager')

context.add_phrase(
    beginIndex=23,
    endIndex=32,
    taClassRef=['http://dbpedia.org/ontology/PopulatedPlace', 'http://nerd.eurecom.fr/ontology#Location',
                'http://dbpedia.org/ontology/Place'],
    score=0.9804963628413852,
    annotator='http://freme-project.eu/tools/freme-ner',
    taMsClassRef='http://dbpedia.org/resource/Argentina')
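The `beginIndex` and `endIndex` values passed to `add_phrase` are character offsets into the context string, with `endIndex` exclusive: "Diego Maradona" spans 0 to 14, "Argentina" spans 23 to 32, and the whole 33-character context spans 0 to 33. Rather than counting by hand, the offsets can be computed from the snippet. The helper `offsets_of` below is hypothetical (not part of pynif) and assumes offsets are counted in Python string characters, which is unambiguous for this ASCII example.

```python
from typing import Tuple


def offsets_of(text: str, snippet: str, start: int = 0) -> Tuple[int, int]:
    """Return (beginIndex, endIndex) of the first occurrence of snippet.

    endIndex is exclusive, so text[begin:end] == snippet.
    """
    begin = text.find(snippet, start)
    if begin == -1:
        raise ValueError(f"{snippet!r} not found in text")
    return begin, begin + len(snippet)


mention = "Diego Maradona is from Argentina."
print(offsets_of(mention, "Diego Maradona"))  # (0, 14)
print(offsets_of(mention, "Argentina"))       # (23, 32)
print(len(mention))                           # 33
```

The returned pair can be unpacked straight into the keyword arguments, e.g. `begin, end = offsets_of(mention, "Argentina")` followed by `context.add_phrase(beginIndex=begin, endIndex=end, ...)`. Passing `start` lets you find later occurrences of a snippet that appears more than once.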
Finally, serialize the collection in the format you need:
generated_nif = collection.dumps(format='turtle')
print(generated_nif)
You will obtain the NIF representation as a string:
<http://freme-project.eu> a nif:ContextCollection ;
    nif:hasContext <http://freme-project.eu/doc32> ;
    ns1:conformsTo <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .

<http://freme-project.eu/doc32> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "33"^^xsd:nonNegativeInteger ;
    nif:isString "Diego Maradona is from Argentina." .

<http://freme-project.eu/doc32#offset_0_14> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Diego Maradona" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "14"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://freme-project.eu/doc32> ;
    nif:taMsClassRef <http://dbpedia.org/ontology/SoccerManager> ;
    itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Person>,
        <http://dbpedia.org/ontology/SportsManager>,
        <http://nerd.eurecom.fr/ontology#Person> ;
    itsrdf:taConfidence 9.869993e-01 ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Diego_Maradona> .

<http://freme-project.eu/doc32#offset_23_32> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Argentina" ;
    nif:beginIndex "23"^^xsd:nonNegativeInteger ;
    nif:endIndex "32"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://freme-project.eu/doc32> ;
    nif:taMsClassRef <http://dbpedia.org/resource/Argentina> ;
    itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Place>,
        <http://dbpedia.org/ontology/PopulatedPlace>,
        <http://nerd.eurecom.fr/ontology#Location> ;
    itsrdf:taConfidence 9.804964e-01 .
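In this output each phrase URI is the context URI followed by a fragment of the form `#offset_<begin>_<end>`, and `nif:anchorOf` is the slice of `nif:isString` between those offsets. The sketch below builds and parses URIs of that shape. The pattern is taken from this one example and is an assumption, not a documented pynif guarantee; other NIF producers may mint phrase URIs differently, so `parse_phrase_uri` (a hypothetical helper) raises on anything that does not match.

```python
import re
from typing import Tuple

# Pattern observed in the example output: <context URI>#offset_<begin>_<end>.
# Assumed, not guaranteed; other NIF producers may use other URI schemes.
_OFFSET_RE = re.compile(r"^(?P<context>[^#]+)#offset_(?P<begin>\d+)_(?P<end>\d+)$")


def phrase_uri(context_uri: str, begin: int, end: int) -> str:
    """Build an offset-based phrase URI in the style shown above."""
    return f"{context_uri}#offset_{begin}_{end}"


def parse_phrase_uri(uri: str) -> Tuple[str, int, int]:
    """Split an offset-based phrase URI into (context URI, begin, end)."""
    match = _OFFSET_RE.match(uri)
    if match is None:
        raise ValueError(f"not an offset-based URI: {uri}")
    return match["context"], int(match["begin"]), int(match["end"])


mention = "Diego Maradona is from Argentina."
uri = phrase_uri("http://freme-project.eu/doc32", 23, 32)
context_uri, begin, end = parse_phrase_uri(uri)
print(uri)                 # http://freme-project.eu/doc32#offset_23_32
print(mention[begin:end])  # Argentina
```

Recovering the offsets from the URI this way is only a convenience; the authoritative values are the `nif:beginIndex` and `nif:endIndex` triples.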
You can then parse it back:
parsed_collection = NIFCollection.loads(generated_nif, format='turtle')
for context in parsed_collection.contexts:
    for phrase in context.phrases:
        print(phrase)
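Because an RDF graph is an unordered set of triples, phrases recovered by parsing may not come back in text order. A small post-processing step can flatten the parsed objects into sorted rows. The function `entity_links` is hypothetical, and it assumes (without checking pynif's source) that each context exposes `uri` and `phrases`, and that each phrase exposes `beginIndex`, `endIndex` and `taIdentRef` attributes mirroring the `add_phrase()` keywords. `SimpleNamespace` stand-ins are used so the sketch runs without pynif installed.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional, Tuple


def entity_links(contexts: Iterable) -> List[Tuple[str, int, int, Optional[str]]]:
    """Flatten contexts into (context_uri, begin, end, entity_uri) rows.

    Assumes duck-typed attributes: context.uri, context.phrases and
    phrase.beginIndex / phrase.endIndex / phrase.taIdentRef.
    """
    rows = []
    for context in contexts:
        for phrase in context.phrases:
            rows.append((context.uri, phrase.beginIndex, phrase.endIndex,
                         getattr(phrase, "taIdentRef", None)))
    # Sort by document, then by position in the text.
    return sorted(rows, key=lambda row: (row[0], row[1], row[2]))


# Stand-in objects for illustration; with pynif you would pass
# parsed_collection.contexts instead.
doc = SimpleNamespace(uri="http://freme-project.eu/doc32", phrases=[
    SimpleNamespace(beginIndex=23, endIndex=32, taIdentRef=None),
    SimpleNamespace(beginIndex=0, endIndex=14,
                    taIdentRef="http://dbpedia.org/resource/Diego_Maradona"),
])
for row in entity_links([doc]):
    print(row)
```

The "Argentina" stand-in has no `taIdentRef` because the example above attaches its resource URI through `taMsClassRef` instead.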
Issues
If you have any problems with or questions about this library, please contact us through a GitHub issue.
Download files
File details
Details for the file pynif-0.1.4.tar.gz
File metadata
- Download URL: pynif-0.1.4.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | edb13c8b8aef01325750b917a49376d35a90e66becb49a84b3148ccea9c5546d
MD5 | e2d745efe3e0298dccde273e26b0bac3
BLAKE2b-256 | 4cee0b58b1ef2defcf3202742e343f2dfd7f63cc2740b30fbcfb67d9dcb6f8bf
File details
Details for the file pynif-0.1.4-py2.py3-none-any.whl
File metadata
- Download URL: pynif-0.1.4-py2.py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d4948c8016f9d6b221a435618163c0f74d92e18048532a5882aeb3e6e457df8
MD5 | 8093e2c07e3e8cb918d6646594f7c6d6
BLAKE2b-256 | 23aa0a0d262b1829cf4f544a2e6e03d27f5ad9352150f209509d3a6b766475de