A multilingual approach to AllenNLP coreference resolution, along with a wrapper for spaCy.

Project description

crosslingual-coreference

Coreference resolution is amazing, but the data required to train a model is very scarce. In our case, the available training data for non-English languages also proved to be poorly annotated. Crosslingual Coreference therefore assumes that a model trained on English data with cross-lingual embeddings should also work for languages with a similar sentence structure.

Install

pip install crosslingual-coreference

Quickstart

from crosslingual_coreference import Predictor

text = "Do not forget about Momofuku Ando! He created instant noodles in Osaka. At that location, Nissin was founded. Many students survived by eating these noodles, but they don't even know him."

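# language takes a spaCy pipeline name; device=-1 runs on CPU;
# model is one of the names listed under "Available models"
predictor = Predictor(language="en_core_web_sm", device=-1, model="info_xlm")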

print(predictor.predict(text)["resolved_text"])
# Output
# 
# Do not forget about Momofuku Ando! 
# Momofuku Ando created instant noodles in Osaka. 
# At Osaka, Nissin was founded. 
# Many students survived by eating instant noodles, 
# but Many students don't even know Momofuku Ando.
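The resolved text above reads as if every later reference in a coreference cluster were replaced by that cluster's first mention ("He" becomes "Momofuku Ando", "that location" becomes "Osaka"). The sketch below illustrates that idea in plain Python; it is not the library's own implementation. It assumes clusters are lists of inclusive [start, end] token spans, as printed for doc._.coref_clusters in the spaCy example in this README, and that mentions do not overlap. The helper `resolve_tokens` and the toy sentence are hypothetical.

```python
from typing import Dict, List, Tuple

# A cluster is a list of mentions; a mention is an inclusive [start, end] token span.
Cluster = List[List[int]]


def resolve_tokens(tokens: List[str], clusters: List[Cluster]) -> str:
    """Hypothetical helper: replace each later mention with its cluster's first mention.

    Assumes mentions do not overlap; this is not the library's implementation.
    """
    # Map the start index of each mention to replace -> (its end index, replacement text).
    replacements: Dict[int, Tuple[int, str]] = {}
    for cluster in clusters:
        first_start, first_end = cluster[0]
        first_text = " ".join(tokens[first_start : first_end + 1])
        for start, end in cluster[1:]:
            replacements[start] = (end, first_text)

    output: List[str] = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, text = replacements[i]
            output.append(text)
            i = end + 1  # skip the remaining tokens of the replaced mention
        else:
            output.append(tokens[i])
            i += 1
    return " ".join(output)


tokens = "Ando invented noodles and he sold them".split()
clusters = [[[0, 0], [4, 4]], [[2, 2], [6, 6]]]
print(resolve_tokens(tokens, clusters))
# Ando invented noodles and Ando sold noodles
```

Real model output can contain overlapping mentions (in the spaCy example, [27, 27] and [27, 28] share a token), so a fuller version would need a rule for which replacement wins.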

Use spaCy pipeline

import crosslingual_coreference  # needed so spaCy can find the "xx_coref" component
import spacy

text = "Do not forget about Momofuku Ando! He created instant noodles in Osaka. At that location, Nissin was founded. Many students survived by eating these noodles, but they don't even know him."

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('xx_coref')

doc = nlp(text)
print(doc._.coref_clusters)
# Output
# 
# [[[4, 5], [7, 7], [27, 27], [36, 36]], 
# [[12, 12], [15, 16]], 
# [[9, 10], [27, 28]], 
# [[22, 23], [31, 31]]]
print(doc._.resolved_text)
# Output
# 
# Do not forget about Momofuku Ando! 
# Momofuku Ando created instant noodles in Osaka. 
# At Osaka, Nissin was founded. 
# Many students survived by eating instant noodles, 
# but Many students don't even know Momofuku Ando.
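Judging by the output above, each cluster is a list of mentions and each mention is an inclusive [start, end] pair of token indices: with en_core_web_sm tokenization, [4, 5] covers "Momofuku Ando" and [15, 16] covers "that location". Assuming that format, the minimal sketch below turns clusters into readable strings. The helper `cluster_mentions` and the toy tokens are hypothetical; with a real spaCy `Doc`, the equivalent slice is `doc[start : end + 1]`, because spaCy slices exclude the end index.

```python
from typing import List

# Assumed format: a cluster is a list of inclusive [start, end] token spans.
Cluster = List[List[int]]


def cluster_mentions(tokens: List[str], clusters: List[Cluster]) -> List[List[str]]:
    """Hypothetical helper: turn inclusive [start, end] spans into mention strings."""
    return [
        [" ".join(tokens[start : end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


# Toy tokens standing in for a spaCy Doc.
tokens = "Ando invented noodles and he sold them".split()
clusters = [[[0, 0], [4, 4]], [[2, 2], [6, 6]]]
print(cluster_mentions(tokens, clusters))
# [['Ando', 'he'], ['noodles', 'them']]
```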

Available models

As of now, two models are available, "info_xlm" and "xlm_roberta", which scored 77 and 74, respectively, on OntoNotes Release 5.0 English data.
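Because the model is chosen by name through the `model` argument of `Predictor`, it can help to validate the name up front with a clear error message. The sketch below does that with a hypothetical helper, `check_model_name`; the dictionary only records the two model names and the scores reported above and is not part of the library.

```python
from typing import Dict

# Model names and OntoNotes Release 5.0 English scores as reported in this README.
AVAILABLE_MODELS: Dict[str, int] = {"info_xlm": 77, "xlm_roberta": 74}


def check_model_name(model: str) -> str:
    """Hypothetical helper: fail early on a model name this release does not list."""
    if model not in AVAILABLE_MODELS:
        options = ", ".join(sorted(AVAILABLE_MODELS))
        raise ValueError(f"Unknown model {model!r}; choose one of: {options}")
    return model


# Pick the highest-scoring listed model.
best = max(AVAILABLE_MODELS, key=AVAILABLE_MODELS.__getitem__)
print(best)  # info_xlm
```

The checked name can then be passed on, for example Predictor(language="en_core_web_sm", device=-1, model=check_model_name("info_xlm")).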


Download files


Source Distribution

crosslingual-coreference-0.1.0.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

crosslingual_coreference-0.1.0-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file crosslingual-coreference-0.1.0.tar.gz.

File metadata

  • Download URL: crosslingual-coreference-0.1.0.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.2 Windows/10

File hashes

Hashes for crosslingual-coreference-0.1.0.tar.gz
Algorithm Hash digest
SHA256 314ff0a86f47c39b38978a370b99d707251696dbc4d627fa5c9073167b5f6d97
MD5 a51e5d61dce6bb1cc03a17b79ceb59b8
BLAKE2b-256 27e51e6f6703528af9d7c45d11ccb676c3e7fe604591a0a813ed751a2ce7647e


File details

Details for the file crosslingual_coreference-0.1.0-py3-none-any.whl.

File hashes

Hashes for crosslingual_coreference-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 48dbba2e45124e56ab37a1031e4acf609a1e5fd625959f2a62e7c6df40851b7c
MD5 ca318c12e2fa896e3178eb1797d5f880
BLAKE2b-256 fa66323dda10783fe8f361dffa8481adf5d413db9b4984fed893b70f3519a788
