
A multi-lingual approach to AllenNLP CoReference Resolution, along with a wrapper for spaCy.


Crosslingual Coreference

Coreference resolution is amazing, but the data required for training a model is very scarce. In our case, the available training data for non-English languages also proved to be poorly annotated. Crosslingual Coreference therefore assumes that a model trained on English data with cross-lingual embeddings should also work for languages with similar sentence structures.


Install

pip install crosslingual-coreference

Quickstart

from crosslingual_coreference import Predictor

text = "Do not forget about Momofuku Ando! He created instant noodles in Osaka. At that location, Nissin was founded. Many students survived by eating these noodles, but they don't even know him."

predictor = Predictor(language="en_core_web_sm", device=-1, model_name="info_xlm")

print(predictor.predict(text)["resolved_text"])
# Output
# 
# Do not forget about Momofuku Ando! 
# Momofuku Ando created instant noodles in Osaka. 
# At Osaka, Nissin was founded. 
# Many students survived by eating instant noodles, 
# but Many students don't even know Momofuku Ando.

Use spaCy pipeline

import crosslingual_coreference
import spacy

text = "Do not forget about Momofuku Ando! He created instant noodles in Osaka. At that location, Nissin was founded. Many students survived by eating these noodles, but they don't even know him."

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('xx_coref')

doc = nlp(text)
print(doc._.coref_clusters)
# Output
# 
# [[[4, 5], [7, 7], [27, 27], [36, 36]], 
# [[12, 12], [15, 16]], 
# [[9, 10], [27, 28]], 
# [[22, 23], [31, 31]]]
print(doc._.resolved_text)
# Output
# 
# Do not forget about Momofuku Ando! 
# Momofuku Ando created instant noodles in Osaka. 
# At Osaka, Nissin was founded. 
# Many students survived by eating instant noodles, 
# but Many students don't even know Momofuku Ando.
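
Reading the cluster output

The nested lists in `doc._.coref_clusters` can be hard to read at a glance. The sketch below is not part of the package; it assumes, based on the example output above, that each `[start, end]` pair holds token offsets with an inclusive end. The helper names `cluster_mentions` and `resolve_naively` and the short token list are hypothetical. The second helper replaces every later mention with the first mention of its cluster, a deliberate simplification that only illustrates the idea behind `resolved_text` and should not be read as the library's actual resolution logic. It also assumes spans do not overlap.

```python
from typing import List, Sequence


def cluster_mentions(
    tokens: Sequence[str], clusters: List[List[List[int]]]
) -> List[List[str]]:
    """Turn clusters of [start, end] token spans into readable mention strings.

    Assumption: both numbers are token offsets and `end` is inclusive.
    """
    return [
        [" ".join(tokens[start : end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


def resolve_naively(
    tokens: Sequence[str], clusters: List[List[List[int]]]
) -> List[str]:
    """Replace each later mention with the first mention of its cluster.

    Simplified illustration only; assumes spans do not overlap.
    """
    replacements = []  # (start, end, antecedent tokens)
    for cluster in clusters:
        first_start, first_end = cluster[0]
        antecedent = list(tokens[first_start : first_end + 1])
        for start, end in cluster[1:]:
            replacements.append((start, end, antecedent))

    resolved = list(tokens)
    # Splice from right to left so earlier offsets stay valid.
    for start, end, antecedent in sorted(
        replacements, key=lambda item: item[0], reverse=True
    ):
        resolved[start : end + 1] = antecedent
    return resolved


# Hypothetical tokenization of the first two example sentences.
tokens = [
    "Do", "not", "forget", "about", "Momofuku", "Ando", "!",
    "He", "created", "instant", "noodles", "in", "Osaka", ".",
]
clusters = [[[4, 5], [7, 7]]]

print(cluster_mentions(tokens, clusters))
print(" ".join(resolve_naively(tokens, clusters)))
```

With a real pipeline, the token list could be built as `[token.text for token in doc]` and the clusters taken from `doc._.coref_clusters`, provided the inclusive-offset assumption holds for the installed version.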

Available models

Two models are currently available, "info_xlm" and "xlm_roberta", which scored 77 and 74, respectively, on the OntoNotes Release 5.0 English data.



