Coreference resolution with e2e for Dutch

e2e-Dutch

Code for the e2e coreference resolution model for Dutch. The code is based on the original e2e model for English, modified to work for Dutch. If you use this code, please cite it and also cite the original e2e paper.

Installation

Requirements:

  • Python 3.6 or 3.7
  • pip

In this repository, run:

pip install -r requirements.txt
./scripts/setup_all.sh
pip install .

The setup_all script downloads the word vector files to the data directories. It also builds the application-specific tensorflow kernels.

Quick start

A pretrained model is available to download:

python -m e2edutch.download

This downloads the model files. By default they are stored in the data directory inside the Python package location. The location can also be set manually through the environment variable E2E_HOME or through the config file (see below).
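The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not the package's own code: `resolve_data_dir` is a hypothetical helper, the example paths are made up, and the config file override is left out.

```python
import os
from pathlib import Path


def resolve_data_dir(package_dir, env=None):
    """Hypothetical helper: E2E_HOME wins, else the package's data directory."""
    env = os.environ if env is None else env
    override = env.get("E2E_HOME")
    if override:
        # An explicitly set E2E_HOME takes precedence over the default.
        return Path(override).expanduser()
    # Default described in the text: the data directory inside the package.
    return Path(package_dir) / "data"


if __name__ == "__main__":
    # Illustrative paths only; substitute your own locations.
    print(resolve_data_dir("/opt/venv/site-packages/e2edutch", env={}))
    print(resolve_data_dir("/opt/venv/site-packages/e2edutch",
                           env={"E2E_HOME": "/srv/e2e-models"}))
```

In practice, E2E_HOME is usually exported in the shell before running the download or predict commands, so that both see the same location.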

The pretrained model can predict coreferences on CoNLL-2012 files, jsonlines files, NAF files, or plain text files (in the latter case, the nltk package is used for tokenization).

python -m e2edutch.predict [-h] [-o OUTPUT_FILE] [-f {conll,jsonlines,naf}]
                  [-c WORD_COL] [--cfg_file CFG_FILE] [-v]
                  config input_filename

positional arguments:
  config: name of the model to use for prediction ('final' for the pretrained)
  input_filename

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
  -f {conll,jsonlines,naf}, --format_out {conll,jsonlines,naf}
  -c WORD_COL, --word_col WORD_COL
  --cfg_file CFG_FILE   config file
  -v, --verbose
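
When predictions are run from another Python program, the command line above can be assembled and launched with `subprocess`. The sketch below uses only the flags listed in the usage text; `build_predict_command`, `run_predict`, and the file names are hypothetical, and it assumes e2edutch and the pretrained model are already installed.

```python
import subprocess
import sys

VALID_FORMATS = {"conll", "jsonlines", "naf"}


def build_predict_command(input_filename, config="final", output_file=None,
                          format_out=None, cfg_file=None, verbose=False):
    """Assemble the argument list for `python -m e2edutch.predict`."""
    cmd = [sys.executable, "-m", "e2edutch.predict"]
    if output_file is not None:
        cmd += ["-o", str(output_file)]
    if format_out is not None:
        if format_out not in VALID_FORMATS:
            raise ValueError(f"unsupported output format: {format_out}")
        cmd += ["-f", format_out]
    if cfg_file is not None:
        cmd += ["--cfg_file", str(cfg_file)]
    if verbose:
        cmd.append("-v")
    # Positional arguments come last: model name, then the input file.
    cmd += [config, str(input_filename)]
    return cmd


def run_predict(*args, **kwargs):
    """Run the prediction; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_predict_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_predict(...) to execute it.
    print(" ".join(build_predict_command("example.txt",
                                         output_file="example.jsonlines",
                                         format_out="jsonlines")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.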


User-specific configuration (such as the data directory and data files) can be provided in a separate config file; the defaults are specified in cfg/defaults.conf.

Train your own model

To train a new model:

  • Make sure the model config file (default: e2edutch/cfg/models.conf) describes the model you wish to train
  • Make sure your config file (default: e2edutch/cfg/defaults.conf) includes the data files you want to use for training
  • Run scripts/setup_train.sh e2edutch/cfg/defaults.conf. This script converts the conll2012 data to jsonlines files, and caches the word and contextualized embeddings.
  • If you want to enable the use of a GPU, set the environment variable:
export GPU=0
  • Run the training script:
python -m e2edutch.train <model-name>
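
The last three steps above can be chained from Python, as in the following sketch. It assumes it is run from the repository root and that the config files have already been edited; `training_steps`, `training_env`, `run_training`, and the model name `my_model` are hypothetical names used for illustration.

```python
import os
import subprocess
import sys


def training_steps(model_name, cfg_file="e2edutch/cfg/defaults.conf"):
    """Return the two commands from the list above, in order."""
    return [
        # Converts the data to jsonlines and caches the embeddings.
        ["scripts/setup_train.sh", cfg_file],
        # Trains the model described under this name in the model config.
        [sys.executable, "-m", "e2edutch.train", model_name],
    ]


def training_env(gpu=None, base_env=None):
    """Copy the environment, setting GPU only when a device is requested."""
    env = dict(os.environ if base_env is None else base_env)
    if gpu is not None:
        env["GPU"] = str(gpu)
    return env


def run_training(model_name, gpu=None):
    """Run both steps, stopping at the first command that fails."""
    env = training_env(gpu)
    for cmd in training_steps(model_name):
        subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    # Only prints the commands; call run_training("my_model", gpu=0) to execute.
    for cmd in training_steps("my_model"):
        print(" ".join(cmd))
```

Setting `GPU` in a copied environment keeps the change local to the training processes rather than the calling program.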

Citing this code

If you use this code in your research, please cite it as follows:

@misc{YourReferenceHere,
author = {
            Dafne van Kuppevelt and
            Jisk Attema
         },
title  = {e2e-Dutch},
doi    = {10.5281/zenodo.4146960},
url    = {https://github.com/Filter-Bubble/e2e-Dutch}
}

As the code is largely based on the original e2e model for English, please also cite the original e2e paper.