Acromine based Disambiguation of Entities From Text

Adeft

Adeft (Acromine-based Disambiguation of Entities From Text) is a utility for building models that disambiguate acronyms and other abbreviations of biological terms in the scientific literature. It uses an implementation of the Acromine algorithm, developed by the National Centre for Text Mining (NaCTeM) at the University of Manchester, to identify possible longform expansions for shortforms in a text corpus. Users can then build models that disambiguate shortforms based on their text context. A growing number of pretrained disambiguation models are publicly available for download through Adeft.
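
As a rough illustration of the candidate-generation step, the sketch below collects the words preceding each parenthesized shortform, as in "endoplasmic reticulum (ER)", and counts every trailing word sequence as a candidate longform across a small corpus. It is not Adeft's implementation. The function names are hypothetical, the look-back window of min(len + 5, 2 × len) words is borrowed from the Schwartz-Hearst abbreviation heuristic as an assumption, and the corpus statistics that Acromine uses to score and choose among nested candidates are left out.

```python
import re
from collections import Counter


def candidate_longforms(text, shortform):
    """Return candidate longforms preceding each "(shortform)" in text.

    Simplified sketch: every trailing word sequence before the parenthesis
    is a candidate, e.g. "endoplasmic reticulum" and "reticulum".
    """
    # Assumed window size (Schwartz-Hearst heuristic), not Acromine's rule
    max_words = min(len(shortform) + 5, 2 * len(shortform))
    pattern = re.compile(r"\(\s*" + re.escape(shortform) + r"\s*\)")
    candidates = []
    for match in pattern.finditer(text):
        # Only look back to the start of the current sentence or clause
        segment = re.split(r"[.;:!?]", text[:match.start()])[-1]
        words = segment.split()[-max_words:]
        for i in range(len(words)):
            candidates.append(" ".join(words[i:]).lower())
    return candidates


def count_candidates(texts, shortform):
    """Count how often each candidate longform occurs across a corpus."""
    counts = Counter()
    for text in texts:
        counts.update(candidate_longforms(text, shortform))
    return counts


# Hypothetical example sentences
texts = [
    "Stress in the endoplasmic reticulum (ER) triggers the UPR.",
    "Binding to the estrogen receptor (ER) drives transcription.",
    "Cells with an expanded endoplasmic reticulum (ER) were imaged.",
]
counts = count_candidates(texts, "ER")
print(counts.most_common(3))
```

In this toy corpus the nested fragment "reticulum" is counted as often as "endoplasmic reticulum", which is why a real system needs a scoring step that accounts for nesting.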

Citation

If you use Adeft in your research, please cite the paper in the Journal of Open Source Software:

Steppi A, Gyori BM, Bachman JA (2020). Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature. Journal of Open Source Software, 5(45), 1708, https://doi.org/10.21105/joss.01708

Installation

Adeft works with Python 3.5 and above. It is available on PyPI and can be installed with the command

$ pip install adeft

Adeft's pretrained machine learning models can then be downloaded with the command

$ python -m adeft.download

If you install by cloning this repository

$ git clone https://github.com/indralab/adeft.git

you should also run

$ python setup.py build_ext --inplace

at the top level of your local repository to build the extension module used for alignment-based longform detection and scoring.

Using Adeft

A dictionary of available models can be imported with

from adeft import available_models

The dictionary maps shortforms to model names. It's possible for multiple equivalent shortforms to map to the same model.
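
To see which shortforms share a model, the mapping can be inverted. The sketch below is a minimal example of that; the helper shortforms_by_model and the dictionary toy_models are hypothetical, and the entries of toy_models are invented for illustration, chosen only to have the same shortform-to-model shape described above.

```python
from collections import defaultdict


def shortforms_by_model(models):
    """Invert a {shortform: model_name} dict into {model_name: [shortforms]}."""
    grouped = defaultdict(list)
    for shortform, model_name in models.items():
        grouped[model_name].append(shortform)
    # Sort each group so the output is stable and easy to read
    return {name: sorted(sfs) for name, sfs in grouped.items()}


# Hypothetical mapping with the same shape as available_models
toy_models = {"ER": "ER", "ERs": "ER", "IR": "IR"}
print(shortforms_by_model(toy_models))  # {'ER': ['ER', 'ERs'], 'IR': ['IR']}
```

With Adeft installed, the same function can be called on available_models itself.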

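Here's an example of running a disambiguator for ER on a list of texts:

from adeft.disambiguate import load_disambiguator

# Load the pretrained disambiguator for the shortform ER
er_dd = load_disambiguator('ER')

# ... (define texts, a list of strings that mention ER)

# Disambiguate ER in each text
er_dd.disambiguate(texts)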
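
A common next step is to tally the predicted senses across a corpus. The sketch below assumes that each result returned by er_dd.disambiguate(texts) is a (grounding, name, probabilities) tuple, where probabilities maps every candidate sense to a probability; check the documentation for the exact return format in your version. The helper summarize, the placeholder identifiers X:1 and X:2, and the example results are all hypothetical.

```python
from collections import Counter


def summarize(results, min_prob=0.0):
    """Tally predicted (grounding, name) pairs above a confidence cutoff.

    Assumes each result is a (grounding, name, probabilities) tuple.
    """
    counts = Counter()
    for grounding, name, probs in results:
        # Highest probability in the dict, taken as the prediction's confidence
        if max(probs.values()) >= min_prob:
            counts[(grounding, name)] += 1
    return counts


# Hypothetical results in the assumed format; X:1 and X:2 are placeholder IDs
results = [
    ("X:1", "endoplasmic reticulum",
     {"endoplasmic reticulum": 0.92, "estrogen receptor": 0.08}),
    ("X:2", "estrogen receptor",
     {"endoplasmic reticulum": 0.30, "estrogen receptor": 0.70}),
    ("X:1", "endoplasmic reticulum",
     {"endoplasmic reticulum": 0.55, "estrogen receptor": 0.45}),
]
print(summarize(results))
print(summarize(results, min_prob=0.6))
```

Raising min_prob drops low-confidence predictions, which can be useful when only confidently disambiguated texts should be kept.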

Users may also build and train their own disambiguators. See the documentation for more information.
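
To illustrate the general idea behind a context-based disambiguator, the sketch below trains a tiny multinomial naive Bayes classifier from scratch on labeled example sentences and predicts the sense of ER from the surrounding words. This is a swapped-in technique for illustration only, not Adeft's training API or its classifier; the class ContextClassifier, the training sentences, and the labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class ContextClassifier:
    """Multinomial naive Bayes over context words (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        # Words never seen in training carry no evidence, so skip them
        words = [w for w in tokenize(text) if w in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_texts in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing over the training vocabulary
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_texts / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training sentences for the shortform ER
train_texts = [
    "misfolded proteins accumulate in the ER lumen causing stress",
    "ER stress activates the unfolded protein response",
    "tamoxifen blocks ER signaling in breast cancer",
    "estradiol binds ER and activates transcription",
]
train_labels = [
    "endoplasmic reticulum", "endoplasmic reticulum",
    "estrogen receptor", "estrogen receptor",
]
clf = ContextClassifier().fit(train_texts, train_labels)
print(clf.predict("unfolded protein accumulation in the ER"))  # endoplasmic reticulum
```

Add-one smoothing keeps an unseen word/sense pair from zeroing out a score. For real models, follow the training workflow described in the Adeft documentation.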

Documentation

Documentation is available at https://adeft.readthedocs.io

Jupyter notebooks illustrating Adeft workflows are available in the notebooks folder of the repository.

Testing

Adeft uses nose (nosetests) for unit testing and is integrated with the Travis continuous integration service. To run the tests locally, install the test-specific requirements listed in setup.py with

$ pip install adeft[test]

and download all pretrained models as shown above. Then run nosetests in the top-level adeft folder.

Funding

Development of this software was supported by the Defense Advanced Research Projects Agency under award W911NF018-1-0124 and the National Cancer Institute under award U54-CA225088.

Release history

0.12.3: May 10, 2024
0.12.2: May 10, 2024
0.12.1: May 10, 2024
0.12.0: May 10, 2024
0.11.2: Nov 5, 2022
0.11.1: May 21, 2022
0.11.0: May 7, 2021
0.10.0: Dec 3, 2020
0.9.0: Nov 19, 2020
0.8.0: Nov 16, 2020
0.7.0: Sep 10, 2020
0.6.0: Jan 30, 2020 (this version)
0.5.5: Jan 15, 2020
0.5.4: Jan 15, 2020
0.5.3: Jan 14, 2020
0.5.1: Nov 8, 2019
0.5.0: Nov 8, 2019
0.4.0: Sep 25, 2019
0.3.0: Jun 25, 2019
0.2.1: May 29, 2019

Download files

Source Distribution

adeft-0.6.0.tar.gz (123.3 kB)

Uploaded Jan 30, 2020 Source

File details

Details for the file adeft-0.6.0.tar.gz.

File metadata

Download URL: adeft-0.6.0.tar.gz
Upload date: Jan 30, 2020
Size: 123.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.5

File hashes

Hashes for adeft-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`57448badbabc4ee360faadf5bca60e610c094418310938113643106e344fc74c`
MD5	`3bdced847c4a31e4387ea6288d194fdd`
BLAKE2b-256	`66ad4866f5ae2a5a54de6267f9cbc3d6e6ced0dabb6dc8353e07b3d45c8c81d4`