Generate all possible forms of an English word.

Accurately generate all possible forms of an English word

Word Forms can accurately generate all possible forms of an English word. It can conjugate verbs, connect different parts of speech (e.g. noun to adjective, adjective to adverb, noun to verb) and pluralize singular nouns. It does all of this in one function. Enjoy!

Examples

Some very timely examples :-P

>>> from word_forms.word_forms import get_word_forms
>>> get_word_forms("president")
{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
 'a': {'presidential'},
 'v': {'preside', 'presided', 'presiding', 'presides'},
 'r': {'presidentially'}}
>>> get_word_forms("elect")
{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
 'a': {'eligible', 'electoral', 'elective', 'elect'},
 'v': {'electing', 'elects', 'elected', 'elect'},
 'r': set()}
>>> get_word_forms("politician")
{'n': {'politician', 'politics', 'politicians'},
 'a': {'political'},
 'v': set(),
 'r': {'politically'}}
>>> get_word_forms("am")
{'n': {'being', 'beings'},
 'a': set(),
 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
 'r': set()}
>>> get_word_forms("ran")
{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
 'a': {'running', 'runny'},
 'v': {'running', 'run', 'ran', 'runs'},
 'r': set()}
>>> get_word_forms('continent', 0.8)  # with configurable similarity threshold
{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
 'a': {'continental', 'continent'},
 'v': set(),
 'r': set()}

As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
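
If you only care about the words and not their part of speech, the four sets can be merged. The snippet below is a minimal sketch of that idea. The helper names `flatten_forms` and `describe` are made up for illustration and are not part of word_forms; the input is a literal copy of the "politician" output above, so it runs without the package installed.

```python
# Readable names for the WordNet-style part-of-speech tags used as keys
POS_NAMES = {"n": "noun", "a": "adjective", "v": "verb", "r": "adverb"}


def flatten_forms(forms_by_pos):
    """Merge the per-part-of-speech sets into one set of word forms."""
    return set().union(*forms_by_pos.values())


def describe(forms_by_pos):
    """Map readable part-of-speech names to sorted forms, skipping empty sets."""
    return {
        POS_NAMES[tag]: sorted(forms)
        for tag, forms in forms_by_pos.items()
        if forms
    }


# Literal copy of the get_word_forms("politician") output shown above
politician_forms = {
    "n": {"politician", "politics", "politicians"},
    "a": {"political"},
    "v": set(),
    "r": {"politically"},
}

print(sorted(flatten_forms(politician_forms)))
print(describe(politician_forms))
```

The same helpers should work on the dictionary returned by a real get_word_forms call, assuming it has the four-key shape shown in the examples.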

Help can be obtained at any time by typing the following:

>>> help(get_word_forms)

Why?

In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable" or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word into a base word and then comparing the base words. The process is called Stemming. For example, the Porter Stemmer reduces both "love" and "lovely" into the base word "love".

Stemmers have several shortcomings. First, the base word produced by a stemmer is not always a valid English word. For example, the Porter Stemmer reduces the word "operation" to "oper". Second, stemmers have a high false negative rate: related words often end up with different base words. For example, "run" is reduced to "run" and "ran" is reduced to "ran", so the two are never matched. This happens because stemmers use a set of rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
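
Both shortcomings are easy to reproduce with a toy suffix stripper. This is only a sketch of rule-based stemming in general: the suffix list and the name `toy_stem` are invented for illustration and are not the Porter algorithm's actual rules.

```python
# Toy suffix-stripping rules, checked in order (longer suffixes first).
# Illustrative only: this is NOT the Porter Stemmer's rule set.
SUFFIXES = ("ations", "ation", "ates", "ate", "ing", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Shortcoming 1: the stem is not a valid English word
print(toy_stem("operation"), toy_stem("operate"))

# Shortcoming 2: an irregular form is missed, because no suffix rule
# can turn "ran" into "run"
print(toy_stem("run") == toy_stem("ran"))
```

Adding more rules does not help with the second case, since "ran" and "run" differ in the middle of the word rather than at the end.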

Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The WordNet Lemmatizer included with NLTK fails at almost all such examples. For instance, "operations" is reduced to "operation" and "operate" is reduced to "operate", so the two are not matched.

Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms and adverb forms, pluralize singular forms etc.
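
One way to use this for matching is to check whether one word appears among the generated forms of another. The sketch below is an assumption about how you might wire that up, not something the package provides: `are_related` and `sample_get_forms` are hypothetical names, and the lookup table is a stand-in copied from the example outputs above so that the snippet runs without word_forms installed.

```python
def are_related(word_a, word_b, get_forms):
    """Return True if word_b appears among any of word_a's generated forms.

    get_forms is any callable that takes a word and returns a dict of sets
    keyed by part of speech, like the output shown in the Examples section.
    """
    forms_by_pos = get_forms(word_a)
    return any(word_b in forms for forms in forms_by_pos.values())


# Stand-in data copied from the example outputs in this README
_SAMPLE_FORMS = {
    "ran": {
        "n": {"run", "runniness", "runner", "runninesses", "running",
              "runners", "runnings", "runs"},
        "a": {"running", "runny"},
        "v": {"running", "run", "ran", "runs"},
        "r": set(),
    },
    "politician": {
        "n": {"politician", "politics", "politicians"},
        "a": {"political"},
        "v": set(),
        "r": {"politically"},
    },
}


def sample_get_forms(word):
    """Hypothetical stand-in for get_word_forms, backed by the table above."""
    empty = {"n": set(), "a": set(), "v": set(), "r": set()}
    return _SAMPLE_FORMS.get(word, empty)


print(are_related("ran", "run", sample_get_forms))
print(are_related("politician", "politics", sample_get_forms))
print(are_related("ran", "politics", sample_get_forms))
```

With the package installed, passing get_word_forms itself as `get_forms` should behave the same way, assuming it is called with just the word as in the examples.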

Bonus: A simple lemmatizer

We also offer a very simple lemmatizer based on word_forms. Here is how to use it.

>>> from word_forms.lemmatizer import lemmatize
>>> lemmatize("operations")
'operant'
>>> lemmatize("operate")
'operant'

Enjoy!

Compatibility

Tested on Python 3

Installation

Using pip:

pip install -U word_forms

From source

Or you can install it from source:

  1. Clone the repository:
git clone https://github.com/gutfeeling/word_forms.git
  2. Install it using pip or setup.py:
pip install -e word_forms
# or
cd word_forms
python setup.py install

Acknowledgement

  1. The XTAG project for information on verb conjugations.
  2. WordNet

Maintainer

Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me at dibyachakravorty@gmail.com.

Contributors

  • Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
  • Sajal Sharma @sajal2692 is a major contributor.

Contributions

Word Forms is not perfect. In particular, the following can be improved.

  1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is not perfect. At the moment, I am using inflect for it.

If you like this package, feel free to contribute. Your pull requests are most welcome.
