A Python implementation of the metaphone and double metaphone algorithms.

About

A Python implementation of the Metaphone and Double Metaphone algorithms

Metaphone

As described on the Wikipedia page, the original Metaphone algorithm was published in 1990 as an improvement over the Soundex algorithm. Like Soundex, it was limited to English-only use. The Metaphone algorithm does not produce an exact phonetic representation of an input word or name; rather, its output is intentionally approximate. This approximation is necessary to account for the way speakers vary their pronunciations and misspell or otherwise vary the words and names they are trying to spell.

Double Metaphone

The Double Metaphone phonetic encoding algorithm is the second generation of the Metaphone algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal. It makes a number of fundamental design improvements over the original Metaphone algorithm.

It is called “Double” because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name “Smith” yields a primary code of SM0 and a secondary code of XMT, while the name “Schmidt” yields a primary code of XMT and a secondary code of SMT; both have XMT in common.
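
The matching idea can be sketched in a few lines. The helper below, share_code, is a hypothetical name used for illustration and is not part of this library; it takes two (primary, secondary) pairs of the kind described above and reports whether they have a non-empty code in common. The codes are hard-coded from the examples in this README, so the snippet runs without the package installed.

```python
def share_code(codes_a, codes_b):
    """Return True if two (primary, secondary) pairs share a non-empty code.

    Hypothetical helper for illustration; not part of the metaphone package.
    """
    # An empty string means "no secondary code", so it must not count as a match.
    set_a = {code for code in codes_a if code}
    set_b = {code for code in codes_b if code}
    return bool(set_a & set_b)


# Codes copied from the examples in this README.
smith = ("SM0", "XMT")
schmidt = ("XMT", "SMT")
architect = ("ARKTKT", "")

print(share_code(smith, schmidt))    # True: both pairs contain XMT
print(share_code(smith, architect))  # False: no code in common
```

Filtering out empty codes matters: two words that each lack a secondary code would otherwise appear to match on the empty string.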

Double Metaphone tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. Thus it uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone.

History

This is a copy of the Python Double Metaphone implementation taken from Andrew Collins’ work, itself a Python port of the algorithm that Lawrence Philips originally implemented in C++. Since then, several contributors have made improvements, which can be viewed in the git history.

A resources directory is included with this project which contains the following:

  • the original C++ file by Lawrence Philips

  • Kevin Atkinson’s improvements to it

  • a C implementation (for use in a Perl extension) by Maurice Aubrey

The contributors to the Python version, originally started by Andrew Collins, include:

  • Andrew Collins

  • Chris Leong

  • Matthew Somerville

  • Richard Barran

  • Maximillian Dornseif

  • Sebastien Metrot

  • Duncan McGreggor

  • Ollie Bennett

  • Ian Beaver

  • Alastair Houghton

Usage

Running the Unit Tests

metaphone uses the unittest package from the standard library, and as such, its tests are runnable by most test runners. If you have nose installed, you can do the following:

$ git clone https://github.com/oubiwann/metaphone.git
$ cd metaphone
$ nosetests -v .

If you have Twisted installed, you can do:

$ trial ./metaphone

Example Code

The unit tests are full of examples, so be sure to check those out. But here’s a taste:

$ python
>>> from metaphone import doublemetaphone
>>> doublemetaphone("architect")
(u'ARKTKT', u'')
>>> doublemetaphone("bajador")
(u'PJTR', u'PHTR')
>>> doublemetaphone("Τι είναι το Unicode;")
(u'NKT', u'')

In the Wild

The following developers/projects make use of this library:

  • Andrew Collins used his original code in various music projects and for dealing with misspelled text in data provided by various web services. This was then integrated with Plone/Zope projects.

  • Matthew Somerville uses it on Theatricalia to match people's names, and it appears to work quite well. The database stores the double metaphones for first and last names; a search then simply computes the double metaphones of what has been entered and looks up anything that matches.

  • Duncan McGreggor uses it on the φarsk project to provide greater full text search capabilities for Indo-European language word lists and dictionaries.
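
The lookup scheme described for Theatricalia, storing the codes for each name and matching a query on any shared code, might be sketched as follows. This is a minimal in-memory illustration and not Theatricalia's actual code: PhoneticIndex and toy_encode are hypothetical names, and toy_encode is a stand-in that only knows the codes quoted in this README. With the package installed, doublemetaphone could be passed as the encode argument instead.

```python
from collections import defaultdict


class PhoneticIndex:
    """Map phonetic codes to the names that produce them.

    Hypothetical sketch; `encode` is any callable that returns a
    (primary, secondary) tuple, such as metaphone.doublemetaphone.
    """

    def __init__(self, encode):
        self._encode = encode
        self._names_by_code = defaultdict(set)

    def add(self, name):
        # Store the name under every non-empty code it produces.
        for code in self._encode(name):
            if code:
                self._names_by_code[code].add(name)

    def search(self, query):
        # Collect every stored name that shares a code with the query.
        matches = set()
        for code in self._encode(query):
            if code:
                matches |= self._names_by_code.get(code, set())
        return sorted(matches)


# Stand-in encoder limited to codes quoted in this README (an assumption
# made so the sketch runs without the package installed).
KNOWN_CODES = {
    "smith": ("SM0", "XMT"),
    "schmidt": ("XMT", "SMT"),
    "architect": ("ARKTKT", ""),
}


def toy_encode(name):
    return KNOWN_CODES.get(name.lower(), ("", ""))


index = PhoneticIndex(toy_encode)
for stored in ("Schmidt", "architect"):
    index.add(stored)

print(index.search("Smith"))  # ['Schmidt'], matched through the shared XMT code
```

Indexing a name under both of its codes keeps each search to a couple of dictionary lookups, at the cost of storing every name up to twice.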

Download files

Source Distribution

Metaphone-0.6.tar.gz (14.1 kB)

File hashes

Hashes for Metaphone-0.6.tar.gz

Algorithm    Hash digest
SHA256       ad0beadca66cb7ec6ede71ef72bb02da097c493ddf159930d6340bc83f53da27
MD5          81d319c20720bd0a1d2e8529002caf06
BLAKE2b-256  d4aec9e4d007e32a6469be212da11d0b8e104d643f6f247d771742caf6ac6bb8
