A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human-labeled training data and learns the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
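To make the problem concrete, here is a minimal sketch of pairwise fuzzy matching on names that were entered slightly differently. It uses only Python's standard library and is not dedupe's API or algorithm: the sample records, the `similarity` and `candidate_pairs` helpers, and the fixed 0.85 threshold are all assumptions chosen for illustration. dedupe itself learns its rules from training data instead of relying on a hand-picked threshold.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data: record id -> name as entered.
records = {
    1: "Robert Smith",
    2: "Robert Smyth",
    3: "Alice Jones",
}

def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_pairs(records, threshold=0.85):
    """Compare every pair of records and keep those scoring at or above threshold."""
    pairs = []
    for (id_a, a), (id_b, b) in combinations(sorted(records.items()), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs

for id_a, id_b, score in candidate_pairs(records):
    print(id_a, id_b, round(score, 2))
```

Comparing every pair, as this sketch does, grows quadratically with the number of records, which is why a naive approach like this does not carry over to very large databases.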

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.5.2.tar.gz (47.6 kB): Source

Built Distributions

  • dedupe-1.5.2-cp34-cp34m-win_amd64.whl (50.3 kB): CPython 3.4m, Windows x86-64

  • dedupe-1.5.2-cp34-cp34m-win32.whl (49.6 kB): CPython 3.4m, Windows x86

  • dedupe-1.5.2-cp27-cp27m-win_amd64.whl (50.4 kB): CPython 2.7m, Windows x86-64

  • dedupe-1.5.2-cp27-cp27m-win32.whl (49.6 kB): CPython 2.7m, Windows x86

  • dedupe-1.5.2-cp27-cp27m-macosx_10_11_x86_64.whl (49.2 kB): CPython 2.7m, macOS 10.11+ x86-64

File details

Details for the file dedupe-1.5.2.tar.gz.

File metadata

  • Download URL: dedupe-1.5.2.tar.gz
  • Upload date:
  • Size: 47.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.2.tar.gz
Algorithm Hash digest
SHA256 4ba6703a38e140ef68eedffaa8a076dbc520404694e621f4964994bcbec310e4
MD5 7dc2e75f808e4981236d66b1b6a14e3c
BLAKE2b-256 ed9620bc33fcdb5974728428fe97e535c1db11b7fb13e7d8d61e29b7ea189216

See more details on using hashes here.
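A downloaded file can be checked against the SHA256 digest published for it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and it assumes the file has already been saved locally.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches_published_hash("dedupe-1.5.2.tar.gz", "4ba6703a38e140ef68eedffaa8a076dbc520404694e621f4964994bcbec310e4")` should return `True` for an intact download.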

File details

Details for the file dedupe-1.5.2-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.2-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 dcd8443367a9b8db24c955c7a0fc19e908557930c8ddf204e4996733838375be
MD5 cca6240e062406d21618343f0a4fa235
BLAKE2b-256 b2697d1e33521b1849bdfafb3adb1b8ed897a22f8164de1505930824d060f90b

File details

Details for the file dedupe-1.5.2-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.2-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 b20656eb579b96764cba1cdefe5137086ce8fe818a27b958eb7807a5ea075368
MD5 e4516a3b0bac455598c19d42b838b3fd
BLAKE2b-256 e4645d1cbc8d1f1b8411e6ab2ce4fcdcb9bdd5ac4d045a2bdabd61c28e8422ce

File details

Details for the file dedupe-1.5.2-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.2-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 0d7608185d521bdb4c6a71ad7448e6edbec53f0bd0307772e6b0a7938aa98986
MD5 0d8a2e3ce6adeb7ef231af6f4457f5bc
BLAKE2b-256 7093169fcf4ba111156ed17ec03a27f78f30b40fecd9e65d091cc6496e2e8c3a

File details

Details for the file dedupe-1.5.2-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.2-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 653725d68f4a4f93ef918069bb80577dcf128404f33187699e4bf35247e1d81f
MD5 2a3f677b7aa882c2700a5ee40a85072b
BLAKE2b-256 cd5078ced57c74c3393f32033d05ccca80a0fb1443edde77b3362b409f46d6fc

File details

Details for the file dedupe-1.5.2-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.2-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 ec75dbeca2be64126970cc9467febe18c828c395b2784ee6d7758615d26c3072
MD5 ef216a0d6e9cf64c18d5bc7745bea4d1
BLAKE2b-256 079ba3b4606f402b4fcd22748a8c4674ca6b672498806fee52c43d379ba9241c
