A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
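To make the idea of matching records that were "entered slightly differently" concrete, here is a minimal sketch that uses only Python's standard-library `difflib`. It is not dedupe's method: dedupe learns its rules from human training data, while this sketch applies a fixed string-similarity cutoff. The sample records, the `similarity` and `likely_duplicates` helpers, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records, invented for this illustration.
records = {
    1: {"name": "Jonathan Smith", "address": "12 Main Street"},
    2: {"name": "Jonathon Smith", "address": "12 Main St"},
    3: {"name": "Maria Garcia", "address": "98 Oak Avenue"},
}


def similarity(a, b):
    """Average case-insensitive string similarity (0.0 to 1.0) over the fields of `a`."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in a
    ]
    return sum(scores) / len(scores)


def likely_duplicates(records, threshold=0.8):
    """Return ID pairs whose average field similarity meets the threshold."""
    return [
        (id_a, id_b)
        for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2)
        if similarity(rec_a, rec_b) >= threshold
    ]


pairs = likely_duplicates(records)
print(pairs)
```

Comparing every pair, as this sketch does, grows quadratically with the number of records, so it only suits small inputs. A hand-picked threshold also has to be re-tuned for each dataset, which is the step that learning from labeled examples is meant to replace.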

Project details


This version

1.5.4

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.5.4.tar.gz (48.4 kB): Source

Built Distributions

  • dedupe-1.5.4-cp35-cp35m-macosx_10_11_x86_64.whl (49.6 kB): CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.5.4-cp34-cp34m-win_amd64.whl (50.4 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.5.4-cp34-cp34m-win32.whl (49.6 kB): CPython 3.4m, Windows x86
  • dedupe-1.5.4-cp27-cp27m-win_amd64.whl (50.5 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.5.4-cp27-cp27m-win32.whl (49.6 kB): CPython 2.7m, Windows x86
  • dedupe-1.5.4-cp27-cp27m-macosx_10_11_x86_64.whl (49.3 kB): CPython 2.7m, macOS 10.11+ x86-64
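The wheel filenames above carry the same interpreter and platform information as the "Uploaded" tags, following the wheel naming convention `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The sketch below splits a filename into those parts so you can check which file suits your interpreter. The `parse_wheel_filename` helper is a hypothetical name introduced here, not part of dedupe or pip.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention components.

    Hypothetical helper for illustration; it handles only the plain
    five- or six-part form and does no further validation.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel filename: %r" % filename)

    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None  # the build tag is optional and usually absent
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)

    return {
        "distribution": distribution,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("dedupe-1.5.4-cp35-cp35m-macosx_10_11_x86_64.whl")
print(info["python"], info["abi"], info["platform"])
```

In practice pip selects a compatible wheel on its own, so parsing filenames by hand is mainly useful when inspecting or mirroring files manually.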

File details

Details for the file dedupe-1.5.4.tar.gz.

File metadata

  • Download URL: dedupe-1.5.4.tar.gz
  • Size: 48.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.4.tar.gz
Algorithm Hash digest
SHA256 db3cf9ab497d55cfc15af1d72e395074f3510e6f915ecb2f14ba03f3a444e53e
MD5 8d2d6d1903f378c3f8517c4294dfe273
BLAKE2b-256 3e07a01049ad8c89363ffac819994ede1cdd948c3ce8a48043936c113dc65c77

See more details on using hashes here.
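As a sketch of how a published digest can be used, the snippet below computes a file's SHA256 with Python's standard `hashlib` and compares it with the value listed above for dedupe-1.5.4.tar.gz. The helper names (`sha256_of_chunks`, `read_in_chunks`, `file_matches_sha256`) and the assumption that the archive sits in the current directory are illustrative, not part of any packaging tool.

```python
import hashlib

# SHA256 digest listed above for dedupe-1.5.4.tar.gz.
EXPECTED_SHA256 = "db3cf9ab497d55cfc15af1d72e395074f3510e6f915ecb2f14ba03f3a444e53e"


def sha256_of_chunks(chunks):
    """Return the hex SHA256 digest of an iterable of byte strings."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_in_chunks(path, chunk_size=65536):
    """Yield a file's contents in fixed-size blocks to keep memory use small."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def file_matches_sha256(path, expected_hex):
    """True if the file at `path` hashes to `expected_hex` (case-insensitive)."""
    return sha256_of_chunks(read_in_chunks(path)) == expected_hex.strip().lower()


# Usage, assuming the archive was downloaded to the current directory:
# file_matches_sha256("dedupe-1.5.4.tar.gz", EXPECTED_SHA256)
```

SHA256 is the digest to prefer for this check. MD5 is listed for completeness but is no longer considered collision-resistant.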

File details

Details for the file dedupe-1.5.4-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.4-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 de8dea277b6fb30041e1d7f0eef4ed2adbaf929945dd9c9cd3575bb59ddd0895
MD5 38e14a8c6cd78e69ca7751a4a841f8fd
BLAKE2b-256 d03c62a58e2b902173e9136b9a773f6f172b0ab630ca99a7c166394de33e7557

File details

Details for the file dedupe-1.5.4-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.4-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 fda5572089e2e8080348adaa6a55955529909947144cc9e53241f4784c535c39
MD5 384bddf565e3fb607ad02bd1faac5988
BLAKE2b-256 2ba8ba9d21b4ac60eb0c555cff9abcb9f0c13dd9343d7ee07acdecb78ef27005

File details

Details for the file dedupe-1.5.4-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.4-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 9e6948393095be1fc064e807eb994a6a28a471c69cdf9aeee2548a3089ff43e5
MD5 780b79947291a573876e8eb493176fa4
BLAKE2b-256 a7ec67d58f91cc3cfc29b7c62285c2f04d56497d62066efda65b820e896cd96d

File details

Details for the file dedupe-1.5.4-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.4-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 ec344c2c1f9f1d2debfafaecce4aa54e2b35397af80753fb8bc156cc809d2519
MD5 896ae0ecaad116e10ed3f41a929bb325
BLAKE2b-256 fd8f0769be0437b0984d2c049738c2bd3ecac9d292cc4d4131c6567610fd8bdb

File details

Details for the file dedupe-1.5.4-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.4-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 72684d45b1917e5b151f45788ae6d12274d5f38f145e2ef197f07329a3400eb2
MD5 42efd646a1f74dde862255b2b5f9f717
BLAKE2b-256 5d0eb597787fdebd25fbaea4eccd71155ce0eb86272074d96e63bd75537f323c

File details

Details for the file dedupe-1.5.4-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.4-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 44897f1abdfbd57bb172a20f21c65f553a18a60c24dd2b0c6fe3048ff9fefa97
MD5 e346eb5e75312549d30b1175b302453a
BLAKE2b-256 7de669a3ab995dc236cc48902dd12c085ebdaa8551fe01afc84bf14d7e921be9
