A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
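To make the first and third use cases concrete, here is a deliberately naive sketch of what "finding duplicates" means. It is not dedupe's algorithm or API. It swaps in the standard library's `difflib.SequenceMatcher` as the string similarity, uses a fixed, hand-picked threshold (0.85 is an arbitrary assumption), and compares every pair of records, which is the kind of hand-tuned, all-pairs approach that does not scale to very large databases. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records: list[str], threshold: float = 0.85) -> list[set[int]]:
    """Group indices of records whose pairwise similarity meets the threshold.

    Union-find merges chains (A~B and B~C puts A, B, C in one cluster).
    Every pair is compared, so this is O(n^2): fine for a toy, not for big data.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    # Only report groups that actually contain a duplicate.
    return [c for c in clusters.values() if len(c) > 1]


records = ["Jane Smith, 12 Oak St", "jane smith 12 Oak St.", "John Doe, 4 Elm Ave"]
print(find_duplicates(records))  # [{0, 1}]
```

The first two entries normalize to the same string, so they end up in one cluster, while the third stays on its own.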

dedupe learns from human-labeled training data to work out the best rules for your dataset, so it can quickly and automatically find similar records, even in very large databases.
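The idea of learning rules from human training data can be illustrated with a toy in which the only "rule" is a single similarity cutoff, chosen to agree with as many human-labeled pairs as possible. This is a hypothetical illustration, not dedupe's model or API: the function names, the `(record_a, record_b, is_duplicate)` format, and the use of `difflib` are all assumptions made for the sketch. A real system would weigh several fields at once and would also avoid comparing every pair.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs: list[tuple[str, str, bool]]) -> float:
    """Return the cutoff that classifies the most human-labeled pairs correctly.

    Pairs scoring at or above the cutoff are predicted to be duplicates.
    Ties keep the lowest cutoff; an empty input falls back to 1.0.
    """
    scored = [(score(a, b), is_dup) for a, b, is_dup in labeled_pairs]
    best_cutoff, best_correct = 1.0, -1
    # Try each observed score as a candidate cutoff, lowest first.
    for cutoff in sorted({s for s, _ in scored}):
        correct = sum((s >= cutoff) == is_dup for s, is_dup in scored)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


# Hypothetical answers a human reviewer might give for four candidate pairs.
training = [
    ("Jon Smith", "John Smith", True),
    ("Jane Doe", "Jane  Doe", True),
    ("Jon Smith", "Mary Jones", False),
    ("Jane Doe", "Bob Brown", False),
]
cutoff = learn_threshold(training)
print(f"learned cutoff: {cutoff:.3f}")
```

With these four labels, the learned cutoff separates the two true duplicates from the two non-duplicates; more labeled pairs would give a more trustworthy cutoff.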

Important links:


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.4.14.tar.gz (48.6 kB): Source

Built Distributions

  • dedupe-1.4.14-cp35-cp35m-macosx_10_9_x86_64.whl (51.0 kB): CPython 3.5m, macOS 10.9+ x86-64

  • dedupe-1.4.14-cp34-cp34m-win_amd64.whl (51.7 kB): CPython 3.4m, Windows x86-64

  • dedupe-1.4.14-cp34-cp34m-win32.whl (51.0 kB): CPython 3.4m, Windows x86

  • dedupe-1.4.14-cp27-cp27m-win_amd64.whl (51.8 kB): CPython 2.7m, Windows x86-64

  • dedupe-1.4.14-cp27-cp27m-win32.whl (51.0 kB): CPython 2.7m, Windows x86

  • dedupe-1.4.14-cp27-cp27m-macosx_10_9_x86_64.whl (50.7 kB): CPython 2.7m, macOS 10.9+ x86-64
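Each wheel filename in this listing encodes which interpreter and platform it targets, following the wheel naming convention from PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. pip reads these tags itself when choosing a file; the sketch below only decodes a name so you can see why, for example, a `cp34` Windows wheel will not be picked on macOS. The `WheelTags` type and `parse_wheel_filename` function are hypothetical helpers, and the sketch does not expand compressed tag sets such as `py2.py3`.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    """Fields of a wheel filename, following the PEP 427 naming convention."""
    distribution: str
    version: str
    build: Optional[str]  # optional build tag; none of the files above has one
    python: str           # e.g. "cp35" = CPython 3.5
    abi: str              # e.g. "cp35m" = CPython 3.5 ABI built with pymalloc
    platform: str         # e.g. "win_amd64" or "macosx_10_9_x86_64"


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split '{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl'.

    Splitting on '-' is safe because wheel names escape '-' inside the
    distribution name and version as '_'.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of '-' separated fields: {filename!r}")
    return WheelTags(dist, version, build, python, abi, platform)


for name in [
    "dedupe-1.4.14-cp35-cp35m-macosx_10_9_x86_64.whl",
    "dedupe-1.4.14-cp27-cp27m-win32.whl",
]:
    print(parse_wheel_filename(name))
```

When none of the wheels matches the running interpreter and platform, pip falls back to the source distribution (dedupe-1.4.14.tar.gz) by default.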

File details

Details for the file dedupe-1.4.14.tar.gz.

File metadata

  • Download URL: dedupe-1.4.14.tar.gz
  • Upload date:
  • Size: 48.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.14.tar.gz
Algorithm     Hash digest
SHA256        708cda2d9bc7c0edcfc48d28206107cbb420a0c0d9f4670a67b3d0353b697d1c
MD5           35d400fe408dd6b9c5a60b568f3407ca
BLAKE2b-256   f309f2a72f8de19a9753ab3cc25f8c63d363c58f9272424aa7bca0d1e16e0486

See more details on using hashes here.
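A downloaded file can be checked against the published SHA256 digest with the standard library's `hashlib`. This is a minimal sketch, assuming the archive has been saved locally; the helper names are hypothetical, and the file is read in chunks so large downloads do not have to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file against a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Published SHA256 for dedupe-1.4.14.tar.gz, as listed in the hash table.
SDIST_SHA256 = "708cda2d9bc7c0edcfc48d28206107cbb420a0c0d9f4670a67b3d0353b697d1c"

# Assumes the archive was downloaded into the current directory:
# print(matches_published_hash("dedupe-1.4.14.tar.gz", SDIST_SHA256))
```

pip can also perform this check itself in hash-checking mode, where each requirement line carries `--hash=sha256:<digest>` entries; the list should include a digest for every file pip might select (the source archive and any wheel for your platform).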

File details

Details for the file dedupe-1.4.14-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.14-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm     Hash digest
SHA256        adad09ef4dc78fa515e0ece3312090e0c45a2f1fc6314078104b81419bcd3393
MD5           104edaea30f89f3f8d571ae93e6a9202
BLAKE2b-256   5651a5a8b64ef610701633b2a54daad9224cc9318c03755a6362cac67bcf9217

File details

Details for the file dedupe-1.4.14-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.14-cp34-cp34m-win_amd64.whl
Algorithm     Hash digest
SHA256        baef3b6dd4537fcfa707ab6259026ff55efb620b18837e8c5516b2fdbaacf4fd
MD5           bac14693e6fbb9d4be9f0b36329af5e5
BLAKE2b-256   a64ef378876b7aa61ae903ce517caf8a066d84aba4d7925194dc9bb4ba487b4d

File details

Details for the file dedupe-1.4.14-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.14-cp34-cp34m-win32.whl
Algorithm     Hash digest
SHA256        5870397257ec35660e384b948da67aa2ad8bdf6bd62b63f2fb86d714e40d8ed5
MD5           cb05597d8dd5b4804201423dfd317ff5
BLAKE2b-256   fd49cb5e8e4d6ef5b651ea799aa39c43beb3aa273e180c585c1dc5e17666f4e2

File details

Details for the file dedupe-1.4.14-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.14-cp27-cp27m-win_amd64.whl
Algorithm     Hash digest
SHA256        51112100df8f07f929f5d40f9bc3d219968c7bd52a97cb410150107c9e2d52b8
MD5           a9bd0929fdc4c91f7758f5cb0c9dfd68
BLAKE2b-256   becb234416fae9ccb6d25a8f1a4a199ae0d7eec647fc913f6345584bb958d764

File details

Details for the file dedupe-1.4.14-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.14-cp27-cp27m-win32.whl
Algorithm     Hash digest
SHA256        18e1acd83dbfaa12505d782c933b220b746b3a0b7d9b4a51516610c4dd626479
MD5           4537255bcd1536d1875135360f08962d
BLAKE2b-256   e350d6aeb0dadded74904436c5688e2e135795e140dada4d31168ff25b71a9be

File details

Details for the file dedupe-1.4.14-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.14-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm     Hash digest
SHA256        84d8bf4699fffb137ae276e792c433a43fe750069cfd2fc7a211157068327c47
MD5           d2ba7d1329c95c0a4f7f54d726685c93
BLAKE2b-256   12e38a03cd0caf77e3fa837c5dc4569d029bc8b612df58be3c782976935461f8
