A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
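To make the idea of records "entered slightly differently" concrete, here is a minimal sketch of fuzzy name matching using only the Python standard library. It is not dedupe's algorithm or API: dedupe learns its comparison rules from labeled examples, whereas this sketch applies a fixed string-similarity threshold. The function names, the sample records, and the 0.85 threshold are all hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def candidate_pairs(records, threshold=0.85):
    """Yield (id_a, id_b, score) for every pair of names scoring at or above threshold.

    Illustrative only: this compares all pairs with a fixed cutoff, which is
    an assumption of this sketch and not how dedupe scores records.
    """
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            yield id_a, id_b, score


# Hypothetical sample data: two spellings of one person plus an unrelated record.
records = {
    1: "Jonathan Smith",
    2: "Jonathon Smith",
    3: "Maria Garcia",
}

pairs = list(candidate_pairs(records))
for id_a, id_b, score in pairs:
    print(f"records {id_a} and {id_b} look alike (score {score:.2f})")
```

Comparing every pair grows quadratically with the number of records, and a fixed threshold has to be tuned by hand for each dataset. Learned rules are meant to avoid both problems.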

Download files

Source Distribution

dedupe-1.4.10.tar.gz (47.6 kB)

Uploaded Source

Built Distributions

dedupe-1.4.10-cp35-cp35m-macosx_10_9_x86_64.whl (49.9 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.10-cp34-cp34m-win_amd64.whl (50.7 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.10-cp34-cp34m-win32.whl (50.0 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.10-cp27-cp27m-win_amd64.whl (50.8 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.10-cp27-cp27m-win32.whl (49.9 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.10-cp27-cp27m-macosx_10_9_x86_64.whl (49.6 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.10.tar.gz.

File metadata

  • Download URL: dedupe-1.4.10.tar.gz
  • Upload date:
  • Size: 47.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.10.tar.gz
Algorithm Hash digest
SHA256 551da3168e79687287b440282579ea49bd542ff7758c2e93b9e64888d9ce20f4
MD5 3ce2d59618bd93623c66191f866bb39f
BLAKE2b-256 2e6ab7f77d50ddc76b4af18634e18bdc718f4aab15195a8f59318259f587dfa5

See more details on using hashes here.
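The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the sketch assumes the archive has already been saved locally under its original filename.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for dedupe-1.4.10.tar.gz.
EXPECTED_SHA256 = "551da3168e79687287b440282579ea49bd542ff7758c2e93b9e64888d9ce20f4"
```

For example, `matches_published_hash("dedupe-1.4.10.tar.gz", EXPECTED_SHA256)` should return `True` for an unmodified download. The same check applies to each wheel using its own SHA256 value.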

File details

Details for the file dedupe-1.4.10-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.10-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1d6f5a37f0e39389601d88a84784d753cdda460205b9a863246c012b88a1062f
MD5 6ef78452a008c852ba153816a893ecff
BLAKE2b-256 0eecdedf5e100f70046f61349d2b69a7eb9926057b2ad8954a1c35d0cf6e89fe

File details

Details for the file dedupe-1.4.10-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.10-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 54003aee8322ef5ebc98fe55cfe44235cc104dc5310f0f0016df82794313af9b
MD5 c9ee79eab1367745c443863a9487dc24
BLAKE2b-256 d8437ac54cb66f150d9f6be14b3e6f365a3121364c77f81c8a3cc875401c0986

File details

Details for the file dedupe-1.4.10-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.10-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 4b51b40f0e8e6d3e90fec12a8ef455ede2db93baddbbd5c9473c1c7ad38c7805
MD5 c8e8530d4d70051d870f365e16753020
BLAKE2b-256 eb65b138063c57e5ca97dd1174e22e919dad3ccbbefef7c1bee51ad3c8f8c75a

File details

Details for the file dedupe-1.4.10-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.10-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 5d5b1be2518d2804b0b5a1aea96074788dcb4ed7f19020e32820fca7cbc02273
MD5 64bfee70488b26cb13419ac3ae684fc9
BLAKE2b-256 1a7efbbacbb5f02eff1cae1aceed64e1360054a3a9b2099981471bcaa19017ff

File details

Details for the file dedupe-1.4.10-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.10-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 19c181a5fb1ae57e4c6145a85732ced4ebf682d3b71a35b0a6d3b273ebcb1f5e
MD5 378f039ca065264446c73865c9a4a5e0
BLAKE2b-256 c3b6903cc1161a79695d0bdeed9337a6eb05c2ba2a37420a2440432d72d64cad

File details

Details for the file dedupe-1.4.10-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.10-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 12db8dfaefef4c2429749196b9242a0774a1d38c72fd6e7b42ddc67311ddfc29
MD5 33309ac26d779f19140c457787ae582c
BLAKE2b-256 1b531859e68d41ab37ebef02fd1768915a9f64aa25bf8e7d9121e4192fefeb88
