A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
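
To make the idea of "similar records" concrete, here is a minimal sketch that scores pairs of records by string similarity using only the Python standard library. It is not dedupe's API or its algorithm: dedupe learns its rules from labeled examples, whereas this sketch uses a fixed, hand-picked threshold. The function names, the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(records, fields, threshold=0.85):
    """Return (i, j, score) for record pairs whose mean field similarity
    is at or above the threshold. The threshold is a hand-picked assumption."""
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = sum(similarity(left[f], right[f]) for f in fields) / len(fields)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical sample data: the first two rows describe the same person.
records = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "Alice Jones", "address": "99 Oak Ave"},
]

pairs = likely_duplicates(records, ["name", "address"])
for i, j, score in pairs:
    print(records[i]["name"], "<->", records[j]["name"], score)
```

The loop compares every pair of records, so its cost grows quadratically with the number of rows. That makes it suitable only for small inputs and is one reason a purpose-built library is preferable for large databases.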

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.4.11.tar.gz (47.6 kB): Source

Built Distributions

  • dedupe-1.4.11-cp35-cp35m-macosx_10_9_x86_64.whl (49.9 kB): CPython 3.5m, macOS 10.9+ x86-64

  • dedupe-1.4.11-cp34-cp34m-win_amd64.whl (50.7 kB): CPython 3.4m, Windows x86-64

  • dedupe-1.4.11-cp34-cp34m-win32.whl (50.0 kB): CPython 3.4m, Windows x86

  • dedupe-1.4.11-cp27-cp27m-win_amd64.whl (50.8 kB): CPython 2.7m, Windows x86-64

  • dedupe-1.4.11-cp27-cp27m-win32.whl (49.9 kB): CPython 2.7m, Windows x86

  • dedupe-1.4.11-cp27-cp27m-macosx_10_9_x86_64.whl (49.6 kB): CPython 2.7m, macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.11.tar.gz.

File metadata

  • Download URL: dedupe-1.4.11.tar.gz
  • Upload date:
  • Size: 47.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.11.tar.gz
Algorithm Hash digest
SHA256 a9186aabf64b7c4f428f46f736a436b71191ac4522d5391f55d8933c749f2f16
MD5 727f7c3242346055d14a01df610443a2
BLAKE2b-256 10df666a33ed61f151e61e01171859b66a0075dc61d4a45722721e4cc49af73c
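
A downloaded file can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using the standard library's `hashlib`. The helper names `sha256_of_file` and `matches_published_hash` are illustrative, not part of dedupe or pip, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for dedupe-1.4.11.tar.gz.
EXPECTED_SHA256 = "a9186aabf64b7c4f428f46f736a436b71191ac4522d5391f55d8933c749f2f16"

# Usage, assuming the archive was saved to the current directory:
# matches_published_hash("dedupe-1.4.11.tar.gz", EXPECTED_SHA256)
```

A mismatch means the file differs from the one that was uploaded, so it should be downloaded again rather than installed.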

File details

Details for the file dedupe-1.4.11-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.11-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d619fe99be6a6ad3fa8a1017843c20c188fb4b863b33af8863001621a3be78dd
MD5 7bad26d79a33a493343cc39c9ba839c2
BLAKE2b-256 961b0baca76c130a064c46c57692b164ce5d70efe02c2d30ba6330a53fd69dee

File details

Details for the file dedupe-1.4.11-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.11-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 10cd42570b7f9f9a0867809f1cb7c66fbe255312aebc9248f3d898255e149352
MD5 59bbba479758231afa4ef54f19691143
BLAKE2b-256 fc82792cc304405c8dac4ffc519ca98ea72f37efa023eb58375628a9740c1298

File details

Details for the file dedupe-1.4.11-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.11-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 fc69ddf56bf230102af3d1080872cde220507677c2bb28092b05149bedd53edf
MD5 07734784351b5bd3958ff0f41e966cbd
BLAKE2b-256 2470a2d5d7fb6d7eb1d78673c5b81049f0898334a695d0bc0ae09bb42b40c50f

File details

Details for the file dedupe-1.4.11-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.11-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 26f89d820a70e993b6832f39cce56b1f473b83b0b72db58ae50bba924a0ce696
MD5 65b4602987bb3aa7d25ef7005009afd3
BLAKE2b-256 d71ed5ad85fd1899c098b47f1d1a3dcef28a91c783ce5dd8b8ba49adf5a2bb58

File details

Details for the file dedupe-1.4.11-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.11-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 da42051d443310326566db9d63c9653891f435a3086efe7d478caef8bb90adb3
MD5 750041e5498c9470f9b096a39e0c55b9
BLAKE2b-256 361e29907c35987e3ef0395dc61c018b952c7219e7857887cdd3d0024b171ef6

File details

Details for the file dedupe-1.4.11-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.11-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 46a5de8f0d1f9b31df9609c4be7cd2ecc0c2aa7f6d8f5c58725293da63a273cf
MD5 68d774e17f9caac5d2418ea5d115185f
BLAKE2b-256 f7972c44169284a3184a2acc50eb3b228197bd9889b59dff2e01040ff396530b
