A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in training data labeled by a person and derives the best rules for your dataset, so that it can find similar records quickly and automatically, even in very large databases.
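
As a rough illustration of what "finding similar records" means, the sketch below scores every pair of records with a plain string-similarity measure from the Python standard library. It is not dedupe's API or its algorithm: the function names, the sample records, and the fixed 0.85 threshold are all assumptions made for this example, and nothing here is learned from training data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase and collapse whitespace in every field of a record."""
    return {key: " ".join(str(value).lower().split()) for key, value in record.items()}


def similarity(a, b, fields):
    """Average character-level similarity over the given fields (0.0 to 1.0)."""
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def find_candidate_duplicates(records, fields, threshold=0.85):
    """Return pairs of record ids whose average similarity meets the threshold.

    The threshold is a hand-picked assumption, not a learned value.
    """
    cleaned = {rid: normalize(rec) for rid, rec in records.items()}
    pairs = []
    # Compare every pair of records; this is quadratic in the number of records.
    for left, right in combinations(sorted(cleaned), 2):
        if similarity(cleaned[left], cleaned[right], fields) >= threshold:
            pairs.append((left, right))
    return pairs


# Hypothetical sample data: records 1 and 2 differ only by small typos.
records = {
    1: {"name": "Jon Smith", "address": "12 Main St"},
    2: {"name": "John Smith", "address": "12 Main St."},
    3: {"name": "Maria Garcia", "address": "98 Oak Avenue"},
}
candidate_pairs = find_candidate_duplicates(records, ["name", "address"])
```

Comparing every pair does not scale beyond small datasets, and a hand-picked threshold rarely suits every field. Those are the two problems that learned rules are meant to address.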

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distributions

  • dedupe-1.7.2-cp34-cp34m-win_amd64.whl (51.4 kB): CPython 3.4m, Windows x86-64

  • dedupe-1.7.2-cp34-cp34m-win32.whl (50.7 kB): CPython 3.4m, Windows x86

  • dedupe-1.7.2-cp27-cp27m-win_amd64.whl (51.6 kB): CPython 2.7m, Windows x86-64

  • dedupe-1.7.2-cp27-cp27m-win32.whl (50.7 kB): CPython 2.7m, Windows x86
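
The file names above follow the standard wheel naming scheme: distribution name, version, an optional build tag, then the Python, ABI, and platform tags. A minimal sketch of splitting a name into those parts can help when picking the right file for an interpreter. The helper name `parse_wheel_filename` is hypothetical, and the sketch assumes a well-formed file name.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components.

    Assumes the standard layout:
    name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("dedupe-1.7.2-cp34-cp34m-win_amd64.whl")
```

Here `info["python_tag"]` is `cp34` (CPython 3.4) and `info["platform_tag"]` is `win_amd64` (64-bit Windows), matching the descriptions in the list.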

File details

Details for the file dedupe-1.7.2-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.7.2-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 b37570aadb950670c6b7f912c6aba440261fea0106821ef750b41d9b2b04bc2e
MD5 089ae56c6082b11e414b70595f41ba03
BLAKE2b-256 2ec263ea6dd6d1dbf12e52a253780923eabb69f1462767326b2b094e39df3100
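
A published digest can be checked against a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the function names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's SHA256 digest with the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, assuming the wheel was saved in the current directory, `matches_expected("dedupe-1.7.2-cp34-cp34m-win_amd64.whl", "b37570aadb950670c6b7f912c6aba440261fea0106821ef750b41d9b2b04bc2e")` should return `True` for an intact download. SHA256 is the digest to prefer for this check; MD5 is listed for completeness but is not collision-resistant.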

File details

Details for the file dedupe-1.7.2-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.7.2-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 19d90e93403487e03976575efb2d3bd1de93d89f1b741abd3c0505c1e5e517ae
MD5 fd1d203a429b4a035e5cf039b7d9b0a0
BLAKE2b-256 a088b0d969e6ada4776ce1c0282aa6f3a105d067841ac2e8eaee24b12b3ce0cd

File details

Details for the file dedupe-1.7.2-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.7.2-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 d83ac26b6b3a541038e681c4f4efafc4ce3b916847e25d75309bdd08d962d66b
MD5 4ef4170967320b16e91bc66563a48777
BLAKE2b-256 9cbd94154036300b18ff61de94645cc6416bf837837a58d21e171e38db5bfd48

File details

Details for the file dedupe-1.7.2-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.7.2-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 5d45d1fceaf53e30d7a8a88d782f0e6c6864a4c7b07ce86448a09590fc71639c
MD5 8039a88abd7053c4a8d2b166a7d1b525
BLAKE2b-256 03acfd5d84885e99786b135f9344ce233773e43d798e9d363a1b5bccf620e2d0
