A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
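
To make the idea of "similar records" concrete, here is a minimal sketch in plain Python. It is not dedupe's API or its algorithm: dedupe learns its rules from labeled examples, whereas this toy compares every pair of records with the standard library's `difflib.SequenceMatcher` and a hand-picked threshold. The function names, the sample records, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def record_similarity(a, b, fields):
    """Average per-field string similarity, in [0, 1]."""
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_candidate_duplicates(records, fields, threshold=0.8):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    Compares all pairs, so it is O(n^2) and only suitable for small inputs.
    """
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(records.items()), 2):
        score = record_similarity(rec_a, rec_b, fields)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


# Hypothetical sample data, invented for this sketch.
records = {
    1: {"name": "Jane Smith", "address": "12 Oak Street"},
    2: {"name": "Jane Smyth", "address": "12 Oak St."},
    3: {"name": "Robert Brown", "address": "98 Elm Avenue"},
}

pairs = find_candidate_duplicates(records, ["name", "address"])
```

With this sample data, records 1 and 2 are the only pair flagged. A fixed threshold like this is brittle on real data, which is the gap that learning rules from labeled examples is meant to close.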


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.4.4.tar.gz (46.9 kB)

Uploaded Source

Built Distribution

dedupe-1.4.4-cp27-cp27m-macosx_10_9_x86_64.whl (49.3 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.4.tar.gz.

File metadata

  • Download URL: dedupe-1.4.4.tar.gz
  • Upload date:
  • Size: 46.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.4.tar.gz
Algorithm Hash digest
SHA256 d7f3d15de20b91f5b3fdb30455f304f37539338ae4c32fbd6e570be0845507d3
MD5 c3fa913941bd5bc1006352887c6c656c
BLAKE2b-256 f4b67c0c0b1a72ef7982e6a79db7d980d9d7119cef8943ccb578e9631a123f7d
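
The SHA256 digest listed above can be checked against a downloaded copy of the archive before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `verify_download` and the local file path are assumptions, and the expected digest is the one copied from the listing above.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the listing above for dedupe-1.4.4.tar.gz.
EXPECTED_SHA256 = "d7f3d15de20b91f5b3fdb30455f304f37539338ae4c32fbd6e570be0845507d3"


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file at `path` hashes to the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_download("dedupe-1.4.4.tar.gz")` assumes the archive was saved to the current directory under that name; a `False` result suggests the file is corrupted or is not the release listed here.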


File details

Details for the file dedupe-1.4.4-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.4-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 0677a6e6292057c7549989fe69a7b6ad13a23a73a535dd161314a6feb384221d
MD5 76138ff7eb8fe4399c295048ec1ee378
BLAKE2b-256 c18ddb53fb3c1c2dbdf4bb748af01d47c7ef4cd4afafd1db9871b56ec145a787

