A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data which rules work best for your dataset, then uses those rules to find similar records quickly and automatically, even in very large databases.
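To make the idea of "finding similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API, which this page does not show, and it does not learn anything: the function names, the sample records, and the fixed 0.85 threshold are all assumptions chosen for illustration, whereas dedupe derives its rules from training data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def candidate_duplicates(records, fields, threshold=0.85):
    """Yield (id_a, id_b, score) for pairs whose mean field similarity meets the threshold.

    The threshold is a hand-picked assumption, not a learned rule.
    """
    for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2):
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            yield id_a, id_b, score


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "Jon Smith", "address": "12 Main St"},
    2: {"name": "John Smith", "address": "12 Main St."},
    3: {"name": "Alice Jones", "address": "9 Oak Ave"},
}

pairs = list(candidate_duplicates(records, ["name", "address"]))
for id_a, id_b, score in pairs:
    print(f"records {id_a} and {id_b} look similar (score {score:.2f})")
```

This sketch compares every pair of records, so its cost grows quadratically with the number of records. That is one reason a fixed-threshold comparison like this does not scale to the very large databases mentioned above.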


Download files

Source Distribution

dedupe-0.8.0.0.8.tar.gz (46.8 kB view details)

Uploaded Source

Built Distributions

dedupe-0.8.0.0.8.linux-x86_64.tar.gz (107.3 kB view details)

Uploaded Source

dedupe-0.8.0.0.8-py2.7-linux-x86_64.egg (124.4 kB view details)

Uploaded Source

File details

Details for the file dedupe-0.8.0.0.8.tar.gz.

File metadata

  • Download URL: dedupe-0.8.0.0.8.tar.gz
  • Upload date:
  • Size: 46.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-0.8.0.0.8.tar.gz
Algorithm Hash digest
SHA256 c0097e8f8a4e94bd4b7395446d33ddaabdaf21028e5d28e036708cc99d459c36
MD5 6a12218e973c32705cb2c7c9bd27e9ef
BLAKE2b-256 a30fb4401f9313a62ce3f038e3f4b1d2b71e28173ad87c124f2e319cdff0ab21

See more details on using hashes here.
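A published digest can be checked against a downloaded file before installing it. The sketch below uses Python's standard `hashlib`. The helper names and the assumption that the archive sits in the current directory are illustrative, while the expected SHA256 value is the one listed above for the source distribution.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed on this page for dedupe-0.8.0.0.8.tar.gz.
EXPECTED_SHA256 = "c0097e8f8a4e94bd4b7395446d33ddaabdaf21028e5d28e036708cc99d459c36"

# Example call, assuming the archive was downloaded to the current directory:
# matches_published_hash("dedupe-0.8.0.0.8.tar.gz", EXPECTED_SHA256)
```

SHA256 is the digest to prefer for this check. MD5 is listed for completeness but is not considered collision resistant.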

File details

Details for the file dedupe-0.8.0.0.8.linux-x86_64.tar.gz.

File hashes

Hashes for dedupe-0.8.0.0.8.linux-x86_64.tar.gz
Algorithm Hash digest
SHA256 ce66dd07b8042d4e07e06b5d2d47867481ba151a3ec58448d4f277deca7ad1c4
MD5 7bf6e1c03cb1814fe19c8627e34bc5db
BLAKE2b-256 ace3d01f93c0e7014da645d917e16081730196e9c2d46ee2579f3d3509292a9a

File details

Details for the file dedupe-0.8.0.0.8-py2.7-linux-x86_64.egg.

File hashes

Hashes for dedupe-0.8.0.0.8-py2.7-linux-x86_64.egg
Algorithm Hash digest
SHA256 678fd0377a0a4228cbca967c062c46d797f7eb2c7b52af6d64e6ed9aae54ccc2
MD5 93d30294ab0300b1f54b81e26470a15b
BLAKE2b-256 fe24392a36c8a6cb1bd5b7895064909d5712ccde68a83bde4848af467d0a2f55
