A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even when the records have no unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from training examples labeled by a person and derives the rules that work best for your dataset, so it can find similar records quickly and automatically, even in very large databases.
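
To make the idea of "similar records" concrete, here is a toy sketch that uses only the Python standard library. It is not dedupe's API or its method: dedupe learns its comparison rules from labeled examples, whereas this sketch compares every pair of records against a fixed, hand-picked threshold. The function names, the `name` and `address` field names, and the `0.85` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a ratio in [0, 1] describing how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_likely_duplicates(records, threshold=0.85):
    """Compare every pair of records and return the index pairs at or above the threshold.

    Hypothetical helper: the threshold is fixed by hand here, not learned.
    """
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(records), 2):
        # Join the fields into one string so a single ratio covers the whole record.
        key_first = first["name"] + " " + first["address"]
        key_second = second["name"] + " " + second["address"]
        if similarity(key_first, key_second) >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up sample rows: the first two describe the same person, entered differently.
records = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Alice Jones", "address": "99 Oak Ave"},
]

likely_duplicates = find_likely_duplicates(records)
print(likely_duplicates)
```

Comparing every pair grows quadratically with the number of records, and a single hand-picked threshold rarely suits every dataset. Those are the two problems a learned, rule-based approach like the one described above is meant to address.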

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.5.0.tar.gz (46.8 kB view details)

Uploaded Source

Built Distributions

dedupe-1.5.0-cp34-cp34m-win_amd64.whl (50.3 kB view details)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.5.0-cp34-cp34m-win32.whl (49.6 kB view details)

Uploaded CPython 3.4m Windows x86

dedupe-1.5.0-cp27-cp27m-win_amd64.whl (50.4 kB view details)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.5.0-cp27-cp27m-win32.whl (49.6 kB view details)

Uploaded CPython 2.7m Windows x86

dedupe-1.5.0-cp27-cp27m-macosx_10_11_x86_64.whl (49.2 kB view details)

Uploaded CPython 2.7m macOS 10.11+ x86-64

File details

Details for the file dedupe-1.5.0.tar.gz.

File metadata

  • Download URL: dedupe-1.5.0.tar.gz
  • Upload date:
  • Size: 46.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.0.tar.gz
Algorithm Hash digest
SHA256 f8add2430d430df127f615238159494d0f908f52375f50e4c78b82b3cbe9986c
MD5 e5aac11e67815be8d659e93cbf18a954
BLAKE2b-256 d3b03bc74bd09a31b37580c340d72ea87d3a46024a5382d49607dabb40b6160f

See more details on using hashes here.
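
As a sketch of how the digests listed above might be used, the following standard-library snippet computes a file's SHA256 and compares it with the published value for dedupe-1.5.0.tar.gz. The function names and the assumption that the archive has already been downloaded to a local path are illustrative; package installers can perform a similar check themselves.

```python
import hashlib

# SHA256 digest listed above for dedupe-1.5.0.tar.gz.
EXPECTED_SHA256 = "f8add2430d430df127f615238159494d0f908f52375f50e4c78b82b3cbe9986c"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, `matches_expected("dedupe-1.5.0.tar.gz")` would return `True` only if the local file (a hypothetical path here) is byte-for-byte identical to the published archive.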

File details

Details for the file dedupe-1.5.0-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.0-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 5f8b8012f5ff8d2a453f3cdb4a5edcae60d115c38ddda8139dad892c9e728127
MD5 aeac0ec17dd183931879f92f065f3978
BLAKE2b-256 bcc2e34f739fab01e71e9305035199b18e7b14dc27b2a083e18a74702dc6a986

File details

Details for the file dedupe-1.5.0-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.0-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 916fe0d7befb0ed8acc065ccf8e27700b74c250b881ab2e318d2579377faeaaa
MD5 9346a6eb477a790dc261d773d8e4f0b7
BLAKE2b-256 957017750ea4f6252d7a2d125a49e5ed69af9caf5290f145e465ccb943a23415

File details

Details for the file dedupe-1.5.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 d0b29a46034119b8600d42848682d86e097cef4e699df1d8793b787885040559
MD5 6edaee6fcc7432fd72fefc6bf35b7b3a
BLAKE2b-256 7df178719712182e368e17985590eb02080c56db0183ffad454660bb7217447f

File details

Details for the file dedupe-1.5.0-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 7b9539b5e808b68de4670e18ad6523f100e111033fdba5489d7d72d1df6c5cd7
MD5 73e1168eb91e832b8800b5423bd9900c
BLAKE2b-256 79366887efb97ac0fb0224144536d03d9f6138de3fe445e4ce98c66be400dfba

File details

Details for the file dedupe-1.5.0-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.0-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 9a8e98e99aaf7fe5bc15b1983c99479ca2404db2d49c7db508550c3ac63e816b
MD5 9a85b5d65b82177b1c13338a863954d8
BLAKE2b-256 dad79d254dbb1c00e5c00924593becbf31ba6ec294bae6366b703922bb194151
