A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
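
To make the idea of learning matching rules from labeled examples concrete, here is a minimal sketch in plain Python. It does not use dedupe's API and is not dedupe's algorithm: it swaps in the standard library's `difflib.SequenceMatcher` for string similarity and learns only a single cutoff score from a handful of labeled pairs. The function names (`record_similarity`, `learn_threshold`, `find_duplicates`) and the sample records are hypothetical, introduced only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a, b):
    """Similarity of two strings in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec_a, rec_b, fields):
    """Unweighted average of per-field similarities (an assumption of this sketch)."""
    return sum(field_similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)


def learn_threshold(labeled_pairs, fields):
    """Pick a cutoff halfway between the weakest labeled match and the
    strongest labeled non-match. Needs at least one pair of each kind."""
    match_scores = [record_similarity(a, b, fields)
                    for a, b, is_match in labeled_pairs if is_match]
    distinct_scores = [record_similarity(a, b, fields)
                       for a, b, is_match in labeled_pairs if not is_match]
    return (min(match_scores) + max(distinct_scores)) / 2


def find_duplicates(records, fields, threshold):
    """Return index pairs (i, j), i < j, whose similarity reaches the cutoff.
    This compares every pair, so it is only practical for small inputs."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(records), 2)
            if record_similarity(a, b, fields) >= threshold]


# Hypothetical sample data.
FIELDS = ["name", "address"]
records = [
    {"name": "Jane Smith", "address": "123 Main St"},
    {"name": "Jane Smyth", "address": "123 Main Street"},
    {"name": "Robert Jones", "address": "9 Oak Ave"},
]

# "Human training data": a person marks pairs as same or different.
labeled_pairs = [
    (records[0], records[1], True),
    (records[0], records[2], False),
]

threshold = learn_threshold(labeled_pairs, FIELDS)
duplicates = find_duplicates(records, FIELDS, threshold)
```

With this sample data, `duplicates` contains only the pair of the two "Jane" records. The sketch compares every pair of records, which grows quadratically with the number of records, so it illustrates the labeling idea rather than the performance on large databases described above.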

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.4.6.tar.gz (47.3 kB): Source

Built Distributions

  • dedupe-1.4.6-cp35-cp35m-macosx_10_9_x86_64.whl (50.2 kB): CPython 3.5m, macOS 10.9+, x86-64
  • dedupe-1.4.6-cp34-cp34m-win_amd64.whl (51.0 kB): CPython 3.4m, Windows, x86-64
  • dedupe-1.4.6-cp34-cp34m-win32.whl (50.3 kB): CPython 3.4m, Windows, x86
  • dedupe-1.4.6-cp27-cp27m-win_amd64.whl (51.1 kB): CPython 2.7m, Windows, x86-64
  • dedupe-1.4.6-cp27-cp27m-win32.whl (50.2 kB): CPython 2.7m, Windows, x86
  • dedupe-1.4.6-cp27-cp27m-macosx_10_9_x86_64.whl (49.9 kB): CPython 2.7m, macOS 10.9+, x86-64

File details

Details for the file dedupe-1.4.6.tar.gz.

File metadata

  • Download URL: dedupe-1.4.6.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.6.tar.gz
Algorithm Hash digest
SHA256 8517aa799a1beec2289996156842ff3326a8a17e166ce9f36aa51685c29733f3
MD5 3f1b92f541e9baedd47d61d6f6e077e3
BLAKE2b-256 690e2917d6049ae5312e06a392aba7e4181689c715e2676f2eb9cb38728ad2dd

See more details on using hashes here.
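
As an illustration of how a published digest can be used, the following sketch checks a downloaded file against the SHA256 value listed above using Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest listed above for dedupe-1.4.6.tar.gz.
EXPECTED_SHA256 = "8517aa799a1beec2289996156842ff3326a8a17e166ce9f36aa51685c29733f3"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Assuming the archive was saved in the current directory, `matches_published_hash("dedupe-1.4.6.tar.gz", EXPECTED_SHA256)` returns `True` only when the downloaded bytes match the listed digest.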

File details

Details for the file dedupe-1.4.6-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.6-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3af0755ba52376f226b73d9737db97a7041f0c972d5275ed8cf52875aefa36ac
MD5 9891d8600fb2dcfa74e20fb735293b3c
BLAKE2b-256 f9facead8afc460ef00d6dd02f6297b69d63cd07338fb8a7a43970d7529f604f

File details

Details for the file dedupe-1.4.6-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.6-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 2ecdbdfcd7edb5727adf4c0dbbc6c14b6e7946287cdae7aa9b4ab98d06138868
MD5 f0978c204b484067453dec3bcd47ebb9
BLAKE2b-256 e35f1664d50863fb47afa3c333b5e45b6dbe89d34831a536f28b7909eda4ea84

File details

Details for the file dedupe-1.4.6-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.6-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 c80072df28e4984a27404050f8989deb86f05290fe2fd1083a85c9b792c1b175
MD5 035567f39d8fa5d603ffb5e3a10a2a37
BLAKE2b-256 26cb385bbab638f0bb26e26b64d9025c5c4d637482bc129210b5b0319b78f701

File details

Details for the file dedupe-1.4.6-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.6-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 a8f5c87ec9c7884a3735a549df5d4de04e6088fa92b7362db6e7748d04f9a737
MD5 f224937064b2387dc377a8a031bbcec3
BLAKE2b-256 bc949b81312ad8325a84a2c0286d1987d41328afddb188fd76149564af431dbc

File details

Details for the file dedupe-1.4.6-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.6-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 b5d0b7740c8bf63ca9c9ec0c9ff9842d5208c4f8399bd8d01ec49193f7924d43
MD5 cd9cc4eb25e064a1103bb5d4e4624969
BLAKE2b-256 bcdf35487c1557db6d36ba9b0f48bb7c6f5db2002ddf20fc84e780be249f9f1c

File details

Details for the file dedupe-1.4.6-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.6-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 aa13327981042a25310944d43997e2f6936092924fbac1a11e739af27c560aec
MD5 9bd065f96f1ffac9361655b60b43424e
BLAKE2b-256 bc5cd1f8ea011a5bbdf255d75ddcff2a54dc1899d851b9da4c4826b18578640a
