A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
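
As a rough illustration of what finding similar records involves, the sketch below compares every pair of records with Python's standard-library difflib and flags pairs whose average field similarity clears a fixed cutoff. It does not use dedupe's API: the function names, the sample records, and the 0.8 threshold are all assumptions made for this example, and dedupe learns its rules from labeled training data rather than relying on a hand-picked cutoff.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_likely_duplicates(records, fields, threshold=0.8):
    """Return (id_a, id_b, score) for pairs whose mean field similarity
    is at least `threshold`. Hypothetical helper, not part of dedupe."""
    pairs = []
    for id_a, id_b in combinations(sorted(records), 2):
        rec_a, rec_b = records[id_a], records[id_b]
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return pairs


# Made-up sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "Jonathan Smith", "address": "123 Main Street"},
    2: {"name": "Jonathon Smith", "address": "123 Main St."},
    3: {"name": "Maria Garcia", "address": "9 Elm Avenue"},
}
likely = find_likely_duplicates(records, ["name", "address"])
for id_a, id_b, score in likely:
    print(id_a, id_b, score)
```

Comparing every pair grows quadratically with the number of records, so this brute-force approach is only practical for small inputs.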


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.4.5.tar.gz (48.0 kB)

Uploaded Source

Built Distributions

dedupe-1.4.5-cp35-cp35m-macosx_10_9_x86_64.whl (50.2 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.5-cp34-cp34m-win_amd64.whl (51.0 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.5-cp34-cp34m-win32.whl (50.3 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.5-cp27-cp27m-win_amd64.whl (51.1 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.5-cp27-cp27m-win32.whl (50.2 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.5-cp27-cp27m-macosx_10_9_x86_64.whl (49.9 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64
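
The wheel filenames above encode the same information as the "Uploaded" lines: distribution name, version, Python tag, ABI tag, and platform tag, separated by hyphens. A minimal sketch of splitting such a name follows. The helper name is hypothetical, and it assumes the filename carries no optional build tag, which holds for the files listed here.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its hyphen-separated components.

    Hypothetical helper for illustration; assumes exactly five components
    (no optional build tag).
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 components, got %d" % len(parts))
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,      # e.g. cp27 = CPython 2.7
        "abi_tag": abi_tag,            # e.g. cp27m
        "platform_tag": platform_tag,  # e.g. win_amd64
    }


info = parse_wheel_filename("dedupe-1.4.5-cp35-cp35m-macosx_10_9_x86_64.whl")
print(info["python_tag"], info["platform_tag"])
```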

File details

Details for the file dedupe-1.4.5.tar.gz.

File metadata

  • Download URL: dedupe-1.4.5.tar.gz
  • Upload date:
  • Size: 48.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.5.tar.gz
Algorithm Hash digest
SHA256 77b7d78d184ebb9c759453ca088bae776983d02b925ffcd7b07cb33167b7b52c
MD5 ced19e223d01398439208f78b2f3c00b
BLAKE2b-256 2706f2245d22d66f4bcd1bb988d2375ca93032c3d8aaa8ea7b3cfba20735e151

See more details on using hashes here.
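
One way to check a downloaded file against a published digest is to recompute the SHA256 hash locally with Python's standard-library hashlib and compare the two hex strings. The sketch below shows that idea; the function names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was saved to the current directory:
# matches_published_hash(
#     "dedupe-1.4.5.tar.gz",
#     "77b7d78d184ebb9c759453ca088bae776983d02b925ffcd7b07cb33167b7b52c",
# )
```

A mismatch means the local file differs from the one the digest was computed for, so the download should not be installed.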

File details

Details for the file dedupe-1.4.5-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.5-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 aee6b1c0ce325892be9efdea23093a2d07a97a1e16330eb4d578b688f92a1b56
MD5 741d11b9616b092772b2980f07041d79
BLAKE2b-256 1d08ee9c3177ad92a2f9a222fe5a678ba46200a9dba457bfefd1c3469728363b

File details

Details for the file dedupe-1.4.5-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.5-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 8aeaaf5b7f32b54b6de3f376bcd9ef9d84fce3b50b8418f70038f61725997dc4
MD5 212f90f85d836ea8999d2ad1e91724d4
BLAKE2b-256 291a7d3a6c7cc2ea86bc85844dd7132f1441eb505df3ec792bdde54742271cf8

File details

Details for the file dedupe-1.4.5-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.5-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 52b1c6d27146a6b6644ea5e641ed64d3273cf40a83fe476515ae796dc8ece852
MD5 4150e27107ea4baac3e97b9b780b26fc
BLAKE2b-256 b983fdea9c820e1bbeab0e80eac59f781efbd02e54913b3777f068d7e237b875

File details

Details for the file dedupe-1.4.5-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.5-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 93779773b5eea8bf81caa19b1ec4ad1c6d3a8b47a8a2a8ec383b6eddc888d8d7
MD5 3fee3052b33206b865f484f7ae6cc5c0
BLAKE2b-256 482ea62f92905543923f251eaeed44dd1baadc32992a41de1cf3372663f0263f

File details

Details for the file dedupe-1.4.5-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.5-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 aa661c269124f4b99c95d23b73985ded0ac231bab4dd747f2cfc3efd247fe3a7
MD5 aa67cce2abb0fb299d733f2a140e741c
BLAKE2b-256 8d15e08e6158f55ba5216ffd4745f746c9bd6bf87dfc13f7d858b16dbd0c1a3f

File details

Details for the file dedupe-1.4.5-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.5-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 fc7d69aafa002f6cb6aa183f0ee719d81f0cb0f6be80d1f6669cea64713700b1
MD5 6180721dfb3e0517816821a7af8f2257
BLAKE2b-256 1be5ee4f55b94cbdaffd29d565dd2949c73cfe11ca29576f10cd50d7aa6a0eb5
