A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform fast deduplication and entity resolution on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
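
As a rough illustration of what the first and third tasks involve, the sketch below compares every pair of records field by field and groups likely duplicates into clusters. It is not dedupe's API or algorithm: it uses the standard library's `difflib.SequenceMatcher` as a stand-in string comparator, averages the per-field scores, and applies a hand-picked threshold, whereas dedupe learns its matching rules from your training data. The helper names `similarity` and `cluster_duplicates` and the sample records are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: dict, b: dict, fields: list[str]) -> float:
    """Average case-insensitive string similarity over `fields`, in [0, 1]."""
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields
    ]
    return sum(scores) / len(scores)


def cluster_duplicates(
    records: list[dict], fields: list[str], threshold: float
) -> list[set[int]]:
    """Group indices of records whose pairwise similarity reaches `threshold`.

    Union-find merges transitively: if A matches B and B matches C,
    all three land in one cluster even if A and C were not matched directly.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Naive all-pairs comparison: n * (n - 1) / 2 similarity calls.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j], fields) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return [c for c in clusters.values() if len(c) > 1]  # singletons are not duplicates


records = [
    {"name": "Jon Smith", "address": "12 Oak St"},
    {"name": "John Smith", "address": "12 Oak Street"},
    {"name": "Mary Jones", "address": "4 Elm Ave"},
]
print(cluster_duplicates(records, ["name", "address"], threshold=0.8))  # [{0, 1}]
```

Comparing every pair grows quadratically with the number of records, which is why this naive approach breaks down on large databases.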

dedupe takes human-labeled training data and works out the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
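
The paragraph above bundles two ideas: human-labeled examples tell the system what counts as a match, and similar records have to be found quickly even in very large databases, where comparing every pair is impractical. A common technique for the second part is blocking, where only records that share some key are compared. The sketch below illustrates both ideas in a deliberately simplified, stdlib-only form. It is not dedupe's API or implementation; `learn_threshold`, `blocked_pairs`, the F1 criterion, the labeled scores, and the ZIP-code blocking key are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable


def learn_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Return the similarity cutoff that best separates human-labeled pairs.

    `labeled` holds (similarity_score, is_duplicate) tuples. Every observed
    score is tried as a cutoff; the one with the highest F1 score wins
    (ties keep the first candidate seen).
    """
    best_cutoff, best_f1 = 1.0, -1.0
    for cutoff, _ in labeled:
        tp = sum(1 for s, dup in labeled if s >= cutoff and dup)
        fp = sum(1 for s, dup in labeled if s >= cutoff and not dup)
        fn = sum(1 for s, dup in labeled if s < cutoff and dup)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff


def blocked_pairs(
    records: list[dict], key: Callable[[dict], str]
) -> list[tuple[int, int]]:
    """Candidate pairs restricted to records that share a blocking key.

    Only records in the same block are compared, instead of all n*(n-1)/2 pairs.
    """
    blocks: dict[str, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[key(rec)].append(i)
    return [pair for ids in blocks.values() for pair in combinations(ids, 2)]


# Hypothetical scores a human has labeled as duplicate (True) or not (False).
labeled = [(0.95, True), (0.88, True), (0.81, False), (0.60, False), (0.40, False)]
print(learn_threshold(labeled))  # 0.88

records = [
    {"name": "Jon Smith", "zip": "60601"},
    {"name": "John Smith", "zip": "60601"},
    {"name": "Mary Jones", "zip": "60614"},
]
print(blocked_pairs(records, key=lambda r: r["zip"]))  # [(0, 1)]
```

The trade-off is that pairs split across blocks are never compared, so a duplicate whose ZIP code was mistyped would be missed by this particular key.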

Important links:

Download files

Source Distribution

dedupe-1.4.13.tar.gz (48.6 kB)

Uploaded Source

Built Distributions

dedupe-1.4.13-cp35-cp35m-macosx_10_9_x86_64.whl (50.9 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.13-cp34-cp34m-win_amd64.whl (51.6 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.13-cp34-cp34m-win32.whl (51.0 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.13-cp27-cp27m-win_amd64.whl (51.8 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.13-cp27-cp27m-win32.whl (50.9 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.13-cp27-cp27m-macosx_10_9_x86_64.whl (50.6 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64
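
Each built distribution's filename encodes which interpreters and platforms it supports, following the wheel naming convention from PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. For example, `cp35-cp35m-macosx_10_9_x86_64` corresponds to the "CPython 3.5m macOS 10.9+ x86-64" label above. The hypothetical helper below splits a filename into those parts; it is a minimal sketch that ignores the optional build tag and does not validate the tags themselves.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields, dropping any build tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 fields normally, 6 when the optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


print(parse_wheel_filename("dedupe-1.4.13-cp35-cp35m-macosx_10_9_x86_64.whl"))
# WheelTags(distribution='dedupe', version='1.4.13', python_tag='cp35', abi_tag='cp35m', platform_tag='macosx_10_9_x86_64')
```

Because none of the wheels listed here carries a Linux platform tag, pip on Linux would fall back to building from the source distribution. For real use, `packaging.utils.parse_wheel_filename` from the `packaging` project handles cases this sketch skips, such as build tags and compressed tag sets like `py2.py3`.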

File details

Details for the file dedupe-1.4.13.tar.gz.

File metadata

  • Download URL: dedupe-1.4.13.tar.gz
  • Upload date:
  • Size: 48.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.13.tar.gz
Algorithm Hash digest
SHA256 e09ba09f4d820a0318a3361194fe5b42b7d1339040a6ae6337dad4036c9ae57c
MD5 a3622e1515979ee231fc1e36ee968f31
BLAKE2b-256 1cb1f1e0a70d4608675b0270a6dfc9281dba7185261c3b035aacac7ac1dda637
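
The digests in these tables let you confirm that a downloaded file is byte-for-byte the one that was published. A minimal sketch using only the standard library `hashlib` module follows; `file_digests` and `verify` are hypothetical helper names. It computes all three algorithms listed, although MD5 is no longer collision-resistant, so the SHA256 value is the one to rely on.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(path: str | Path, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as fh:
        # Read in chunks so large archives are never loaded into memory at once.
        while chunk := fh.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: str | Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected_sha256.strip().lower()
```

For example, `verify("dedupe-1.4.13.tar.gz", "e09ba09f4d820a0318a3361194fe5b42b7d1339040a6ae6337dad4036c9ae57c")` should return `True` for an intact download of the source distribution. pip can perform the same check itself in hash-checking mode, where each requirements-file line pins a digest with `--hash=sha256:<digest>`.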

File details

Details for the file dedupe-1.4.13-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.13-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2bd598ebdbd4660efa2c3c504cffe64ecacbb85e4040e0aabfaaaf15a73928cc
MD5 feb73ee5503bd27aad1f57dbe6d624de
BLAKE2b-256 09377045b36d5a7b1d154598c4d9168f0503ee944c86dcb31f31c2f3305c116d

File details

Details for the file dedupe-1.4.13-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.13-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 37ec66be3f5ba86f95accfa22c600a3729c35c1f7e2169a5f9f10c0a462a7942
MD5 d0b8d6318dea8b2c810a26c6007f26e0
BLAKE2b-256 77de89e2e9757ac60f3af89ffc7d568c23f63f23d51e560cb15eea67cdeb4b18

File details

Details for the file dedupe-1.4.13-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.13-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 a0a13786722fb2d3a2e9c9110cceaa5ce64543d82491e37cf1d52c08909f4ddd
MD5 03df0aa1c4c798ee7063e9be02b7f411
BLAKE2b-256 a202ce3a40071fbc8122bbb516f234e0c651608e91f385f02d4cf2b93ef11938

File details

Details for the file dedupe-1.4.13-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.13-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 242f762a6ddd72ee373a03537f43910b1a26bccfd2638b1966116c66c3202d1c
MD5 623f3089c9937a5149fb4afec89e9e10
BLAKE2b-256 edb111758da7589aad82a3c48487483a509c6a47c83a7ef85c180991b41701f7

File details

Details for the file dedupe-1.4.13-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.13-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 7f1d832d08fc9c1f966c9b5b28c228a3e7af82f802a3eec94e3d701edd7e8a9e
MD5 82c3f9dfe82caf57e32c7351e56ebea5
BLAKE2b-256 86da8afd1708aaf22601a678783d14f3728e1a1a7ec9c84c021b7f24cf74728e

File details

Details for the file dedupe-1.4.13-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.13-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a77dab239d0e6f7861b8ff65eca64604a04aaaad126d1bd26be87fa144e46d50
MD5 85d44fbe3b9793b900e40b8879d00338
BLAKE2b-256 09dfe97f6cf88c1f62a500ad2c6191494467cf0cdf3045f4bf7f157f16f0a129
