A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human-labeled training data and comes up with the best rules for your dataset, so that it can find similar records quickly and automatically, even in very large databases.
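
To make the idea of finding similar records concrete, here is a toy sketch that flags likely duplicates by averaging per-field string similarity. It uses only Python's standard library (difflib); it is not dedupe's API and does not reproduce its learned rules. The sample records, the field names, the function names, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records keyed by an arbitrary id (illustrative data only).
RECORDS = {
    "a": {"name": "Jon Smith", "address": "123 Main St"},
    "b": {"name": "John Smith", "address": "123 Main Street"},
    "c": {"name": "Maria Garcia", "address": "77 Oak Avenue"},
}


def field_similarity(left, right):
    """Return a case-insensitive similarity ratio in [0, 1] for two strings."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def record_similarity(left, right, fields):
    """Average the per-field similarity over the given field names."""
    return sum(field_similarity(left[f], right[f]) for f in fields) / len(fields)


def candidate_duplicates(records, fields, threshold=0.8):
    """Compare every pair of records and keep pairs scoring at or above threshold."""
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2):
        score = record_similarity(rec_a, rec_b, fields)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


if __name__ == "__main__":
    for id_a, id_b, score in candidate_duplicates(RECORDS, ["name", "address"]):
        print(f"{id_a} ~ {id_b}: {score:.2f}")
```

Comparing every pair grows quadratically with the number of records, and the threshold here is hand-picked. Those are the two things the description above says dedupe addresses: it derives its rules from training data and is meant to stay fast on large datasets.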

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-2.0.7.tar.gz (67.9 kB)

Uploaded Source

Built Distributions

dedupe-2.0.7-cp39-cp39-win_amd64.whl (65.1 kB)

Uploaded CPython 3.9 Windows x86-64

dedupe-2.0.7-cp39-cp39-macosx_10_14_x86_64.whl (60.8 kB)

Uploaded CPython 3.9 macOS 10.14+ x86-64

dedupe-2.0.7-cp38-cp38-win_amd64.whl (65.1 kB)

Uploaded CPython 3.8 Windows x86-64

dedupe-2.0.7-cp38-cp38-manylinux1_x86_64.whl (91.0 kB)

Uploaded CPython 3.8

dedupe-2.0.7-cp38-cp38-macosx_10_14_x86_64.whl (60.8 kB)

Uploaded CPython 3.8 macOS 10.14+ x86-64

dedupe-2.0.7-cp37-cp37m-win_amd64.whl (65.1 kB)

Uploaded CPython 3.7m Windows x86-64

dedupe-2.0.7-cp37-cp37m-manylinux1_x86_64.whl (90.6 kB)

Uploaded CPython 3.7m

dedupe-2.0.7-cp37-cp37m-macosx_10_14_x86_64.whl (60.8 kB)

Uploaded CPython 3.7m macOS 10.14+ x86-64

dedupe-2.0.7-cp36-cp36m-win_amd64.whl (65.0 kB)

Uploaded CPython 3.6m Windows x86-64

dedupe-2.0.7-cp36-cp36m-manylinux1_x86_64.whl (89.5 kB)

Uploaded CPython 3.6m

dedupe-2.0.7-cp36-cp36m-macosx_10_14_x86_64.whl (60.7 kB)

Uploaded CPython 3.6m macOS 10.14+ x86-64

File details

Details for the file dedupe-2.0.7.tar.gz.

File metadata

  • Download URL: dedupe-2.0.7.tar.gz
  • Upload date:
  • Size: 67.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for dedupe-2.0.7.tar.gz
Algorithm Hash digest
SHA256 f5af08f52700f34545b1458716dffb96cfa898467483dead07355e2e6618cfca
MD5 5574c4875c2d1d6bc60b1eb6b87aff20
BLAKE2b-256 936fa8689811a5df73151ed122c98c26db55cefa9590ddde2ab414ea525ab252

See more details on using hashes here.
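
The digests listed for each file can be used to check a download before installing it. The following is a minimal sketch using Python's standard hashlib module; the helper names and the local path in the usage comment are hypothetical, and the file is assumed to have been downloaded already.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Example usage (hypothetical local path; digest copied from the listing above):
# ok = matches_expected(
#     "dedupe-2.0.7.tar.gz",
#     "f5af08f52700f34545b1458716dffb96cfa898467483dead07355e2e6618cfca",
# )
```

SHA256 is used here because it is one of the three algorithms listed for every file; a mismatch means the file differs from the one that was uploaded and should not be installed.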

File details

Details for the file dedupe-2.0.7-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 65.1 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1

File hashes

Hashes for dedupe-2.0.7-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 8e53ceae7c90540591fce3b09fa0c2f796785babb54629a266736f535b45a8ac
MD5 3745f84d3e11f57f3f8145d413c3dd04
BLAKE2b-256 3c7b5be580863af40cc76ce81d8b390bd88146f5e365d2204198a0a6ca8e9161

File details

Details for the file dedupe-2.0.7-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp39-cp39-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.8 kB
  • Tags: CPython 3.9, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.2

File hashes

Hashes for dedupe-2.0.7-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 7f4a5481a4250e2264c7d93b41626c555c1c1b7343d78bca82c1f93f6d9bb35e
MD5 f39d132e231cbe53af8b906af651a7b3
BLAKE2b-256 d394e37a56c88280ae50f8aaab8ad66d5eb2488e03fb220f571769d7678fcdf0

File details

Details for the file dedupe-2.0.7-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 65.1 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.7

File hashes

Hashes for dedupe-2.0.7-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 841db308eddc57c2552ceadaa6828481c472c40ffd2526f3159912b2083c666b
MD5 985528e37250d00fbfceb5418db78e54
BLAKE2b-256 4260649b8f8784f28633aeadc5b7f3612d2d1718f866c7276fdae8bf954cd149

File details

Details for the file dedupe-2.0.7-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 91.0 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for dedupe-2.0.7-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 fec3e8847cd962785243e3d754178ec9aa532548dd5286eea141c04fd7ad1fb4
MD5 1087cce50980b9cd299a01536a34cce4
BLAKE2b-256 44dfa3dc0bd815aa90f272cf41ae54e912958eb3ca7109ca19940b3f8fa14515

File details

Details for the file dedupe-2.0.7-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.8 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for dedupe-2.0.7-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 726d6799bd812c82400a0cc46d23eca1dfd60be32631dee57f4b8e29933fd5c3
MD5 298c47fc1cbabb94ebb4c7e0cfb21c2e
BLAKE2b-256 09ec274607c8e7d6c5553f45467b48d19e802c977ad648f3171916a700b35cbc

File details

Details for the file dedupe-2.0.7-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 65.1 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.9

File hashes

Hashes for dedupe-2.0.7-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 04b891fc67dc32c93fab08eea4ef63664a91f14e1944176c05722e1926bd3546
MD5 5e2b49ffed0b67d32446f552f59c65ec
BLAKE2b-256 cdf4b49ee4e73b296d81cd8b7931c2147445883b44d77448bdd6b5701baff5a3

File details

Details for the file dedupe-2.0.7-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.6 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for dedupe-2.0.7-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 64cc04fafc6aed7fda4bca5ea20086555ffc71c6ed93af69b590396771dbd0ed
MD5 2dcad0ef9aa55f9ca1dac9b13ec223a9
BLAKE2b-256 a7bff0465494ee34b83243fb634916ab4f896b80d0bb90f6ee54287c973bd77e

File details

Details for the file dedupe-2.0.7-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.8 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.10

File hashes

Hashes for dedupe-2.0.7-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 58cabad5591c4139d172ae43592b0838b6c02a721026702d4491087ac75a3475
MD5 77673ca0d650df6307ceddcaef0c4e09
BLAKE2b-256 d08c95be709f07c172b34ea11be363937e08da53b9b5cf7e71e171bf8aa5ad54

File details

Details for the file dedupe-2.0.7-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 65.0 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.7-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 93e42a776a8b642adae07ebbcb2f53141af8b6cfc73479758b0dc750ef269c4d
MD5 2e2d4f98a55a15255a3c82631eefe337
BLAKE2b-256 8694a0d676d3e64977c20c54a890cd2ed902cb91ac129aed16de9e5776b55f4d

File details

Details for the file dedupe-2.0.7-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.5 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for dedupe-2.0.7-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 74e87aa9f3d66444fff438ee4dbc2630d7500564d6ece3ac50a2e4dd0cc10c4f
MD5 2484137ccb68e208fd835eba4743c59c
BLAKE2b-256 55a09201a7a2104fb3adcee1a20d13ab2eaac1494d7bde8cd4361cdcf493bd99

File details

Details for the file dedupe-2.0.7-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.7-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.7 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.6.13

File hashes

Hashes for dedupe-2.0.7-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 5a82e7912f019f765af00d13c64fb855a541bfc0cba925f6ea04ffe03187c50f
MD5 e9815819c64147f658a680ca413d4cfe
BLAKE2b-256 0d6f0fe9c90ec8562f4e32532e28c11d989c953ff93cee9d0555ac61650ec383
