A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
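
To give a feel for what "finding similar records" means, here is a toy sketch. It is not dedupe's API and not its learning algorithm: it swaps in Python's standard-library difflib with a fixed, hand-picked threshold, whereas dedupe learns its rules from human-labeled examples. The function names, the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    This compares every pair, so it is only suitable for tiny lists.
    The threshold is a hypothetical value, not something dedupe uses.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample data: the first two names differ by one letter.
records = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]
print(candidate_duplicates(records))
```

Comparing every pair grows quadratically with the number of records, which is one reason a library that learns which records are worth comparing matters for very large databases.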

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-2.0.3.tar.gz (67.0 kB view details)

Uploaded Source

Built Distributions

dedupe-2.0.3-cp38-cp38-win_amd64.whl (64.6 kB view details)

Uploaded CPython 3.8 Windows x86-64

dedupe-2.0.3-cp38-cp38-manylinux1_x86_64.whl (90.5 kB view details)

Uploaded CPython 3.8

dedupe-2.0.3-cp37-cp37m-win_amd64.whl (64.5 kB view details)

Uploaded CPython 3.7m Windows x86-64

dedupe-2.0.3-cp37-cp37m-manylinux1_x86_64.whl (90.1 kB view details)

Uploaded CPython 3.7m

dedupe-2.0.3-cp37-cp37m-macosx_10_14_x86_64.whl (60.4 kB view details)

Uploaded CPython 3.7m macOS 10.14+ x86-64

dedupe-2.0.3-cp36-cp36m-win_amd64.whl (64.5 kB view details)

Uploaded CPython 3.6m Windows x86-64

dedupe-2.0.3-cp36-cp36m-manylinux1_x86_64.whl (89.0 kB view details)

Uploaded CPython 3.6m

dedupe-2.0.3-cp36-cp36m-macosx_10_14_x86_64.whl (60.3 kB view details)

Uploaded CPython 3.6m macOS 10.14+ x86-64
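
If you are not sure which built distribution matches your platform, the filename itself carries the answer. Wheel filenames follow the standard naming convention of name, version, an optional build tag, Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a filename into those parts. The helper name and the NamedTuple are hypothetical, and the parser is deliberately minimal (it does not validate the tags themselves).

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # A sixth component means an optional build tag is present; skip it.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return WheelTags(name, version, python_tag, abi_tag, platform_tag)


tags = parse_wheel_filename("dedupe-2.0.3-cp37-cp37m-macosx_10_14_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

In practice pip makes this choice for you and falls back to the source distribution when no wheel matches, so a parser like this is mainly useful for inspecting or auditing a set of downloaded files.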

File details

Details for the file dedupe-2.0.3.tar.gz.

File metadata

  • Download URL: dedupe-2.0.3.tar.gz
  • Upload date:
  • Size: 67.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for dedupe-2.0.3.tar.gz
Algorithm Hash digest
SHA256 89a8cbfda6243ad86fa442d78b6dcf65135ddc5f8e64c5d8fc41da19dbcaedc6
MD5 116bfebbe4b2556793a0c5000bd74e4b
BLAKE2b-256 330577749e4d1600cf9155bfd46abfd29d10841accbd82f8942c6648ebef3875

See more details on using hashes here.
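
As a minimal sketch of how the digests listed above can be used, the following computes the SHA256 of a downloaded file with Python's standard-library hashlib and compares it with the published value. The function names and the chunk size are assumptions made for illustration; the expected digest is the SHA256 listed above for dedupe-2.0.3.tar.gz.

```python
import hashlib

# SHA256 digest published above for dedupe-2.0.3.tar.gz.
EXPECTED_SHA256 = "89a8cbfda6243ad86fa442d78b6dcf65135ddc5f8e64c5d8fc41da19dbcaedc6"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, calling matches_expected("dedupe-2.0.3.tar.gz", EXPECTED_SHA256) from the directory where the archive was saved should return True for an intact download; the local path here is hypothetical. SHA256 is the digest to prefer for this check, since MD5 is no longer considered collision-resistant.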

File details

Details for the file dedupe-2.0.3-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 64.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for dedupe-2.0.3-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 652746f8d0d978e384a9a196713c5072296a0dd0fc535e4e8bbec39f84eea71e
MD5 fa54ea0b644768f0992e2c8af010daee
BLAKE2b-256 88e30e2aabc973b3c9413a5994de67339de4d50615ad3227e07f607ab673dd10

File details

Details for the file dedupe-2.0.3-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.5 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for dedupe-2.0.3-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6ab58e13f16c36ab46a16272b21e0d50deea4cc0138db8998f7c6e8c2ad06507
MD5 0c7653867346f1d2dcac75973edc0237
BLAKE2b-256 8f2a4a9eeb1458616fe9e2d57a6dd17522bec569551056023978a7ba5e60e620

File details

Details for the file dedupe-2.0.3-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for dedupe-2.0.3-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 89178c8f89df9ba43589feaf6a556037160010282347086552112a82c94d3aa8
MD5 d33b037002d512d5f1dbb5ef66d418ef
BLAKE2b-256 e39078d814c85fd516f8cffc116cbda3b28f69db235b739bbab5872377ba3e75

File details

Details for the file dedupe-2.0.3-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.1 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for dedupe-2.0.3-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 a4f685c938fef95544fc5d9949873fa905045abf9910f980f9d9cac93e6923a8
MD5 376c91d8220fab7d80005031b50509d7
BLAKE2b-256 c86cc94f5083ee6104cddc8d0ace7ade299300f5afe771fc1d9335d7c9bfbc67

File details

Details for the file dedupe-2.0.3-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for dedupe-2.0.3-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 8d125610e8bdf25bba340b55f749d297f90f425419e1b4ebe3374223b2449757
MD5 b153cc96207168ac5aca3d01fac6d0f8
BLAKE2b-256 43fc29f45b097fdd6e2c9dd2e86ccd3470aae0ab18ca1c56d2318f2a1587ed71

File details

Details for the file dedupe-2.0.3-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.3-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 671879b4355e32c9d05ab63471bf79f7cf9c1978d4857e09d887d1e127895a93
MD5 c45c78dc785c398eecc7268887faeacb
BLAKE2b-256 abdf5ad8a5058b60a2ca61b55a73d09a82fbc7432da9b709aeec9134b639d83e

File details

Details for the file dedupe-2.0.3-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.0 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for dedupe-2.0.3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 466f8cabe250c333fd65c51a6d0fca4413f721969d50a579e567746d7aac0de0
MD5 12af8dc92e787546d857417ec695da61
BLAKE2b-256 5e09179feb316147279c76ea7e6dc5a5f9e00a6feadaeda131d535247e580619

File details

Details for the file dedupe-2.0.3-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.3-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.3 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for dedupe-2.0.3-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 b8e7e205754347e826c3a308be9db959858b850369269c404fa2ca5c06b77b33
MD5 a3ddb59ed572d7665ae9052b9840a2ef
BLAKE2b-256 9b474781aff2b5261512a01d738758fc29997e0a562e2cba4f04fd807527b55e
