A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data. dedupe is the open source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
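To make the idea of "similar records" concrete, here is a minimal sketch of near-duplicate detection using only the Python standard library. It is not dedupe's API or its algorithm: dedupe learns its matching rules from labeled examples, whereas this sketch uses a fixed, hand-picked threshold. The function names, the sample records, and the `0.8` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, threshold=0.8):
    """Return pairs of record ids whose name and address both look alike.

    Compares every pair of records, so it only suits small inputs.
    The threshold is a hand-picked assumption, not a learned rule.
    """
    pairs = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2):
        # Average the per-field similarities into one score for the pair.
        score = (
            similarity(rec_a["name"], rec_b["name"])
            + similarity(rec_a["address"], rec_b["address"])
        ) / 2
        if score >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "Jane Smith", "address": "123 Main St"},
    2: {"name": "Jane Smyth", "address": "123 Main Street"},
    3: {"name": "Robert Jones", "address": "9 Elm Ave"},
}

print(likely_duplicates(records))
```

Comparing every pair grows quadratically with the number of records, and a single fixed threshold rarely suits every field. Those are the two problems a trained tool such as dedupe is meant to address.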


This version

2.0.0

Download files

Download the file for your platform.

Source Distribution

  • dedupe-2.0.0.tar.gz (66.2 kB): Source

Built Distributions

  • dedupe-2.0.0-cp38-cp38-win_amd64.whl (64.5 kB): CPython 3.8, Windows x86-64
  • dedupe-2.0.0-cp38-cp38-manylinux1_x86_64.whl (89.9 kB): CPython 3.8
  • dedupe-2.0.0-cp38-cp38-macosx_10_13_x86_64.whl (60.4 kB): CPython 3.8, macOS 10.13+ x86-64
  • dedupe-2.0.0-cp37-cp37m-win_amd64.whl (64.5 kB): CPython 3.7m, Windows x86-64
  • dedupe-2.0.0-cp37-cp37m-manylinux1_x86_64.whl (89.7 kB): CPython 3.7m
  • dedupe-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl (60.4 kB): CPython 3.7m, macOS 10.13+ x86-64
  • dedupe-2.0.0-cp36-cp36m-win_amd64.whl (64.4 kB): CPython 3.6m, Windows x86-64
  • dedupe-2.0.0-cp36-cp36m-manylinux1_x86_64.whl (88.6 kB): CPython 3.6m
  • dedupe-2.0.0-cp36-cp36m-macosx_10_13_x86_64.whl (60.4 kB): CPython 3.6m, macOS 10.13+ x86-64

File details

Details for the file dedupe-2.0.0.tar.gz.

File metadata

  • Download URL: dedupe-2.0.0.tar.gz
  • Upload date:
  • Size: 66.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0.tar.gz
Algorithm Hash digest
SHA256 626f6a504b21f63a997a808c22250629a0caf1a8703a52f8ba36da9ebfaea7ce
MD5 e1d32cdb5291b5c579ad31e2bea9c13f
BLAKE2b-256 c4bd54a3056307bf89b41d0a8543db669c709132119559698e5bbdb52949a7b9

See more details on using hashes here.
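As a sketch of how a published digest can be checked after a download, the following computes a file's SHA256 with the standard library's `hashlib` and compares it with the value listed above. The helper names and the local file path in the usage comment are assumptions; only the digest itself comes from this page.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 listed on this page for dedupe-2.0.0.tar.gz.
PUBLISHED_SHA256 = "626f6a504b21f63a997a808c22250629a0caf1a8703a52f8ba36da9ebfaea7ce"

# Usage, assuming the archive was saved to the current directory
# (the path is hypothetical):
# matches_published_hash("dedupe-2.0.0.tar.gz", PUBLISHED_SHA256)
```

A mismatch means the file differs from the one that was uploaded, so it should be discarded and downloaded again.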

File details

Details for the file dedupe-2.0.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 6ef9dd9be1e501710e4ee3439f57aa34bff953bb4f1defbaeaa4710084842ad9
MD5 39e2e6dd2f0f6ce1ef98e54b93b31bd2
BLAKE2b-256 17504d35d435814c6fd6e427824c48ea7401188353d03d6ca30b1d75a9db936a

File details

Details for the file dedupe-2.0.0-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.9 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 0e4c61adcb958d39666218a560a6231344974dd175df01bc40530e6975364a27
MD5 eb1a3d75192e33f71f41b146415b0482
BLAKE2b-256 117ec3925bc707f874d732268ef8b7f2209a762482bfb3c5646cf533b06139c3

File details

Details for the file dedupe-2.0.0-cp38-cp38-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp38-cp38-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.8, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 8bdf4d638087ec3647716376d6a04771d1984ee3117a76cc4138e14dc36b9507
MD5 4614fb24dae278cf5fd8f27bf5e18822
BLAKE2b-256 e3b106a1ddd3707ed05e4d778465feb37b58d849d0ff10390ddc2920acbe218f

File details

Details for the file dedupe-2.0.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for dedupe-2.0.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 8cae4aaedb6122ae8efbc4edf74a6c756c822e467821fb1060398e88c7685559
MD5 dd51d32cdf289e2e28612e37652c4523
BLAKE2b-256 672c57de424709b157d3ef4ec6f1fd61a84c37f2c7710338879e24ab8d8677b9

File details

Details for the file dedupe-2.0.0-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.7 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 27873c0d5b61a2c31f51143433dcbf24ba4b8cd040f1cf5699ce07d7daed0b80
MD5 66904ceda5eed14f870784bfa259a821
BLAKE2b-256 c8685474061c62e0f6b1a7cd3b66e50b63b524b6487c6ff717a053f7715bfe4b

File details

Details for the file dedupe-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.7m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for dedupe-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 834162c9c63fff0cad3eeae4c26750643d9b21f9a4200debc4cdeab9b898013e
MD5 48b7123246d6af0cc4fd068c2d0d78db
BLAKE2b-256 7fb199fc0aa305c8883f442e7bd7f4cf171f4a96bb9ad8fd6038a5a7084f5c7e

File details

Details for the file dedupe-2.0.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 64.4 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 1950dd7edb8ee21a9bfb48d2199c9e338d9b679b46b23a77db61685ac78fbe41
MD5 0b7e4804ef798b0423deb2364e6a8d13
BLAKE2b-256 cdf473ab913d06228467ca654d77d3e18019aa987637110ddbdb3f82153e8695

File details

Details for the file dedupe-2.0.0-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 88.6 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 ce80663663d9e7b3eb153d94d7d3823450842c4e802a1ae32dc49c985ca9ee09
MD5 2df7efabfe9b071b48861ca68370fa2a
BLAKE2b-256 0da8a8b1d27efbca1f53f011929a4dc20fe1d563718c0329e0e35632214572e6

File details

Details for the file dedupe-2.0.0-cp36-cp36m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.0-cp36-cp36m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.6m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.10

File hashes

Hashes for dedupe-2.0.0-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 f1e89e92b92f9093c94bade724da39f8c116ceef12b348f0c4ad247c97435e6a
MD5 9bd73bd12f2fba7ac40ab491c05c3cd2
BLAKE2b-256 5de5a379b49a3df6385c62245565183b547b0f6035c8e4e5e8ebd9dceb4ea004
