
A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. It is the open source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
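To make the idea of "similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's algorithm or API: the helper names `similarity` and `group_duplicates`, the sample names, and the fixed 0.8 threshold are all assumptions made for illustration. dedupe learns its matching rules from labeled examples, whereas this sketch hard-codes a single string-similarity cutoff.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after trimming and lowercasing both strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def group_duplicates(records: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily place each record in the first group whose first member is similar enough.

    Hypothetical helper: the fixed threshold stands in for the rules
    that dedupe would learn from human-labeled training data.
    """
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([record])
    return groups


# Made-up contributor names, entered slightly differently each time.
donors = ["Jon Smith", "John Smith", "Maria Garcia", "john smith ", "Maria Garzia"]

for group in group_duplicates(donors):
    print(group)
```

With this sample data the three Smith variants end up in one group and the two Garcia variants in another. Comparing every record against every group does not scale to large datasets, and that scaling problem is one of the things the library is meant to handle.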


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-2.0.4.tar.gz (67.4 kB)

Uploaded Source

Built Distributions

dedupe-2.0.4-cp38-cp38-win_amd64.whl (64.7 kB)

Uploaded CPython 3.8 Windows x86-64

dedupe-2.0.4-cp38-cp38-manylinux1_x86_64.whl (90.6 kB)

Uploaded CPython 3.8

dedupe-2.0.4-cp37-cp37m-win_amd64.whl (64.7 kB)

Uploaded CPython 3.7m Windows x86-64

dedupe-2.0.4-cp37-cp37m-manylinux1_x86_64.whl (90.2 kB)

Uploaded CPython 3.7m

dedupe-2.0.4-cp37-cp37m-macosx_10_14_x86_64.whl (60.5 kB)

Uploaded CPython 3.7m macOS 10.14+ x86-64

dedupe-2.0.4-cp36-cp36m-win_amd64.whl (64.6 kB)

Uploaded CPython 3.6m Windows x86-64

dedupe-2.0.4-cp36-cp36m-manylinux1_x86_64.whl (89.1 kB)

Uploaded CPython 3.6m

dedupe-2.0.4-cp36-cp36m-macosx_10_14_x86_64.whl (60.4 kB)

Uploaded CPython 3.6m macOS 10.14+ x86-64

File details

Details for the file dedupe-2.0.4.tar.gz.

File metadata

  • Download URL: dedupe-2.0.4.tar.gz
  • Upload date:
  • Size: 67.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.4.tar.gz
Algorithm Hash digest
SHA256 08ece890eeea6f2cf5b8b1ee53891450aa20ce7aea9c65addf4fe9764c0c53e5
MD5 99ec000c179f363341f7e32e01041679
BLAKE2b-256 82a4dc0e36bc41b54f0e1ba7cc5b8e4e1a135253b94c4cded7f4411fbda529b8

See more details on using hashes here.
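The published digests can be used to check that a downloaded file is intact. The sketch below uses Python's standard `hashlib` module to compare a local file against the SHA256 value listed above for dedupe-2.0.4.tar.gz. The helper names `sha256_of` and `verify`, and the assumption that the archive sits in the current directory, are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dedupe-2.0.4.tar.gz.
EXPECTED_SHA256 = "08ece890eeea6f2cf5b8b1ee53891450aa20ce7aea9c65addf4fe9764c0c53e5"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True when the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dedupe-2.0.4.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

The same pattern works for the wheel files by swapping in the file name and the matching SHA256 value from its own table.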

File details

Details for the file dedupe-2.0.4-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 64.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.4-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 fd334121cc158dac70adbb59370d48ebda3a9731c1195b40ba818faec07c2b8c
MD5 0f9843e7b290662dda37b97a5940e06e
BLAKE2b-256 f509bfda7df7e9c6d446ccee35d78dabf1828b97acfc1574ed77be40d73960b5


File details

Details for the file dedupe-2.0.4-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.6 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.4-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8f47a7c0e5c7a5b3705128377ec4b34e4be76607e80d3a58704ca3f9479df34d
MD5 8fea8f01f8d42b82b2e1b76ed0347d9a
BLAKE2b-256 cd90faa8d1c978ce06cba0419f98547e4055fdbf015c557ae7e635f2dcc530d8


File details

Details for the file dedupe-2.0.4-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 64.7 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for dedupe-2.0.4-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 8bd52e209d76ea9a696410fac08591ed527e0fb4cb0a3f94aefaca617914b442
MD5 c7755acd34c058b5025eab72262adb25
BLAKE2b-256 c777761e1cbbe642a5eeb18037f665cee7f125c4a5ca081a0fada2c47059600a


File details

Details for the file dedupe-2.0.4-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.2 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.4-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 ac914ada711667385a5b7dcb4550ac503b454de9b84cdcc2fa9a61c42c449454
MD5 df6f4d9afd8444c1dbd8c0f3bef1b993
BLAKE2b-256 c5c1983b5c880df6d20d22854009a0bc98834b569965bf9bae4221d5e138457d


File details

Details for the file dedupe-2.0.4-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.5 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for dedupe-2.0.4-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 f664349e16a0d47cf6f88d2c4161a64074a60ff31aa2310ba57bd6d3f310cd5a
MD5 073aaa77954e367a787a085d326f2f3a
BLAKE2b-256 4e0edd514627596c6277a2cb1b3ef380751d613eab4cfa8f7298b5076f0d12d1


File details

Details for the file dedupe-2.0.4-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 64.6 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.4-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 81c7c7cc2491ccdeb0316be31f1b26d6c3a267a189634feb321acc6888a79212
MD5 d798024943aa65c0f937fb7d0a2a027e
BLAKE2b-256 2877c3dcf0ecabe6a99ba6b4262064b78cde05ec54282ab81bfe681b788e64c1


File details

Details for the file dedupe-2.0.4-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.1 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.4-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 307e9e47f7ca89539f87e27c7d2778cd3dbf4d060ea98da86baffcb959cf718d
MD5 321636f788b8cbd087d07e70575cac31
BLAKE2b-256 17a643bd0eda4078d111167c2267360a6e90d31d2300b1052578e649ef095877


File details

Details for the file dedupe-2.0.4-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.4-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.11

File hashes

Hashes for dedupe-2.0.4-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1f87802421a69bcc3f42c788ab7b6eb277ffe4eb0b953882b49809a4eb5463b8
MD5 5a4419758794b27867053b967323e155
BLAKE2b-256 72238b03149d903a6cc50ed3f6084ce42b4a0567fbfd1d72f4a16a61b15dfac7

