A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
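To make the idea of "similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API or its learning method: it compares every pair of records with difflib's string similarity and a hand-picked threshold, whereas dedupe learns its comparison rules from labeled examples. The function names, the sample records, and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.8):
    """Return (i, j, score) for every pair of records scoring at or above threshold.

    This compares all pairs, so the cost grows quadratically with the
    number of records. It is a toy illustration, not a scalable method.
    """
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = similarity(left, right)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical sample data: the first two rows describe the same person.
records = [
    "Jonathan Smith, 12 Main Street",
    "Jonathon Smith, 12 Main St.",
    "Maria Garcia, 400 Oak Avenue",
]

for i, j, score in candidate_duplicates(records):
    print(f"records {i} and {j} look similar (score {score})")
```

A fixed threshold like this is exactly what a trained model is meant to replace: the right cutoff, and which fields matter most, differ from one dataset to the next.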

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-2.0.2.tar.gz (87.8 kB)

Uploaded Source

Built Distributions

dedupe-2.0.2-cp38-cp38-win_amd64.whl (64.7 kB)

Uploaded CPython 3.8 Windows x86-64

dedupe-2.0.2-cp38-cp38-manylinux1_x86_64.whl (90.2 kB)

Uploaded CPython 3.8

dedupe-2.0.2-cp37-cp37m-win_amd64.whl (64.6 kB)

Uploaded CPython 3.7m Windows x86-64

dedupe-2.0.2-cp37-cp37m-manylinux1_x86_64.whl (90.0 kB)

Uploaded CPython 3.7m

dedupe-2.0.2-cp37-cp37m-macosx_10_13_x86_64.whl (60.7 kB)

Uploaded CPython 3.7m macOS 10.13+ x86-64

dedupe-2.0.2-cp36-cp36m-win_amd64.whl (64.6 kB)

Uploaded CPython 3.6m Windows x86-64

dedupe-2.0.2-cp36-cp36m-manylinux1_x86_64.whl (88.9 kB)

Uploaded CPython 3.6m

dedupe-2.0.2-cp36-cp36m-macosx_10_13_x86_64.whl (60.7 kB)

Uploaded CPython 3.6m macOS 10.13+ x86-64
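
The wheel file names above follow the standard wheel naming convention: distribution, version, an optional build tag, then the Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a file name into those parts, which can help when choosing a file for your platform. The function name parse_wheel_filename is hypothetical, and the code assumes a well-formed name; it does not check whether a wheel is compatible with the running interpreter.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its naming-convention components.

    Expected shape:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")

    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")

    return {
        "distribution": distribution,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("dedupe-2.0.2-cp37-cp37m-macosx_10_13_x86_64.whl")
print(info["python_tag"], info["abi_tag"], info["platform_tag"])
```

In practice pip performs this selection itself, so parsing by hand is mainly useful for auditing a download directory or a mirror.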

File details

Details for the file dedupe-2.0.2.tar.gz.

File metadata

  • Download URL: dedupe-2.0.2.tar.gz
  • Upload date:
  • Size: 87.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Python-urllib/3.7

File hashes

Hashes for dedupe-2.0.2.tar.gz
Algorithm Hash digest
SHA256 e8dd267c21dfa15906f7c178e6fe6930b9dc87e02fd38b306a7757a7d5ea0dc5
MD5 59369266e06ea70471dea8c21c3f14b9
BLAKE2b-256 b61df79ba57b35e4dc2f3cfff7a45bda8dfe3db38e90f1961dbecdf7b16a2e5b

See more details on using hashes here.
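
The published digests can be used to check that a downloaded file arrived intact. The following is a minimal sketch using Python's standard hashlib module. The function names file_digests and verify_sha256 are hypothetical, and the sketch assumes that the "BLAKE2b-256" digest listed here is BLAKE2b with a 32-byte output.

```python
import hashlib
import hmac


def file_digests(path, chunk_size=65536):
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, reading in chunks."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {
        "sha256": sha256.hexdigest(),
        "blake2b_256": blake2b_256.hexdigest(),
    }


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    actual = file_digests(path)["sha256"]
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

For example, after downloading the source distribution into the current directory, verify_sha256("dedupe-2.0.2.tar.gz", "e8dd267c21dfa15906f7c178e6fe6930b9dc87e02fd38b306a7757a7d5ea0dc5") should return True for an unmodified file. pip can also enforce hashes during installation through its hash-checking mode.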

File details

Details for the file dedupe-2.0.2-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 64.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 20fd99d4c309bfda7ef762ac9d3ca69cf3b9f2e8f374da683689657c0a89f67e
MD5 41aed6606057b86a939ad23fe45f4236
BLAKE2b-256 bb2035208f496513245d75a65f94adb4f44bf52b97eb1025009c92b30b97909b

File details

Details for the file dedupe-2.0.2-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.2 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.2-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d7c4e312663b4bf7873e4cb231a10e1d6fc9e54da84c457adf92ce27a357604e
MD5 5a1ee98f41826d425ee22b5d1aea8c4d
BLAKE2b-256 3ba2f6d8f44b1ac560e02d2b7b712d0b1ef393590f75bf38d39cd98c93018cfa

File details

Details for the file dedupe-2.0.2-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 64.6 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.6

File hashes

Hashes for dedupe-2.0.2-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 8e772d1c6fad46ff1bdb957d71cf79036c4fe5f64c327be5c9570de96c0bf1a3
MD5 7f49565b854577b5b48929fb0c919f65
BLAKE2b-256 00a923948ba69dbc09f5eb3557827c07246bd164604f269175f6e48de4facc93

File details

Details for the file dedupe-2.0.2-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.0 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.2-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 897717e50b009892c06e7fe5cb5135e2d739aab758baea52e5bf39b66f7a3add
MD5 ef85e31335dbd7291414b9e043d21f3b
BLAKE2b-256 7706f7ea33bf58b627da97d20c580129e17cb802a0a4f5eb4a7eff8a191d214b

File details

Details for the file dedupe-2.0.2-cp37-cp37m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp37-cp37m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 60.7 kB
  • Tags: CPython 3.7m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.6

File hashes

Hashes for dedupe-2.0.2-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 0cf2a6c45b8b665e5872be187c35a4a33a12cee7a9393b616130b5dfabbaa707
MD5 8659a6151835646c17c60880df7221b4
BLAKE2b-256 d6341ac46755f54a9907323c2fed961e54f4764bec694444bc7978ad80cd6cfe

File details

Details for the file dedupe-2.0.2-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 64.6 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.2-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 1f429a0ac22dee342f993b06561fa8c1b97167c1b2a12904c8b7c895b2a5b1f2
MD5 00b301194dde77ccb0db431771803334
BLAKE2b-256 37ec54be47036036cfe8192cd2c7f3356a9c696ef695519ca28b550f41a521f9

File details

Details for the file dedupe-2.0.2-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 88.9 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for dedupe-2.0.2-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 468f598de11c455274bd2ed150f7f860375e61ba4158e85ce19a5d1f09c2b9b4
MD5 ce0ce4da14cf57b099bf7e08411f352a
BLAKE2b-256 56aff292ae5ab1589883b8bddb510ba8322d0a6d4343c95e73f36f2ed0702932

File details

Details for the file dedupe-2.0.2-cp36-cp36m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.2-cp36-cp36m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 60.7 kB
  • Tags: CPython 3.6m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10

File hashes

Hashes for dedupe-2.0.2-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 13d301aefb733dd7ae168ad9577a3f56c679c0e83abc09d893ca195c6b651704
MD5 648838646007332aaec2606c0c4351d2
BLAKE2b-256 cfb00b68986ff3afe370f3ba4c3eb044c6acf4caa25db5b330016fcff212d8a0
