
A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
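
To make the first and third use cases concrete, here is a deliberately naive sketch in plain Python. It is not dedupe's algorithm or API: it scores every pair of records with the standard library's `difflib.SequenceMatcher`, treats pairs at or above a hand-picked threshold (0.85, an arbitrary assumption) as the same entity, and merges them into groups with union-find so that matches chain transitively. The helper names are hypothetical, and the quadratic pairwise loop is exactly the cost a library aimed at large datasets has to avoid.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_duplicates(records: List[str], threshold: float = 0.85) -> List[List[int]]:
    """Group record indices linked, directly or transitively, by a match.

    Naive: compares all n*(n-1)/2 pairs, then merges matches with union-find,
    so if A~B and B~C, all three land in the same group.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For example, `group_duplicates(["John Smith, 12 Main St", "john  smith, 12 Main St.", "Mary Jones, 4 Oak Ave"])` puts the first two records in one group and leaves the third on its own.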

dedupe learns from human-labeled training data and works out the best rules for your dataset to find similar records quickly and automatically, even in very large databases.
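
The idea of learning rules from human labels can be illustrated with a toy: given a few record pairs a person has marked as match or non-match, choose the similarity cutoff that misclassifies the fewest of them, then compare only records that share a cheap blocking key instead of every pair. This is a simplified stand-in, not dedupe's implementation or API; the single-threshold rule and the surname-prefix `block_key` are hypothetical choices for illustration, and dedupe's own learned model and blocking rules are more sophisticated.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled: List[Tuple[str, str, bool]]) -> float:
    """Return the cutoff that misclassifies the fewest human-labeled pairs.

    Each observed score is tried as a candidate cutoff; a pair is predicted to
    be a match when its score is >= the cutoff. Ties keep the first candidate.
    """
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled]
    best_cut, best_errors = 1.0, len(scored) + 1
    for cut, _ in scored:
        errors = sum((score >= cut) != is_match for score, is_match in scored)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


def block_key(record: str) -> str:
    """Hypothetical blocking rule: first three letters of the last word."""
    words = record.lower().split()
    return words[-1][:3] if words else ""


def find_matches(records: List[str], threshold: float) -> List[Tuple[int, int]]:
    """Score only pairs that share a block key instead of all n*(n-1)/2 pairs."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for i, record in enumerate(records):
        blocks[block_key(record)].append(i)

    matches = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                matches.append((i, j))
    return sorted(matches)
```

With the labels `("Jon Smith", "John Smith", True)`, `("Anne Lee", "Ann Lee", True)` and `("John Smith", "Jane Doe", False)`, the learned cutoff falls between the matching and non-matching scores, so both true pairs are accepted and the false one is rejected. Blocking trades some recall for speed: a true duplicate whose key differs (for example "Smith" vs "Smyth") is never compared at all.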

This version

2.0.6

Download files

Download the file for your platform. If you're not sure which to choose, see the Python Packaging User Guide on installing packages.

Source Distribution

  • dedupe-2.0.6.tar.gz (67.6 kB): source

Built Distributions

  • dedupe-2.0.6-cp38-cp38-win_amd64.whl (64.9 kB): CPython 3.8, Windows x86-64
  • dedupe-2.0.6-cp38-cp38-manylinux1_x86_64.whl (90.8 kB): CPython 3.8, manylinux1 x86-64
  • dedupe-2.0.6-cp37-cp37m-win_amd64.whl (64.9 kB): CPython 3.7m, Windows x86-64
  • dedupe-2.0.6-cp37-cp37m-manylinux1_x86_64.whl (90.4 kB): CPython 3.7m, manylinux1 x86-64
  • dedupe-2.0.6-cp37-cp37m-macosx_10_14_x86_64.whl (60.6 kB): CPython 3.7m, macOS 10.14+ x86-64
  • dedupe-2.0.6-cp36-cp36m-win_amd64.whl (64.8 kB): CPython 3.6m, Windows x86-64
  • dedupe-2.0.6-cp36-cp36m-manylinux1_x86_64.whl (89.3 kB): CPython 3.6m, manylinux1 x86-64
  • dedupe-2.0.6-cp36-cp36m-macosx_10_14_x86_64.whl (60.6 kB): CPython 3.6m, macOS 10.14+ x86-64
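
Each wheel's filename encodes which interpreters and platforms it targets, following the wheel naming convention from PEP 427: `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits a filename into those fields and does a rough check against the running interpreter. `parse_wheel_filename`, `matches_running_python` and `WheelTags` are hypothetical helpers; in practice pip (or the `packaging.tags` module) performs the full compatibility check. The `m` in ABI tags such as `cp37m` marks a pymalloc build; CPython 3.8 dropped that flag, which is why the 3.8 wheels are tagged plain `cp38`.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    The optional build tag is accepted but discarded.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def matches_running_python(tags: WheelTags) -> bool:
    """Rough check: does the CPython tag (e.g. 'cp38') match this interpreter?

    Ignores ABI and platform compatibility, which pip also checks.
    """
    current = "cp{}{}".format(*sys.version_info[:2])
    return sys.implementation.name == "cpython" and tags.python_tag == current
```

For example, `parse_wheel_filename("dedupe-2.0.6-cp37-cp37m-macosx_10_14_x86_64.whl")` yields python tag `cp37`, ABI tag `cp37m` and platform tag `macosx_10_14_x86_64`.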

File details

Details for the file dedupe-2.0.6.tar.gz.

File metadata

  • Download URL: dedupe-2.0.6.tar.gz
  • Upload date:
  • Size: 67.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.6.tar.gz
Algorithm Hash digest
SHA256 5693bd5a9f824b0012b54eb59446e6fde8014396148f2dc889ed27d151aa8107
MD5 9bfddfca2a549e77b8367c84605ebf33
BLAKE2b-256 f9430274df3748a7a751c45b69f16312fc94b434707ff4ff8fe2702b5f88119f
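
The published digests let you confirm that a downloaded file is intact and unmodified. A minimal sketch using only the standard library's `hashlib`: stream the file, compute its SHA256, and compare it with the value listed in the table. The helper names are illustrative. pip can enforce the same check itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file, or `--require-hashes`). MD5 is listed as well, but it is no longer collision-resistant, so prefer SHA256 or BLAKE2b-256.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 of a file, read in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare against a published digest; case and surrounding spaces are ignored."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For an unaltered copy of the source distribution, `verify_download("dedupe-2.0.6.tar.gz", "5693bd5a9f824b0012b54eb59446e6fde8014396148f2dc889ed27d151aa8107")` should return `True`.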

File details

Details for the file dedupe-2.0.6-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 64.9 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.6-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 97e60fb71049fa35701d1f89401e7c0c5ba4ae768e39b245fd7c418508e4a584
MD5 02727e8f3d67ebc87065969bdf019165
BLAKE2b-256 7f0b7240745382942fdd5a06a3327f8123b3fd2535d625ff219e6057b9e894d3

File details

Details for the file dedupe-2.0.6-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.8 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.6-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 54a74c2e65abafcf79ca5611d29361b8c10c54e25bf029b436f5de448c5497af
MD5 ada4a6f398e495fb01754856aaf34fdc
BLAKE2b-256 2fe6e4ef0ffe3b633fa649ad922255cc97a808bab3cb8f66a96cde25dbda14b6

File details

Details for the file dedupe-2.0.6-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 64.9 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.9

File hashes

Hashes for dedupe-2.0.6-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 bb6f83bd46cb3262365314048aa2972a0d2fa772ee36bf08d0720784181bb57c
MD5 b3c3676c6a8151b08761b6dabe6b09bb
BLAKE2b-256 d42dca722228b3862dd386c07c92d312a262b2a8e6180986bd1a2417dde20885

File details

Details for the file dedupe-2.0.6-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 90.4 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.6-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6145a6a8ab4d6db42936dccc23388d113451ab0314d36290a7ae24b2fd2053da
MD5 4f709a341c2513c5164744494dd5a49b
BLAKE2b-256 9956506d15e14935cb8d05d275dc8c54fae6f9e54e1d2448e2d24d06c18b0c7e

File details

Details for the file dedupe-2.0.6-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.6 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.9

File hashes

Hashes for dedupe-2.0.6-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 fb3e9be7950c7fdfcf00956473451539ab91a807a4d3e15a1fd0798cf08cd0c5
MD5 09a43f3c5846e7f4c662df51c5d2401c
BLAKE2b-256 c86db24ce86e8c22b9f62369bf8174e55d784fbf78a7c533e9dfd515eca21365

File details

Details for the file dedupe-2.0.6-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 64.8 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.6-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 9f60c253391926882714279715947665e5d990f95be9bfc03754c9437a093d45
MD5 55363f98154efa33cb044e3f75ebab3f
BLAKE2b-256 fd8ee568a6c71d757b00ecec0e0c73d9b31d28df8b293047525cea2d135412c1

File details

Details for the file dedupe-2.0.6-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 89.3 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.6-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 048ef763302c43c5444a6faf7e0ae8d52631459983fe4231fd6c2d6af761f196
MD5 2da804c0f1b6ee4bfdb176936bc4576b
BLAKE2b-256 23df94b508dc18d463c0b2922c2a3b099022f5869776249e9ffb0810bd67d0da

File details

Details for the file dedupe-2.0.6-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.6-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 60.6 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.12

File hashes

Hashes for dedupe-2.0.6-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ea18ff16298126870e621f058ec7e121d87f3e1af71958c49b352e6ce89f4e7c
MD5 12e54ac8edafe87b57d0505b399378ad
BLAKE2b-256 83fb6584dc238a596985a6d8e3f2cebfac9fa78ff6e5787cba86ff8ff7ab383a
