A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes human-labeled training data and works out the best rules for your dataset to quickly and automatically find similar records, even in very large databases.
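
As a rough illustration of learning matching rules from labeled examples, the sketch below fits a logistic-regression classifier on a handful of hand-labeled record pairs and then uses it to score candidate pairs. It is a standard-library toy, not dedupe's API or its actual algorithm: the two comparison features (name-token Jaccard similarity and ZIP-code equality), the first-letter blocking rule, the record layout, the training pairs, and every helper name (`train`, `find_duplicates`, `rec`, and so on) are hypothetical choices made for this example.

```python
import math
from collections import defaultdict
from itertools import combinations

def tokens(text):
    # Lowercase word tokens with periods stripped: "M. Garcia" -> {"m", "garcia"}.
    return set(text.lower().replace(".", "").split())

def features(a, b):
    # Two hypothetical comparison features for one pair of records:
    # Jaccard similarity of the name tokens, and whether the ZIP codes agree.
    ta, tb = tokens(a["name"]), tokens(b["name"])
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    same_zip = 1.0 if a["zip"] == b["zip"] else 0.0
    return [jaccard, same_zip]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(labeled_pairs, lr=1.0, epochs=2000):
    # Logistic regression fitted by batch gradient descent on
    # (record_a, record_b, is_match) triples labeled by a person.
    xs = [features(a, b) for a, b, _ in labeled_pairs]
    ys = [1.0 if is_match else 0.0 for _, _, is_match in labeled_pairs]
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias

def match_probability(model, a, b):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, features(a, b))) + bias)

def candidate_pairs(records):
    # Blocking: only compare records whose names start with the same letter,
    # instead of scoring all n * (n - 1) / 2 possible pairs.
    blocks = defaultdict(list)
    for record in records:
        blocks[record["name"][:1].lower()].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)

def find_duplicates(model, records, threshold=0.5):
    # Ids of the candidate pairs the model scores at or above the threshold.
    return [(a["id"], b["id"]) for a, b in candidate_pairs(records)
            if match_probability(model, a, b) >= threshold]

def rec(rid, name, zip_code):
    return {"id": rid, "name": name, "zip": zip_code}

# Hypothetical hand-labeled training pairs: True means "same entity".
TRAINING_PAIRS = [
    (rec(1, "Jon Smith", "60601"), rec(2, "Jon Smith Jr", "60601"), True),
    (rec(3, "Maria Garcia", "60614"), rec(4, "Maria Garcia", "60614"), True),
    (rec(5, "Ann Lee", "60610"), rec(6, "Ann B Lee", "60610"), True),
    (rec(1, "Jon Smith", "60601"), rec(7, "Peter Jones", "60601"), False),
    (rec(3, "Maria Garcia", "60614"), rec(8, "Maria Lopez", "60622"), False),
    (rec(5, "Ann Lee", "60610"), rec(9, "Bo Chen", "60614"), False),
]

model = train(TRAINING_PAIRS)
records = [rec("a", "Peter Jones", "60601"), rec("b", "Peter Jones", "60601"),
           rec("c", "Paula Kim", "60622")]
print(find_duplicates(model, records))  # [('a', 'b')]
```

The point of the toy is the division of labor: the feature weights come from labeled examples rather than being set by hand, and blocking keeps the number of scored pairs small. dedupe's own comparators, learning procedure, and blocking rules differ from this sketch.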

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.9.3.tar.gz (58.6 kB): Source

Built Distributions

  • dedupe-1.9.3-cp36-cp36m-manylinux1_x86_64.whl (79.1 kB): CPython 3.6m
  • dedupe-1.9.3-cp36-cp36m-manylinux1_i686.whl (76.3 kB): CPython 3.6m
  • dedupe-1.9.3-cp36-cp36m-macosx_10_6_intel.whl (60.4 kB): CPython 3.6m, macOS 10.6+ intel
  • dedupe-1.9.3-cp35-cp35m-manylinux1_x86_64.whl (78.9 kB): CPython 3.5m
  • dedupe-1.9.3-cp35-cp35m-manylinux1_i686.whl (76.1 kB): CPython 3.5m
  • dedupe-1.9.3-cp35-cp35m-macosx_10_6_intel.whl (60.4 kB): CPython 3.5m, macOS 10.6+ intel
  • dedupe-1.9.3-cp34-cp34m-win_amd64.whl (54.5 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.9.3-cp34-cp34m-win32.whl (53.7 kB): CPython 3.4m, Windows x86
  • dedupe-1.9.3-cp34-cp34m-manylinux1_x86_64.whl (79.0 kB): CPython 3.4m
  • dedupe-1.9.3-cp34-cp34m-manylinux1_i686.whl (76.2 kB): CPython 3.4m
  • dedupe-1.9.3-cp34-cp34m-macosx_10_6_intel.whl (60.3 kB): CPython 3.4m, macOS 10.6+ intel
  • dedupe-1.9.3-cp27-cp27mu-manylinux1_x86_64.whl (77.0 kB): CPython 2.7mu
  • dedupe-1.9.3-cp27-cp27mu-manylinux1_i686.whl (74.1 kB): CPython 2.7mu
  • dedupe-1.9.3-cp27-cp27m-win_amd64.whl (54.4 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.9.3-cp27-cp27m-win32.whl (53.5 kB): CPython 2.7m, Windows x86
  • dedupe-1.9.3-cp27-cp27m-manylinux1_x86_64.whl (76.9 kB): CPython 2.7m
  • dedupe-1.9.3-cp27-cp27m-manylinux1_i686.whl (74.1 kB): CPython 2.7m
  • dedupe-1.9.3-cp27-cp27m-macosx_10_6_intel.whl (59.6 kB): CPython 2.7m, macOS 10.6+ intel
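
The wheel file names encode which interpreter and platform each build targets, following the wheel naming convention from PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. For example, `cp36` is CPython 3.6, `cp36m` is its ABI tag (the `m` marks a pymalloc build, and the `mu` in `cp27mu` marks a Python 2.7 build with 4-byte Unicode), and `manylinux1_x86_64` is the platform. pip does this matching for you; the standard-library sketch below only makes the naming explicit. `WheelName`, `parse_wheel_filename`, and `wheels_for` are hypothetical helpers, not part of pip or dedupe.

```python
from typing import List, NamedTuple, Optional

class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str

def parse_wheel_filename(filename: str) -> WheelName:
    # {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    # Compressed tag sets such as "py2.py3" are returned unsplit.
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)

def wheels_for(filenames: List[str], python_tag: str, platform_tag: str) -> List[str]:
    # Keep files built for one interpreter (e.g. "cp36") and one platform (e.g. "win_amd64").
    matches = []
    for name in filenames:
        wheel = parse_wheel_filename(name)
        if wheel.python_tag == python_tag and wheel.platform_tag == platform_tag:
            matches.append(name)
    return matches

FILES = [
    "dedupe-1.9.3-cp36-cp36m-manylinux1_x86_64.whl",
    "dedupe-1.9.3-cp27-cp27mu-manylinux1_x86_64.whl",
    "dedupe-1.9.3-cp27-cp27m-manylinux1_x86_64.whl",
    "dedupe-1.9.3-cp27-cp27m-win_amd64.whl",
]
print(wheels_for(FILES, "cp27", "manylinux1_x86_64"))
```

Filtering on interpreter and platform alone leaves two Python 2.7 Linux candidates here; the ABI tag (`cp27m` vs `cp27mu`) is what decides which one fits a given interpreter build.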

File details

Details for the file dedupe-1.9.3.tar.gz.

File metadata

  • Download URL: dedupe-1.9.3.tar.gz
  • Upload date:
  • Size: 58.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.5.6

File hashes

Hashes for dedupe-1.9.3.tar.gz
Algorithm Hash digest
SHA256 6bb53ae99236fb399866f071d049608b30be4159416684a9403e1bc0c3036319
MD5 3e8ebaaaeb2df5dc789709996e66ed39
BLAKE2b-256 e42e05f5f8b9769bb4b46c8500727f52f31be4807efd2b5192e1e5d53605857b

See more details on using hashes here.
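
To check a downloaded file against the digests listed above, hash it locally and compare. A minimal sketch with Python's standard `hashlib`, assuming the source distribution was saved as `dedupe-1.9.3.tar.gz`; the path and the `sha256_of` and `verify` helper names are illustrative.

```python
import hashlib

# SHA256 published above for dedupe-1.9.3.tar.gz.
EXPECTED_SHA256 = "6bb53ae99236fb399866f071d049608b30be4159416684a9403e1bc0c3036319"

def sha256_of(path, chunk_size=1 << 16):
    # Stream the file in chunks so large downloads are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.strip().lower()
```

For an intact download, `verify("dedupe-1.9.3.tar.gz")` should return `True`. pip can enforce the same check through its hash-checking mode, using a requirements line such as `dedupe==1.9.3 --hash=sha256:6bb53ae99236fb399866f071d049608b30be4159416684a9403e1bc0c3036319`; list one `--hash` option per file pip might select (for example each wheel you expect to install).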

File details

Details for the file dedupe-1.9.3-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 79.1 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3c061759e480d98b67b43bba013a6b4f895dbdf712ceaa54447c9f3ff432c6f1
MD5 0d746e927450a6ea944455d57f580678
BLAKE2b-256 08a337c711f68cb5744aa41ff102b83c0b2e471011cbb7154042c62b51a257ce

File details

Details for the file dedupe-1.9.3-cp36-cp36m-manylinux1_i686.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp36-cp36m-manylinux1_i686.whl
  • Upload date:
  • Size: 76.3 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 0a31922ce963b0953ce05d80dd02194a468552804eee339b818115f4fc6d6d20
MD5 7b54f838153939564d4750a28bdc3fc5
BLAKE2b-256 45bef88269d8ac6cbc23e7f7bbd4659c35007bb9b5dcb85ec48522151b300e4b

File details

Details for the file dedupe-1.9.3-cp36-cp36m-macosx_10_6_intel.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp36-cp36m-macosx_10_6_intel.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.6m, macOS 10.6+ intel
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.1

File hashes

Hashes for dedupe-1.9.3-cp36-cp36m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 8d8d458bb5e3b855dd1014a087fa3a858defdd0c34ba83d102bf341b7b63b586
MD5 08fc578dc82887b0247d56328c14f322
BLAKE2b-256 90c302751d2e6827f4e98762f1f34d6e56b8afc27c3193b4e3710f87fd6e7758

File details

Details for the file dedupe-1.9.3-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp35-cp35m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 78.9 kB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3d9497d6b65fa6b945517f19972e0be98bfbc3333562d9b63b2f08c2211a2b58
MD5 a4bef26266bb767405a8d4c2fdd10bc6
BLAKE2b-256 8ae95d5a5bbfc8ed6d33f6d13054cb8e6bee6ec4b05fa3d1008a7983c02e07b2

File details

Details for the file dedupe-1.9.3-cp35-cp35m-manylinux1_i686.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp35-cp35m-manylinux1_i686.whl
  • Upload date:
  • Size: 76.1 kB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 d2e2178c1c85247bf8e8fe25133b25e9f2ee263d2e034bbc123a731f8fa7a1f6
MD5 b070cacccd9a68187cc525e240ab55a8
BLAKE2b-256 7da678ec2beb21de5ffa5880596f902b6e50f2fa33b2bc7f415b947f8b3d69ae

File details

Details for the file dedupe-1.9.3-cp35-cp35m-macosx_10_6_intel.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp35-cp35m-macosx_10_6_intel.whl
  • Upload date:
  • Size: 60.4 kB
  • Tags: CPython 3.5m, macOS 10.6+ intel
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.5.4

File hashes

Hashes for dedupe-1.9.3-cp35-cp35m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 0765b8120ce0c3264804c6014de3007a25ea6617b6c1f20cc7eb9d1df08e6459
MD5 7faadb141535687db6eb7d830571b701
BLAKE2b-256 9463ddd1554f2689a2cfc655c4ae2042ba3c49ce9fd2d0b38434449afd59468d

File details

Details for the file dedupe-1.9.3-cp34-cp34m-win_amd64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp34-cp34m-win_amd64.whl
  • Upload date:
  • Size: 54.5 kB
  • Tags: CPython 3.4m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/18.2 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.4.4

File hashes

Hashes for dedupe-1.9.3-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 fb415b584f53c3593940edab219f555e64f18a4232e14b7d1abae1c80326d7bb
MD5 a93238773ce06b8a30de3da35cf2407f
BLAKE2b-256 60f789ed7984cd0cef9526dfea60f95ee878299fa0d15ad85054cd325921973d

File details

Details for the file dedupe-1.9.3-cp34-cp34m-win32.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp34-cp34m-win32.whl
  • Upload date:
  • Size: 53.7 kB
  • Tags: CPython 3.4m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/18.2 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.4.4

File hashes

Hashes for dedupe-1.9.3-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 fe8ddf1f3b46604f6f99312a75506e4435783a4747e7ae8155811982cddae806
MD5 4e3918f2860ec655a5e8210d855aa810
BLAKE2b-256 08175e2ffe1b51fb693fcbc1f9857a690fe14816024488ce7dfbf5cd56cb68d4

File details

Details for the file dedupe-1.9.3-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp34-cp34m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 79.0 kB
  • Tags: CPython 3.4m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 2458fd378b1f63e087625985e09be4e7c46bbbcc7bf8949e13bd07540a75e64c
MD5 e2886f26d175f6d6f1a3ac1e136d4834
BLAKE2b-256 3bb7c1c73e777313eea6f8b5530487bcd85002969c3d36575f250feaa6aa1bde

File details

Details for the file dedupe-1.9.3-cp34-cp34m-manylinux1_i686.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp34-cp34m-manylinux1_i686.whl
  • Upload date:
  • Size: 76.2 kB
  • Tags: CPython 3.4m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 46fb7b006912fd9d3a1549f52c263079b43c4e9757de24c87c40749adf280dc5
MD5 7a095995d7cc7a7b72c353504461e369
BLAKE2b-256 227fd2d1bc2be2142a8bdc39d1f4525c099ff76026f2d51d65ea8cc9370b1bfd

File details

Details for the file dedupe-1.9.3-cp34-cp34m-macosx_10_6_intel.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp34-cp34m-macosx_10_6_intel.whl
  • Upload date:
  • Size: 60.3 kB
  • Tags: CPython 3.4m, macOS 10.6+ intel
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.4.4

File hashes

Hashes for dedupe-1.9.3-cp34-cp34m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 603a678431738a473dff02384f835825dcff6217747512569702e2924f9c4da7
MD5 301859fc617ee4003270d05ac168a9d2
BLAKE2b-256 85debf82b18248ea93c5af260ed9698ec6c1b6866100941e28cd675d8b03dc60

File details

Details for the file dedupe-1.9.3-cp27-cp27mu-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27mu-manylinux1_x86_64.whl
  • Upload date:
  • Size: 77.0 kB
  • Tags: CPython 2.7mu
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 5a7e986139de96a7f86a0ecee5bbecc02c4b72889f5d7758212f837d30651636
MD5 0fd5308be243c749f7f536b35737c5c3
BLAKE2b-256 6069cbf5a6f80b6f3a59aad1ea3c07f59d3e53d571f5513bc35fd8a9e5032128

File details

Details for the file dedupe-1.9.3-cp27-cp27mu-manylinux1_i686.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27mu-manylinux1_i686.whl
  • Upload date:
  • Size: 74.1 kB
  • Tags: CPython 2.7mu
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 b53b90e32dc9b86aa62839622578a320353df07eabea1b7c6cd6b8a200dbbc5b
MD5 50059ccd03be24df546d93ec63afba67
BLAKE2b-256 66bcf98d915cd23a91fd39b9f92c6a23b2f8279b00df987299085ae1068fbff9

File details

Details for the file dedupe-1.9.3-cp27-cp27m-win_amd64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27m-win_amd64.whl
  • Upload date:
  • Size: 54.4 kB
  • Tags: CPython 2.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15

File hashes

Hashes for dedupe-1.9.3-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 bb20992d7847f875ae8e23ccc2721dfe08b2e666921be1d38dec970312ebef2b
MD5 fe37d74d38ef51d91b81e0c9a2855051
BLAKE2b-256 eb56dbd68ca7469e5518ae013b2217c4567a6d168e6f35b88099236b678be0ea

File details

Details for the file dedupe-1.9.3-cp27-cp27m-win32.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27m-win32.whl
  • Upload date:
  • Size: 53.5 kB
  • Tags: CPython 2.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15

File hashes

Hashes for dedupe-1.9.3-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 77491dd20bb63a2d62e5c67f40a4e8ce2f021a9a9c9f8589f7baafc0d259d80e
MD5 213f3d2706ed88d3a3c9ae2484d2f0a3
BLAKE2b-256 c502b02f14c93c85b7fbab85caa1015d7096ae7f2e24cdd431883977456c6a4b

File details

Details for the file dedupe-1.9.3-cp27-cp27m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 76.9 kB
  • Tags: CPython 2.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 4aff86854d9c7e174cad5c737267737da5dd7860be8079dfe99e98c71dd530a4
MD5 5ca2b43f45dd5e5cf6fcb5b0cb46be58
BLAKE2b-256 9520881d957fbf1eee68b74e12a76477b1dee2fdaf27c415d668cdc7c5f82df9

File details

Details for the file dedupe-1.9.3-cp27-cp27m-manylinux1_i686.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27m-manylinux1_i686.whl
  • Upload date:
  • Size: 74.1 kB
  • Tags: CPython 2.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/38.2.4 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 8b0808931f94ca8d0b4fde87941b7382741a96a06e5b81ca0cd79eea5909ec43
MD5 a26ced0847c8e00f3912504f5a23cdf0
BLAKE2b-256 0c5aa86906447889787d188106e156558e38ee0c067bfd065a457356fc474904

File details

Details for the file dedupe-1.9.3-cp27-cp27m-macosx_10_6_intel.whl.

File metadata

  • Download URL: dedupe-1.9.3-cp27-cp27m-macosx_10_6_intel.whl
  • Upload date:
  • Size: 59.6 kB
  • Tags: CPython 2.7m, macOS 10.6+ intel
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.14

File hashes

Hashes for dedupe-1.9.3-cp27-cp27m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 786b15d689d027d3f3bbdba226a7de8bf3472c2a255b6c528bba15eb1dfcabb7
MD5 459ebaba2e1650f5bfd1ecf8d02907ec
BLAKE2b-256 cf9ddc11509c62dc3bce10767bbd9d9dfd19c98b028a92067e1610fef284eeca
