A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data the best rules for your dataset, then uses those rules to find similar records quickly and automatically, even in very large databases.
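
To make the idea of "similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API and not its algorithm: dedupe learns its rules from labeled examples, while this sketch assumes a fixed, hand-picked threshold and a plain average of string similarities. The names `similarity`, `candidate_duplicates`, and the sample records are hypothetical and exist only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a ratio in [0, 1]; case and surrounding whitespace are ignored."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def candidate_duplicates(records, threshold=0.8):
    """Yield (id_a, id_b, score) for pairs whose mean field similarity
    reaches the threshold. The threshold is an arbitrary assumption here;
    a learned model would replace it."""
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(records.items()), 2):
        shared_fields = sorted(set(rec_a) & set(rec_b))
        if not shared_fields:
            continue
        score = sum(
            similarity(rec_a[field], rec_b[field]) for field in shared_fields
        ) / len(shared_fields)
        if score >= threshold:
            yield id_a, id_b, score


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "Jonathan Smith", "address": "123 Main Street"},
    2: {"name": "Jonathon Smith", "address": "123 Main St."},
    3: {"name": "Maria Garcia", "address": "77 Ocean Avenue"},
}

pairs = list(candidate_duplicates(records))
for id_a, id_b, score in pairs:
    print(id_a, id_b, round(score, 2))
```

Comparing every pair of records grows quadratically with the number of records, which is why a hand-rolled approach like this one does not carry over to very large databases.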

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.5.1.tar.gz (47.5 kB), Source

Built Distributions

  • dedupe-1.5.1-cp34-cp34m-win_amd64.whl (50.2 kB), CPython 3.4m, Windows x86-64
  • dedupe-1.5.1-cp34-cp34m-win32.whl (49.5 kB), CPython 3.4m, Windows x86
  • dedupe-1.5.1-cp27-cp27m-win_amd64.whl (50.3 kB), CPython 2.7m, Windows x86-64
  • dedupe-1.5.1-cp27-cp27m-win32.whl (49.5 kB), CPython 2.7m, Windows x86
  • dedupe-1.5.1-cp27-cp27m-macosx_10_11_x86_64.whl (49.1 kB), CPython 2.7m, macOS 10.11+ x86-64

File details

Details for the file dedupe-1.5.1.tar.gz.

File metadata

  • Download URL: dedupe-1.5.1.tar.gz
  • Upload date:
  • Size: 47.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.1.tar.gz
Algorithm Hash digest
SHA256 c458308fa993ba1df566999b351f05a3da59013af86eee27c47c4c24674921b4
MD5 5dd2512c3a9290b361d4f0edcfab0823
BLAKE2b-256 fb59d67d5e4cbb001fe610be090ce96f7d05da9092f4c386c9f99a3f45555f00

See more details on using hashes here.
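
A digest like the SHA256 value listed above can be checked against a downloaded copy of the file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the file exists in the current directory):
# matches_published_hash(
#     "dedupe-1.5.1.tar.gz",
#     "c458308fa993ba1df566999b351f05a3da59013af86eee27c47c4c24674921b4",
# )
```

A result of `False` means the local file differs from the one the digest was computed for, so it should not be installed.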

File details

Details for the file dedupe-1.5.1-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.1-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 f4be0b6ebba06efd83078e5d2bcbf40c91937c932bb03566b00303e8e559e28d
MD5 2b161a94f779aa743a4b6fb1e5f6338b
BLAKE2b-256 99e858717d3f2319995ad198a51c9b5852f6ffadab24921e5de64d064c8b08af

File details

Details for the file dedupe-1.5.1-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.1-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 a05e5b3ebefd98b33d8a0336d0092d64eba364cf3dc043cb72d358dfc0382e0c
MD5 4eec8fc60fedbf5809301a81c6b3626e
BLAKE2b-256 498e357164b49b29a2e9e35ea01ccfdecad96ea67432b2e6d4d511a6fb22bfaa

File details

Details for the file dedupe-1.5.1-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.1-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 3431320ecc3345f60a1460594c9dae118cca23b40f2001da3bab3f0bf8ebb1db
MD5 0886bb09111a3a5595224095dae4a8d8
BLAKE2b-256 c99fbdc50e5db6c4114b1671cd3cba8b76841fe47b545b7a9199170ed06b3e49

File details

Details for the file dedupe-1.5.1-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.1-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 d99929e9d8cac04f2ee6f0b84fa7aaf56e75fb0399af728986e8249a93e8c3d0
MD5 2134f9b5d6f5dbbaef1c73cfd9493907
BLAKE2b-256 bf406f218fbf26a76305c1147b6bf338e288d92ae07dec8f933645437c78fb73

File details

Details for the file dedupe-1.5.1-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.1-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 2dd77b9bf204421fe3e15964a6367e26a11c690034d5e4a8a1b7d2d9514768ea
MD5 66f05c4f3cb71bf8716941b06e65f0fb
BLAKE2b-256 a2e8bd39be1a9bddebc1485a85a6587bc10c6c669b0dcc62e88261294c4a380f
