A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
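The use cases above come down to comparing records whose fields do not match exactly. The sketch below is not dedupe's API; it is a hypothetical, standard-library-only illustration of the idea. Candidate pairs are restricted to records that share a blocking key (here the ZIP code), and names inside each block are compared with `difflib.SequenceMatcher`. The field names, the ZIP-code block, the 0.8 cutoff, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort the words,
    so that "Smith, Jon" and "jon smith" normalize to the same string."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a, b):
    """Return a similarity score in [0, 1] for two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(customers, orders, cutoff=0.8):
    """Pair each customer with orders that share a ZIP code and have a similar name.

    Grouping orders by ZIP code ("blocking") means only records in the same
    block are compared, instead of every customer against every order.
    """
    orders_by_zip = defaultdict(list)
    for order in orders:
        orders_by_zip[order["zip"]].append(order)

    links = []
    for customer in customers:
        for order in orders_by_zip.get(customer["zip"], []):
            score = name_similarity(customer["name"], order["name"])
            if score >= cutoff:
                links.append((customer["id"], order["order_id"], round(score, 2)))
    return links


# Hypothetical sample data: the two lists share no IDs.
customers = [
    {"id": "c1", "name": "John Smith", "zip": "60601"},
    {"id": "c2", "name": "Maria Garcia", "zip": "60614"},
]
orders = [
    {"order_id": "o1", "name": "Smith, Jon", "zip": "60601"},
    {"order_id": "o2", "name": "Mary Garcia", "zip": "60614"},
    {"order_id": "o3", "name": "John Smith", "zip": "10001"},  # other ZIP, never compared
]

print(link_records(customers, orders))  # [('c1', 'o1', 0.95), ('c2', 'o2', 0.87)]
```

The cutoff and the blocking key here are picked by hand; dedupe's stated aim is to learn such rules from training data instead.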

dedupe learns from training data that you label by hand, working out the best rules for your dataset so it can find similar records quickly and automatically, even in very large databases.
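dedupe's own training is more involved than this (it combines comparisons across several fields and also learns blocking rules), but the basic step of turning human labels into a decision rule can be sketched with the standard library. In this hypothetical example a person has labeled a few name pairs as "same" or "distinct", and `learn_threshold` picks the similarity cutoff that agrees with the most labels. The function names, the `difflib.SequenceMatcher` comparator, and the labeled pairs are assumptions for illustration, not part of dedupe.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive character similarity in [0, 1] (a stand-in comparator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs):
    """Choose the cutoff that agrees with the most human labels.

    labeled_pairs holds (record_a, record_b, label) tuples, where label is True
    for "same entity" and False for "distinct". Each observed score is tried
    as a candidate cutoff.
    """
    scored = [(similarity(a, b), label) for a, b, label in labeled_pairs]
    best_cutoff, best_correct = 1.0, -1
    for cutoff, _ in scored:
        correct = sum((score >= cutoff) == label for score, label in scored)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def predict_match(a, b, cutoff):
    """Apply the learned rule to a new, unlabeled pair."""
    return similarity(a, b) >= cutoff


# Hypothetical labels a person might give while training.
training = [
    ("Jon Smith", "John Smith", True),
    ("Smith", "Smyth", True),
    ("Smith", "Smithers", False),
    ("Smith", "Jones", False),
]

threshold = learn_threshold(training)
print(threshold)  # 0.8
print(predict_match("Jane Doe", "Jayne Doe", threshold))  # True
print(predict_match("Smith", "Smithson", threshold))  # False
```

A single global cutoff on one string score is the simplest possible learned rule; with more fields, the same labels would be used to weigh each field's comparison.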

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.5.3.tar.gz (47.6 kB): source

Built Distributions

  • dedupe-1.5.3-cp35-cp35m-macosx_10_11_x86_64.whl (49.6 kB): CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.5.3-cp34-cp34m-win_amd64.whl (50.3 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.5.3-cp34-cp34m-win32.whl (49.6 kB): CPython 3.4m, Windows x86
  • dedupe-1.5.3-cp27-cp27m-win_amd64.whl (50.4 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.5.3-cp27-cp27m-win32.whl (49.6 kB): CPython 2.7m, Windows x86
  • dedupe-1.5.3-cp27-cp27m-macosx_10_11_x86_64.whl (49.2 kB): CPython 2.7m, macOS 10.11+ x86-64
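Each wheel's file name encodes the interpreter and platform it was built for (the `cp27`/`cp34`/`cp35` and `win32`/`win_amd64`/`macosx_10_11_x86_64` parts). The hypothetical helpers below sketch how those tags can be read to choose a file. pip does this selection automatically, and the `packaging` library's `parse_wheel_filename` and `sys_tags` are the maintained tools for it; this exact-match version is a simplification. Note that no Linux wheels are listed for this release, so on Linux the source distribution would be used.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its tags.

    Wheel names follow the pattern
    {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def pick_file(filenames, python_tag, platform_tag):
    """Return a wheel whose tags match exactly, else fall back to the sdist.

    Simplification: real installers also accept compatible tags (for example,
    a macosx_10_11 wheel on a newer macOS), which this sketch ignores.
    """
    for filename in filenames:
        if filename.endswith(".whl"):
            tags = parse_wheel_filename(filename)
            if tags["python"] == python_tag and tags["platform"] == platform_tag:
                return filename
    for filename in filenames:
        if filename.endswith(".tar.gz"):
            return filename
    return None


FILES = [
    "dedupe-1.5.3.tar.gz",
    "dedupe-1.5.3-cp35-cp35m-macosx_10_11_x86_64.whl",
    "dedupe-1.5.3-cp34-cp34m-win_amd64.whl",
    "dedupe-1.5.3-cp34-cp34m-win32.whl",
    "dedupe-1.5.3-cp27-cp27m-win_amd64.whl",
    "dedupe-1.5.3-cp27-cp27m-win32.whl",
    "dedupe-1.5.3-cp27-cp27m-macosx_10_11_x86_64.whl",
]

print(pick_file(FILES, "cp27", "win_amd64"))     # dedupe-1.5.3-cp27-cp27m-win_amd64.whl
print(pick_file(FILES, "cp36", "linux_x86_64"))  # dedupe-1.5.3.tar.gz
```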

File details

Details for the file dedupe-1.5.3.tar.gz.

File metadata

  • Download URL: dedupe-1.5.3.tar.gz
  • Upload date:
  • Size: 47.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.3.tar.gz
Algorithm Hash digest
SHA256 23a60a7efba5878282c7dd6339d0086ee2c289f013fc7b21da78b4a0341c1708
MD5 b012fbade7912e446efd48e87a22383b
BLAKE2b-256 28ffb67393f9b82c6d3b13d1a045417534c1a3db6b3aefaeeeb75c26b462b3c2

See more details on using hashes here.
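The SHA256 digests listed for each file can be checked after downloading. A minimal sketch using Python's standard `hashlib` module; the file name and its location in the current directory are assumptions. pip can also enforce such digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import os

# SHA256 digest published for the source distribution dedupe-1.5.3.tar.gz.
EXPECTED_SHA256 = "23a60a7efba5878282c7dd6339d0086ee2c289f013fc7b21da78b4a0341c1708"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    path = "dedupe-1.5.3.tar.gz"  # assumed download location
    if os.path.exists(path):
        print("OK" if verify_sha256(path, EXPECTED_SHA256) else "HASH MISMATCH")
    else:
        print(path + " not found; download it first")
```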

File details

Details for the file dedupe-1.5.3-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.3-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 ef40097cd68ff36840e44a0c93443d476f35e729139b85eb563524977bd59f2a
MD5 f0a31d9b7fe2c92a891058328e7a82b0
BLAKE2b-256 63487e5dff2aac83763b62a660f066bcebb86a3ca6308dfb041da5498d0393d9

File details

Details for the file dedupe-1.5.3-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.3-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 dcc0d150e492ba997725c8b0e27c14f685d82b5ad8baf3d5135a9e650c8ca27f
MD5 49cf5be9a1cd800dcf13a19f0e6e5198
BLAKE2b-256 68a757b9cae660180792d8ccedeba3afaadae42b2276378fd2662e66fb344e7b

File details

Details for the file dedupe-1.5.3-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.3-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 952219a70d0b31c6dfa8c596bb0a48493805b51c963f5e5c739a829a2304f85c
MD5 40b7717f080f268934ef01ecfe2534f1
BLAKE2b-256 5e5c991e081d6d1dc019c98d868beb3a46d09bc0e654155b6b14cf8d56845cc4

File details

Details for the file dedupe-1.5.3-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.3-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 b5d0fe5ab19b1cdf23aa420757372934d9b1c6f37a13b25e784b92349dcabec9
MD5 f3c21c6f844701d5aaa5c345edacbd8c
BLAKE2b-256 68e7f7cde1ad62772445175bcb531d7e34e5d0bbc73dba35d54af5440848cb9a

File details

Details for the file dedupe-1.5.3-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.3-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 6279496a6e4dd5e3b2fa8d2a81c1551d1dca13397a971f247699f91711c10686
MD5 333ad6e7d08d4258ceb4411c383e4ef6
BLAKE2b-256 5f0cef97d83b737aa4b1781a5241233a0be8392180740a55df3bd61f49b2559e

File details

Details for the file dedupe-1.5.3-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.3-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 c9f8f4fee7357a858b796c07d30aa47f7ccc6f97369b70a74a0be80d138f78ed
MD5 0b91fd89597a08b0a51079ca4631e855
BLAKE2b-256 5fc85e01f5f03162085a14f84c5ab1eb4db3e6dee06d9fed6c17e71710c7e3dc
