A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human-labeled training data and learns the best rules for your dataset, so it can quickly and automatically find similar records even in very large databases.
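
To make the idea of "similar records" concrete, here is a toy sketch using only the Python standard library. It is not dedupe's API or its algorithm: dedupe learns its rules from labeled examples, whereas this sketch uses a fixed string-similarity cutoff. The function names, the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Join all field values, lowercase them, and collapse whitespace."""
    return " ".join(" ".join(record.values()).lower().split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidate_duplicates(records, threshold=0.85):
    """Compare every pair of records and keep pairs at or above the threshold.

    The threshold is a hand-picked assumption; a learned model would replace it.
    """
    pairs = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(records.items(), 2):
        score = similarity(rec_a, rec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return pairs


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "Jon Smith", "address": "12 Main St"},
    2: {"name": "John Smith", "address": "12 Main Street"},
    3: {"name": "Alice Wong", "address": "99 Oak Ave"},
}

candidate_pairs = find_candidate_duplicates(records)
```

Comparing every pair grows quadratically with the number of records, which is why a library aimed at very large databases needs smarter rules than this brute-force loop.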

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.4.15.tar.gz (48.3 kB)

Uploaded Source

Built Distributions

dedupe-1.4.15-cp35-cp35m-macosx_10_9_x86_64.whl (50.7 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.15-cp34-cp34m-win_amd64.whl (51.4 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.15-cp34-cp34m-win32.whl (50.7 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.15-cp27-cp27m-win_amd64.whl (51.5 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.15-cp27-cp27m-win32.whl (50.7 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.15-cp27-cp27m-macosx_10_9_x86_64.whl (50.4 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.15.tar.gz.

File metadata

  • Download URL: dedupe-1.4.15.tar.gz
  • Upload date:
  • Size: 48.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.15.tar.gz
Algorithm Hash digest
SHA256 83da799de9cca7e9b29f66d516b023c282b649110c0c8b5fc01f04ce2b003619
MD5 c80eb4ad69d5ca8d36fa2bf484eeb82b
BLAKE2b-256 b62f3860fc37bdfa9922ef349af873a8a1a8cab5475af54465b230ca058ef0e9
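
The digests listed here can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the assumption that the archive sits in the current directory are illustrative, while the expected digest is the SHA256 value listed for dedupe-1.4.15.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for dedupe-1.4.15.tar.gz in the hash table above.
EXPECTED_SHA256 = "83da799de9cca7e9b29f66d516b023c282b649110c0c8b5fc01f04ce2b003619"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    archive = Path("dedupe-1.4.15.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
```

The same pattern applies to the wheel files by substituting the file name and its listed SHA256 digest.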

File details

Details for the file dedupe-1.4.15-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.15-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e6a18bfdc7e4c9c311a5a854d71f827c271e6ffcdd1e46fe0fbed20b44e13db0
MD5 40cce0906c90bcf2ac3e4a752b7f86e7
BLAKE2b-256 a5744835561df22ce8cf0d90d994e4113b3da46034c2ad50e716b876ec2c5aa2

File details

Details for the file dedupe-1.4.15-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.15-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 12f548979bae3adb5d3624bfcf247d7980ecf1fa1863e8c16934aae106fffd82
MD5 c0f64145a7a93b5ca862067353cdac12
BLAKE2b-256 1b276b8e3494b9f78be1716b0800b870538ecc67395a005335f5a4a4e68cc2d5

File details

Details for the file dedupe-1.4.15-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.15-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 ed71632cbf32411188e078402272301df8994e8615ab00c687fb61c9b8658efa
MD5 11487c21dc37ba07144d98856ef7af66
BLAKE2b-256 09d6e2c36e28d265e7b7adadf5d245ec9172115a9466bb63bfef59dfd935c8cf

File details

Details for the file dedupe-1.4.15-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.15-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 105b8dd17979861142a3cb8b77222c87264ae4702f016ac3b03ba6528becce5e
MD5 76abdc89a56053dfbe7300bc7b48e9dd
BLAKE2b-256 40d312891a236ab034f39d468364c5183a66eba5c02510ea691d3ac986b4644d

File details

Details for the file dedupe-1.4.15-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.15-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 5bc1ba296952a8f88372178f88d600a9373c9e4f515af3ae6abd99731470b8b7
MD5 9f0a49ee0b9a820ba5d550f78edf02ec
BLAKE2b-256 a7cb73620e481f87e72c68f87526000e03da4b0b16ef1526510ade8702ccdbd6

File details

Details for the file dedupe-1.4.15-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.15-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2ff21c53001038e7bade6cab862d7daf441a0a081ada8a50692a69a05d309a02
MD5 fcf78d94292b67b42368eb1fae892d52
BLAKE2b-256 5a0b22f0bff3a9f235dfaeeabc0f4b1c0bb9b18db0c07fc8e7d2c3adb88b2a76
