A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
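
As a rough illustration of the second case, the sketch below links each customer record to its most similar order record by comparing normalized name and address strings with the standard library's `difflib.SequenceMatcher`. It is not dedupe's API: the records, field names, 0.75 cutoff, and helper functions are all hypothetical, and dedupe learns how to compare fields from training data instead of relying on a hand-picked cutoff.

```python
from difflib import SequenceMatcher

# Hypothetical example data: the two lists share no customer ID.
customers = [
    {"name": "Jon Smith", "address": "12 Oak St."},
    {"name": "Maria Garcia", "address": "400 Elm Avenue"},
    {"name": "Ana Lopez", "address": "77 Birch Blvd"},
]
orders = [
    {"order_id": 1, "name": "John Smith", "address": "12 Oak Street"},
    {"order_id": 2, "name": "M. Garcia", "address": "400 Elm Ave"},
    {"order_id": 3, "name": "Wei Chen", "address": "9 Pine Rd"},
]


def normalize(record):
    """Join name and address, lower-case them, drop periods, collapse whitespace."""
    text = f"{record['name']} {record['address']}".lower().replace(".", "")
    return " ".join(text.split())


def similarity(a, b):
    """SequenceMatcher ratio in [0, 1]; 1.0 means identical normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(left, right, cutoff=0.75):
    """Return, for each record in `left`, the index of its most similar record
    in `right`, or None if no candidate reaches the (hypothetical) cutoff."""
    links = []
    for record in left:
        scores = [similarity(record, other) for other in right]
        best = max(range(len(right)), key=scores.__getitem__)
        links.append(best if scores[best] >= cutoff else None)
    return links


print(link(customers, orders))  # [0, 1, None]
```

"Ana Lopez" has no counterpart in the order list, so no candidate reaches the cutoff and her entry stays unlinked.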

dedupe learns from human-labeled training data to work out the best rules for your dataset, so it can quickly and automatically find similar records, even in very large databases.
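
The two ideas in that sentence, learning a decision rule from labeled examples and avoiding a comparison of every pair of records, can be sketched with the standard library alone. This is a simplified stand-in, not dedupe's algorithm: it scores pairs with a single `difflib.SequenceMatcher` ratio, uses a hypothetical blocking key (the first character of each row) so that only rows in the same block are compared, picks the similarity cutoff that best classifies some hypothetical labeled pairs, and merges matching pairs into clusters with union-find.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical rows from a spreadsheet of names and addresses.
records = [
    "Jon Smith, 12 Oak St",
    "John Smith, 12 Oak Street",
    "Maria Garcia, 400 Elm Ave",
    "Maria Garcia, 400 Elm Avenue",
    "Mario Garza, 17 Elm Ct",
]

# Hypothetical human training labels: (i, j) -> True if the rows are the same entity.
labeled = {(0, 1): True, (2, 3): True, (2, 4): False}


def score(i, j):
    """String similarity of two rows, in [0, 1]."""
    return SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()


def learn_threshold(labels):
    """Choose the cutoff, among the labeled pairs' own scores, that classifies
    the most labeled pairs correctly (score >= cutoff means 'same entity')."""
    scored = [(score(i, j), same) for (i, j), same in labels.items()]

    def correct(cutoff):
        return sum((s >= cutoff) == same for s, same in scored)

    return max(sorted(s for s, _ in scored), key=correct)


def cluster_records(cutoff):
    """Compare only rows that share a blocking key, then merge matching pairs."""
    blocks = defaultdict(list)
    for idx, row in enumerate(records):
        blocks[row[0].lower()].append(idx)  # hypothetical key: first character

    parent = list(range(len(records)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if score(i, j) >= cutoff:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


threshold = learn_threshold(labeled)
print(round(threshold, 3), cluster_records(threshold))  # 0.889 [[0, 1], [2, 3], [4]]
```

With these five rows, blocking cuts the comparisons from 10 pairs to 4. On real data the choice of blocking key and similarity measure matters far more than in this toy example, which is the part dedupe aims to learn rather than leave to hand-tuning.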

Download files

Download the file for your platform.

Source Distribution

dedupe-1.4.7.tar.gz (47.4 kB)

Built Distributions

  • dedupe-1.4.7-cp35-cp35m-macosx_10_9_x86_64.whl (50.2 kB): CPython 3.5m, macOS 10.9+, x86-64

  • dedupe-1.4.7-cp34-cp34m-win_amd64.whl (51.0 kB): CPython 3.4m, Windows, x86-64

  • dedupe-1.4.7-cp27-cp27m-win_amd64.whl (51.1 kB): CPython 2.7m, Windows, x86-64

  • dedupe-1.4.7-cp27-cp27m-win32.whl (50.2 kB): CPython 2.7m, Windows, x86

  • dedupe-1.4.7-cp27-cp27m-macosx_10_9_x86_64.whl (49.9 kB): CPython 2.7m, macOS 10.9+, x86-64
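
Each wheel filename encodes its compatibility tags, which is how a name such as `dedupe-1.4.7-cp35-cp35m-macosx_10_9_x86_64.whl` maps to CPython 3.5m on macOS 10.9+ x86-64. A minimal parser for the PEP 427 naming convention, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, might look like this; the function and type names are illustrative, and for real use the `packaging` project provides `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields.

    PEP 427 escapes hyphens inside each component, so a plain split on "-"
    yields 5 fields, or 6 when the optional build tag is present.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return WheelTags(dist, version, build, py, abi, plat)


tags = parse_wheel_filename("dedupe-1.4.7-cp35-cp35m-macosx_10_9_x86_64.whl")
print(tags.python, tags.abi, tags.platform)  # cp35 cp35m macosx_10_9_x86_64
```

Here `cp35` is the Python tag (CPython 3.5), `cp35m` the ABI tag, and `macosx_10_9_x86_64` the platform tag; an installer picks the wheel whose three tags match the running interpreter and falls back to the source distribution otherwise.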

File details

Details for the file dedupe-1.4.7.tar.gz.

File metadata

  • Download URL: dedupe-1.4.7.tar.gz
  • Upload date:
  • Size: 47.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.7.tar.gz
Algorithm Hash digest
SHA256 457da5e4fb318fd6feeab3529179814dcbe428afbb427beedbf2650e4ac2efaf
MD5 e14bc0a143024e82535782267fd0b9e3
BLAKE2b-256 36bc857e12d2762c0d8113992bf6de7458fec491843c62fa99c656128900ef2f

See more details on using hashes here.
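
To check a downloaded file against the digests listed on this page, hash it locally and compare. A minimal sketch with Python's `hashlib`, using the SHA256 value published above for `dedupe-1.4.7.tar.gz`; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib

# SHA256 digest listed above for the source distribution dedupe-1.4.7.tar.gz.
EXPECTED_SHA256 = "457da5e4fb318fd6feeab3529179814dcbe428afbb427beedbf2650e4ac2efaf"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("dedupe-1.4.7.tar.gz")` should return `True` for an intact download of the source distribution. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).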

File details

Details for the file dedupe-1.4.7-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.7-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 7e212b51b6d60678a05476e8860148bb34022edb681b3e871b038b3eca628307
MD5 42b35967cff3ac87db70187cd0e36329
BLAKE2b-256 e7a88e1bfcb095da806fbfa84fb3ede84f0a82187ffcaf418999b2867a96882c

File details

Details for the file dedupe-1.4.7-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.7-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 9b3c696d8fc003733beb8c9822bd24f259a6edbdf19bcd4a6cba45eb5f6e409f
MD5 17176c55963dbab7e330ab735ddba27a
BLAKE2b-256 4a6b0a20e16e145b2062963e6abf8217d0ea2f71d5de1853a2ebd3adc677b119

File details

Details for the file dedupe-1.4.7-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.7-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 796ae3be584d1689a99d9774f5f9fec4cb6d48d546b1b6440eb40f7989618f86
MD5 45dbd4c7b871af3b8a9cca19a153f7b7
BLAKE2b-256 1625410f742be53481efc76ca2fdc8daf0d64eb65f3bc773505b474a5122d30c

File details

Details for the file dedupe-1.4.7-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.7-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 72d92a1ae0e9c41f6e4e6d42407e3dc7936729a851481a3e1f7483fa2f0b7522
MD5 a19285e2d16b3b91da179d63529090e8
BLAKE2b-256 f122355ce1468fbb7c376c02bcbf8bf6b3987fee43dd22ce1aee6de75201a4d7

File details

Details for the file dedupe-1.4.7-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.7-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e129208bb45a03dd63dffcfa1958dbb3417b248b586733fb83d489029418ff3c
MD5 57517742ab5608ebb93bf8dbc5ba6c8a
BLAKE2b-256 9393ef5682c9a42e6757d7d7abe0767dc27a9a2ae8b80bb1a961884271fabd1e
