A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform fast de-duplication and entity resolution on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes human-labeled training data and learns the rules that work best for your dataset, so it can quickly and automatically find similar records, even in very large databases.
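To make "learning rules from training data" concrete, here is a deliberately simplified sketch in plain Python. It is not dedupe's API or algorithm: it scores record pairs with the standard library's `difflib.SequenceMatcher`, picks the similarity cutoff that agrees with the most hand-labeled pairs, and then flags matching records. The helper names (`normalize`, `similarity`, `learn_threshold`, `find_duplicates`) and the sample records are hypothetical. The real library does considerably more, including handling very large databases without comparing every pair; this toy version compares all pairs, so its cost grows quadratically with the number of records.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Sequence, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise doesn't count as a difference."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Score two strings from 0.0 (nothing shared) to 1.0 (identical after normalizing)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def learn_threshold(labeled_pairs: Sequence[Tuple[str, str, bool]]) -> float:
    """Choose the similarity cutoff that agrees with the most human labels.

    Each labeled pair is (record_a, record_b, is_duplicate). Candidate cutoffs
    are the observed scores themselves; ties keep the earliest candidate.
    """
    scored = [(similarity(a, b), is_dup) for a, b, is_dup in labeled_pairs]
    best_cutoff, best_correct = 0.5, -1
    for cutoff, _ in scored:
        # Count how many labeled pairs this cutoff classifies correctly.
        correct = sum((score >= cutoff) == is_dup for score, is_dup in scored)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def find_duplicates(records: Sequence[str], cutoff: float) -> List[Tuple[int, int]]:
    """Return index pairs of records whose similarity is at or above the cutoff.

    Compares every pair of records, which does not scale to large datasets.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical hand-labeled training pairs.
    training = [
        ("Jon Smith", "John Smith", True),
        ("Jane Doe", "Jane  Doe", True),
        ("Jon Smith", "Jane Doe", False),
        ("Acme Corp", "Zenith Ltd", False),
    ]
    cutoff = learn_threshold(training)
    print(find_duplicates(["Jon Smith", "John Smith", "Jane Doe"], cutoff))  # [(0, 1)]
```

Using the labeled scores themselves as candidate cutoffs keeps the search small and guarantees the chosen cutoff reproduces at least one real decision boundary from the training data.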

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.4.12.tar.gz (47.6 kB)

Uploaded Source

Built Distributions

dedupe-1.4.12-cp35-cp35m-macosx_10_9_x86_64.whl (49.9 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.12-cp34-cp34m-win_amd64.whl (50.7 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.12-cp34-cp34m-win32.whl (50.0 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.12-cp27-cp27m-win_amd64.whl (50.8 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.12-cp27-cp27m-win32.whl (49.9 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.12-cp27-cp27m-macosx_10_9_x86_64.whl (49.6 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64
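Each wheel filename encodes the compatibility tags summarized in the "Uploaded CPython ..." lines. Wheel filenames follow the PEP 427 pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below parses that pattern and picks a file for a given interpreter and platform. `parse_wheel_filename`, `pick_wheel` and `WheelTags` are hypothetical helpers written for illustration, and the exact-match rule is a simplification: pip actually ranks many compatible tags (for example, a "macOS 10.9+" wheel is also accepted on later macOS versions), and compressed tag sets such as `py2.py3` are not expanded here.

```python
from typing import Iterable, NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (build tag dropped)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts)


def pick_wheel(
    filenames: Iterable[str], python_tag: str, platform_tag: str
) -> Optional[str]:
    """Return the first wheel whose python and platform tags match exactly, else None."""
    for filename in filenames:
        tags = parse_wheel_filename(filename)
        if tags.python_tag == python_tag and tags.platform_tag == platform_tag:
            return filename
    return None


# The built distributions listed above.
WHEELS = [
    "dedupe-1.4.12-cp35-cp35m-macosx_10_9_x86_64.whl",
    "dedupe-1.4.12-cp34-cp34m-win_amd64.whl",
    "dedupe-1.4.12-cp34-cp34m-win32.whl",
    "dedupe-1.4.12-cp27-cp27m-win_amd64.whl",
    "dedupe-1.4.12-cp27-cp27m-win32.whl",
    "dedupe-1.4.12-cp27-cp27m-macosx_10_9_x86_64.whl",
]

if __name__ == "__main__":
    print(pick_wheel(WHEELS, "cp27", "win32"))
```

An interpreter with no matching wheel here (any CPython 3.6 or newer, for instance) gets `None` from this sketch; in that situation pip would typically fall back to building from the source distribution.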

File details

Details for the file dedupe-1.4.12.tar.gz.

File metadata

  • Download URL: dedupe-1.4.12.tar.gz
  • Upload date:
  • Size: 47.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.12.tar.gz
Algorithm Hash digest
SHA256 3feace75287394faf530996ed9fa0d7c16a342eb00f0b3d111018382e5011eb3
MD5 4ae754e6daad005899d2f08bcb861392
BLAKE2b-256 f244afc911ea93717d8719335be2cbf1af59d25cd065a99a065d2d55895d467d

See more details on using hashes here.
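The published digests let you confirm that a downloaded file is byte-for-byte the one that was uploaded. A minimal sketch using only the standard library's `hashlib`; `sha256_of` and `verify_download` are hypothetical helper names, not part of dedupe or pip. (pip can also enforce hashes itself through its hash-checking mode, `--require-hashes`.)

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Fixed-size chunks keep memory use flat even for large archives.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, assuming the source distribution was saved to the current directory, `verify_download("dedupe-1.4.12.tar.gz", "3feace75287394faf530996ed9fa0d7c16a342eb00f0b3d111018382e5011eb3")` should return `True`; any other result means the file should not be installed.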

File details

Details for the file dedupe-1.4.12-cp35-cp35m-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a118168728c48527a05259dac3080bc470ef1cadcbd14071664a9db055d06870
MD5 3836dd6766f2ff80894360197772dc38
BLAKE2b-256 25a15bf6599328c9b22a9744afb59f099d24476516f99b5496b56dc6d585e2d4

File details

Details for the file dedupe-1.4.12-cp34-cp34m-win_amd64.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 0c5cc05f418bd065f0c4ea8d450589ff10139e22613fbd4f4165225f9205d24f
MD5 1562c521ac16c73696d604345b10a1e2
BLAKE2b-256 c0b23619a0ac2847357748f160bff9ef26f681dc15b326a4c5eceeb9b879d807

File details

Details for the file dedupe-1.4.12-cp34-cp34m-win32.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 0f0ae1efbc585ef4947811a2d9804719c84d0635df2d65bef8e431326fac425f
MD5 573700fe89fb5edd0bf1e039050d3c5d
BLAKE2b-256 97e1bbc01fdcc4cef57ad9441b129bf7af8b65a6828a7ad44313e5a429566d11

File details

Details for the file dedupe-1.4.12-cp27-cp27m-win_amd64.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 590bce40884413ef128f8e9c189265eb9761e4daa5399357a38b767f3ac1c700
MD5 250d3600c6ca74ca25b73472e4f60b35
BLAKE2b-256 c44e71c361f14bc7e5ed57eaa321087f91b0dbe7d01e39817263d6a0e8f0a8ba

File details

Details for the file dedupe-1.4.12-cp27-cp27m-win32.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 d5ea8f31e0f711b0f4f535c22ae0a5cbd2f97828838ece69a08b454ed87255de
MD5 54bccd6e6f7568182deec7f6243befcb
BLAKE2b-256 64c9f23fdec9d2fbc5b45e69b64d2490b1be3d5d3c8b5827f3bc01d8bbf6c67f

File details

Details for the file dedupe-1.4.12-cp27-cp27m-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.4.12-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 187fcc118a67b2856faf3a15c71092da3a8940538315458fbd0a38980233e381
MD5 5024a41dc9ecb4a8df618f791c305e06
BLAKE2b-256 b980f15bff4f64a9796911fbdb76c4a2818b40756a27878bc832b905152470db