A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
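
To make the idea of "similar records" concrete, here is a minimal sketch that flags near-duplicate rows with a fixed string-similarity threshold. It uses only the Python standard library and is not dedupe's API: dedupe learns its matching rules from labeled training data, whereas this sketch swaps in a hand-picked threshold. The names `normalize`, `similar_pairs`, the sample records, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Join a record's fields into one lower-cased, whitespace-collapsed string."""
    return " ".join(" ".join(record.values()).lower().split())


def similar_pairs(records, threshold=0.8):
    """Return index pairs whose normalized text similarity is at least `threshold`.

    Hypothetical helper: compares every pair, so it only suits small inputs.
    """
    texts = [normalize(r) for r in records]
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        score = SequenceMatcher(None, texts[i], texts[j]).ratio()
        if score >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up sample data: the first two rows describe the same person.
records = [
    {"name": "Jon Smith", "address": "123 Main St"},
    {"name": "John Smith", "address": "123 Main Street"},
    {"name": "Alice Wong", "address": "9 Elm Ave"},
]
print(similar_pairs(records))
```

Comparing every pair grows quadratically with the number of records, which is one reason a library that learns rules for narrowing down candidate pairs is attractive for very large databases.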

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.3.8.tar.gz (46.6 kB): Source

Built Distributions

  • dedupe-1.3.8-cp35-cp35m-macosx_10_9_x86_64.whl (48.3 kB): CPython 3.5m, macOS 10.9+, x86-64
  • dedupe-1.3.8-cp34-cp34m-win_amd64.whl (49.4 kB): CPython 3.4m, Windows, x86-64
  • dedupe-1.3.8-cp34-cp34m-win32.whl (48.8 kB): CPython 3.4m, Windows, x86
  • dedupe-1.3.8-cp27-cp27m-win_amd64.whl (49.5 kB): CPython 2.7m, Windows, x86-64
  • dedupe-1.3.8-cp27-cp27m-win32.whl (48.9 kB): CPython 2.7m, Windows, x86
  • dedupe-1.3.8-cp27-cp27m-macosx_10_9_x86_64.whl (47.9 kB): CPython 2.7m, macOS 10.9+, x86-64

File details

Details for the file dedupe-1.3.8.tar.gz.

File metadata

  • Download URL: dedupe-1.3.8.tar.gz
  • Upload date:
  • Size: 46.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.3.8.tar.gz
Algorithm Hash digest
SHA256 b6bc042cd232b3896d87a3f329bdf79c000473f34521e25115bbb33650c97d23
MD5 11d5a0a7b23745a40fc346092babd82f
BLAKE2b-256 d8534eadac04d85b5693182fd49913b25b9f9d389310ba9d5c572612db11047c

See more details on using hashes here.
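
As a rough illustration of how a published digest can be used, the sketch below computes the SHA256 of a downloaded file with Python's standard `hashlib` module and compares it with a digest copied from a listing like the one above. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches_published_hash("dedupe-1.3.8.tar.gz", "b6bc042cd232b3896d87a3f329bdf79c000473f34521e25115bbb33650c97d23")` should return `True` for an intact download.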

File details

Details for the file dedupe-1.3.8-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.3.8-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e0a8e6dcef7c99f57afb98340e1f9a176f4e09fa090b16c3c0f4b1b2b2865e7a
MD5 19b95320b66cf83e4f9c8b11266baf4c
BLAKE2b-256 9015ba85e8e888b94a4207fd812ac9cb9ba76eec890238bcbbb47bd9bec422f9

File details

Details for the file dedupe-1.3.8-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.3.8-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 0bc7543d7af982997cd01b36d740fc755eeaee0c8144e23a04ea7fcbd88fd8c2
MD5 bc19074d8005a920d96c02e1a07cdcbd
BLAKE2b-256 122d216f7245b1d5a8e8f31a94158eaa9a4f5c888bc5a3deffbba9afcc9412ef

File details

Details for the file dedupe-1.3.8-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.3.8-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 b641c56494f71ef1621e18892d64da3727ecbb9fd71693e49ac1f33a9066643c
MD5 d24ab4159991f7e73afa59715355c2f1
BLAKE2b-256 11b87fb2e9d90173dbcfc6598c883fd553df4fd800a0d19cfb02e86c6bf50849

File details

Details for the file dedupe-1.3.8-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.3.8-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 7f28ed364af28be6f88315b76905043e478a6ef608cb21866be7e9b33f48514d
MD5 5a26ebb77cb50b4757e0337fe8a3fdb3
BLAKE2b-256 abacc28f44cfd96e452f2c6fa31c5ab15545cae7a4ce8a1a70129bfa0b7544e7

File details

Details for the file dedupe-1.3.8-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.3.8-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 a3858269cc14d667686932024ac9cf7b6386f3f2ad62c0df40c0439fe243faa1
MD5 51a25860c71d4c6f16d2f275a90b8584
BLAKE2b-256 7a8efd72e6b62a3a9339658fb99ac66f2176e3beb34f2e81b66aee40775b6403

File details

Details for the file dedupe-1.3.8-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.3.8-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 0f61f337791ccac243b03744e5cdc0bada8ddc58a69845cb2ef3d0b85fd8f039
MD5 ec63d4a1bd22004582300a568df23546
BLAKE2b-256 51d0637bf527c24442314e36c9e72524a91965125ef0bfdc4f021e5166a8e2f3
