A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from training data labeled by a person and derives matching rules suited to your dataset, so that it can find similar records quickly and automatically, even in very large databases.
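To make "similar records" concrete, here is a minimal sketch of fuzzy matching using only the Python standard library. It is not dedupe's API or its method: it compares every pair of records with `difflib` and applies a hand-picked threshold, whereas dedupe learns its rules from labeled examples. The records, field names, and the 0.8 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; the IDs, field names, and values are illustrative only.
records = {
    1: {"name": "Jon Smith", "address": "12 Main St"},
    2: {"name": "John Smith", "address": "12 Main Street"},
    3: {"name": "Alice Jones", "address": "98 Oak Ave"},
}


def similarity(a, b):
    """Average character-level similarity (0 to 1) across the fields of `a`."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in a
    ]
    return sum(scores) / len(scores)


def candidate_duplicates(records, threshold=0.8):
    """Return ID pairs whose similarity meets a hand-picked threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(sorted(records.items()), 2)
        if similarity(a, b) >= threshold
    ]


print(candidate_duplicates(records))
```

Comparing every pair grows quadratically with the number of records and the threshold has to be tuned by hand, which is why this sketch only suits small inputs.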

Download files

Source Distribution

  • dedupe-1.4.2.tar.gz (46.9 kB): Source

Built Distributions

  • dedupe-1.4.2-cp35-cp35m-macosx_10_9_x86_64.whl (49.7 kB): CPython 3.5m, macOS 10.9+ x86-64
  • dedupe-1.4.2-cp34-cp34m-win_amd64.whl (50.5 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.4.2-cp34-cp34m-win32.whl (49.8 kB): CPython 3.4m, Windows x86
  • dedupe-1.4.2-cp27-cp27m-win_amd64.whl (50.6 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.4.2-cp27-cp27m-win32.whl (49.7 kB): CPython 2.7m, Windows x86
  • dedupe-1.4.2-cp27-cp27m-macosx_10_9_x86_64.whl (49.4 kB): CPython 2.7m, macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.2.tar.gz.

File metadata

  • Download URL: dedupe-1.4.2.tar.gz
  • Upload date:
  • Size: 46.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.2.tar.gz
Algorithm Hash digest
SHA256 7c2bd4c76f31b9afc0c72959473944b3183bd381f0094cf3c89fc8bbfdd258a9
MD5 0cb74c45f7de26744640c6fbf53487c2
BLAKE2b-256 b7bf661847aa076468bd261fa4f3dddf6763d1a7789088dbd1a53ecd33eced45
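
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are chosen here for illustration, and the file path is assumed to point at a local copy of the archive.

```python
import hashlib

# SHA256 digest listed above for dedupe-1.4.2.tar.gz.
EXPECTED_SHA256 = "7c2bd4c76f31b9afc0c72959473944b3183bd381f0094cf3c89fc8bbfdd258a9"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("dedupe-1.4.2.tar.gz")` would return `True` for an intact copy saved in the current directory, and `False` if the file was corrupted or altered.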

File details

Details for the file dedupe-1.4.2-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.2-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c6820c6beac391d778ca80cf0d04d54c6ce400ce21fb22a49c8c817a87de0520
MD5 5dc588593cd80ace74400e5eac161f8f
BLAKE2b-256 f67047f332eb136bd5abcccc5cc1833cbfa521f384a6ede3a55c79281382c413

File details

Details for the file dedupe-1.4.2-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.2-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 f956fe45a5095bdaf4720eb4fe397a17dc70aabb4a251f1a59b6331b1195ecaa
MD5 44dd6bcf10343008ff47e3d643f01232
BLAKE2b-256 3bd46f75749eeae57cdcf376b4f0b44b045cee770033a006c3843972cb59de3c

File details

Details for the file dedupe-1.4.2-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.2-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 0591d2a18340a279108118f4a10d6e4a09af4c17789fee5878407eb73d18d49d
MD5 adc974c22fb9d5c0fa010e30d6134d9c
BLAKE2b-256 7a84826284b16a2c029711a1c33b70c90c927ce8d3ed2908e839212208e90563

File details

Details for the file dedupe-1.4.2-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.2-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 441db90c4cfe47865f10ea9fb1f394612e1b61986005673f53c135a95b24005f
MD5 316d2222444542b93e08df7eeed6c693
BLAKE2b-256 82ae6a6589ca2bb64c0db98699ebb2e0c5b9ad0097c95ec13d8122899269ea3a

File details

Details for the file dedupe-1.4.2-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.2-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 2499849dd2734330b9f07afd9ea305959bf522368a476b7ca3282f0b7e24dacb
MD5 0bd1bdf3d5733086a5f1bc217e6ecdc0
BLAKE2b-256 2e1b2d6a065dd5ae5deee74d6a52eca521259b727a1c446ac5d2764be72afa59

File details

Details for the file dedupe-1.4.2-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.2-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 df7cc0b6d8953dd39ad209b5c8b4e7ed6c39ddc5416798608220514b58abdc7e
MD5 ddc7cd391e279a2c5735c325d6f9490b
BLAKE2b-256 a45ed325dec13f80a5e88fecbc2a411dcb49ac0dbdfe684e079df56e322fe1a7
