A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
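
To make the idea of finding records that "were entered slightly differently" concrete, here is a minimal sketch using only the Python standard library. It is not dedupe's API or its algorithm: dedupe learns its matching rules from human-labeled examples, whereas this sketch uses a fixed string-similarity threshold. The function names, the sample records, and the threshold of 0.85 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_likely_duplicates(records, threshold=0.85):
    """Compare every pair of records and keep pairs scoring at or above threshold.

    This is a brute-force O(n^2) comparison with a hand-picked threshold
    (an assumption for illustration), not a learned model.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical sample data: the first two rows describe the same person.
records = [
    "Jonathan Smith, 12 Main St",
    "Jonathon Smith, 12 Main St.",
    "Maria Garcia, 4 Elm Ave",
]

pairs = find_likely_duplicates(records)
for i, j, score in pairs:
    print(f"records {i} and {j} look alike (score {score:.2f})")
```

Comparing every pair does not scale to large datasets, and a fixed threshold has to be tuned by hand for each dataset, which is the gap a trained approach such as dedupe's is meant to close.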

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.4.0.tar.gz (46.9 kB): Source

Built Distributions

  • dedupe-1.4.0-cp35-cp35m-macosx_10_9_x86_64.whl (48.6 kB): CPython 3.5m, macOS 10.9+ x86-64
  • dedupe-1.4.0-cp34-cp34m-win_amd64.whl (49.7 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.4.0-cp34-cp34m-win32.whl (49.2 kB): CPython 3.4m, Windows x86
  • dedupe-1.4.0-cp34-cp34m-manylinux1_x86_64.whl (73.6 kB): CPython 3.4m
  • dedupe-1.4.0-cp27-cp27mu-manylinux1_x86_64.whl (71.3 kB): CPython 2.7mu
  • dedupe-1.4.0-cp27-cp27m-win_amd64.whl (49.8 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.4.0-cp27-cp27m-win32.whl (49.2 kB): CPython 2.7m, Windows x86
  • dedupe-1.4.0-cp27-cp27m-manylinux1_x86_64.whl (71.3 kB): CPython 2.7m
  • dedupe-1.4.0-cp27-cp27m-macosx_10_9_x86_64.whl (48.3 kB): CPython 2.7m, macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.0.tar.gz.

File metadata

  • Download URL: dedupe-1.4.0.tar.gz
  • Upload date:
  • Size: 46.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.0.tar.gz
Algorithm Hash digest
SHA256 e48efa14215416cbc68d5639c3b78abb6bce719a5f34e4601c2316bc5b3aeff8
MD5 14dc80abed0bce4898c3185d437a1c9f
BLAKE2b-256 d37819ef009c31e2ff6aa1324cff037cdade19904af3e0cec1b2014d59814a35

See more details on using hashes here.
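
As a sketch of how the digests listed for each file might be used, the snippet below computes a file's SHA256 with Python's standard `hashlib` module and compares it to an expected hex string. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and the sketch assumes the downloaded file already exists at a local path of your choosing.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 equals expected_hex (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after saving the source distribution locally, `matches_expected("dedupe-1.4.0.tar.gz", "e48efa14215416cbc68d5639c3b78abb6bce719a5f34e4601c2316bc5b3aeff8")` should return `True` if the download is intact; the local path here is an assumption.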

File details

Details for the file dedupe-1.4.0-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.0-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b56c7f3b8e4dbee08bc40715c73b9b39e1719d82ebdc1ae0ef08f6aa3424b4a8
MD5 90db4fa4eff8491cb6ba6820cc89cece
BLAKE2b-256 0a16c0c707613a61e5230d38db648d2b81ee4f94455b6a2525468f3d24545b5a

File details

Details for the file dedupe-1.4.0-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.0-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 e206bcb08c101b93454b1ce9c38b821bfb117c3ffc6d5381a9e142adb4b8bc7f
MD5 4c50d32c72ee0872a335402f80a2defc
BLAKE2b-256 032950949b1d6ae1771cfca4dc0dd9816478e4a755a17da176b5368ef7e04ade

File details

Details for the file dedupe-1.4.0-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.0-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 08e0933b259866278e9c9e575fe422268a337c2d7a01e5b06d92445b4ec5f8ee
MD5 acd53bf12018a9396f639c355f272609
BLAKE2b-256 0276a6d30ae1b918a9c6b0b75b853c8d50f894824ddc82de13dc40e37fde6030

File details

Details for the file dedupe-1.4.0-cp34-cp34m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.4.0-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 dd1e3078a31c4793da24f6b94f728d981b0afd342e337d2513f093f8a4f9ca07
MD5 47c50833419db2d667f67eeda7646568
BLAKE2b-256 548c9593f6c879b030c7a85a8e819956a7544c7d42b53305258ac90341ee59dc

File details

Details for the file dedupe-1.4.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.4.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 cb36758bbede9b86415acb02bd3534580be58157d11c6b10962a8e8075d1f7d9
MD5 12d7e1e090e90b4570809f806f18bb32
BLAKE2b-256 16f60b34554ab3d0b73b6d158987aa7c69fa7743b020a75f8d358adc8997c359

File details

Details for the file dedupe-1.4.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 7ae926d7d03e28047d5383e29c9b35756ddeb96ce3a204c6b10c2ee0fada74a2
MD5 c5df5364cdcc1a27f32fa3d32c67c8e1
BLAKE2b-256 61855e20fee74bae0a60ad4f0defaef27a4339d1a774bdcd4f445bafc0d3834a

File details

Details for the file dedupe-1.4.0-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 9792c39727be2472fc6752400e7389430e0331e91bca98b28683257ac4763fe7
MD5 be8abb5775e95f13fee1901c39d82fc4
BLAKE2b-256 6d9b3ce805bac7dd457c44fe5b48171a3fa5a5330ab52fb5805864562ea24d70

File details

Details for the file dedupe-1.4.0-cp27-cp27m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.4.0-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 e205be858c97dda12fe938b89853dfcf3ec323103b580c06715c78f16cd76f61
MD5 3e6b14f266b943b0f63e64827b01b054
BLAKE2b-256 f8d36f715ac2184185d8bfb1500a09843e6a92ed5349750d3c8da5c661008d7c

File details

Details for the file dedupe-1.4.0-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.0-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2ee0825ebe5dc2d8890826586d1d593a5cc877eac71c51ad0969515049ee0c11
MD5 2138200e9695a7b40433d26a8748bbb1
BLAKE2b-256 4032c66eaa00cab38d2bee15835a425d08500dff25c9619bae72ac27388f0af1
