A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data which rules work best for your dataset, then uses those rules to find similar records quickly and automatically, even in very large databases.
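
To make the idea of "similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API or algorithm: dedupe learns its matching rules from labeled examples, whereas this sketch assumes a fixed, hand-picked threshold. The sample records, the field names, and the helper names `similarity` and `likely_duplicates` are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are made up for illustration.
records = {
    1: {"name": "Jonathan Smith", "address": "123 Main Street"},
    2: {"name": "Jonathon Smith", "address": "123 Main St."},
    3: {"name": "Maria Garcia", "address": "9 Oak Avenue"},
}


def similarity(a, b):
    """Average character-level similarity over the fields of `a` (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in a
    ]
    return sum(scores) / len(scores)


def likely_duplicates(records, threshold=0.8):
    """Return ID pairs whose similarity meets a hand-picked threshold.

    Assumption: a fixed threshold stands in for the rules that dedupe
    would learn from human-labeled training data.
    """
    return [
        (i, j)
        for i, j in combinations(sorted(records), 2)
        if similarity(records[i], records[j]) >= threshold
    ]


print(likely_duplicates(records))
```

Comparing every pair like this grows quadratically with the number of records, which is why a library aimed at very large databases needs smarter rules than a brute-force loop.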

Important links:

This version

1.6.4

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.6.4.tar.gz (47.3 kB): Source

Built Distributions

  • dedupe-1.6.4-cp35-cp35m-manylinux1_x86_64.whl (73.6 kB): CPython 3.5m
  • dedupe-1.6.4-cp35-cp35m-manylinux1_i686.whl (70.4 kB): CPython 3.5m
  • dedupe-1.6.4-cp35-cp35m-macosx_10_11_x86_64.whl (49.5 kB): CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.6.4-cp34-cp34m-manylinux1_x86_64.whl (73.8 kB): CPython 3.4m
  • dedupe-1.6.4-cp34-cp34m-manylinux1_i686.whl (70.5 kB): CPython 3.4m
  • dedupe-1.6.4-cp27-cp27mu-manylinux1_x86_64.whl (71.5 kB): CPython 2.7mu
  • dedupe-1.6.4-cp27-cp27mu-manylinux1_i686.whl (68.9 kB): CPython 2.7mu
  • dedupe-1.6.4-cp27-cp27m-manylinux1_x86_64.whl (71.5 kB): CPython 2.7m
  • dedupe-1.6.4-cp27-cp27m-manylinux1_i686.whl (68.9 kB): CPython 2.7m
  • dedupe-1.6.4-cp27-cp27m-macosx_10_11_x86_64.whl (49.1 kB): CPython 2.7m, macOS 10.11+ x86-64
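
When choosing among the built distributions, it can help to read the tags encoded in each wheel filename. The sketch below splits a filename into the components defined by the wheel naming convention (name, version, Python tag, ABI tag, platform tag). The helper `split_wheel_filename` is a hypothetical name introduced here, and the sketch assumes filenames without the optional build tag, which is true of every wheel listed for this release.

```python
from collections import namedtuple

# Components of a wheel filename, in the order they appear.
WheelTags = namedtuple("WheelTags", "name version python abi platform")


def split_wheel_filename(filename):
    """Split a wheel filename into its five dash-separated components.

    Assumption: the filename has no optional build tag, so exactly
    five components are expected.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelTags(*parts)


tags = split_wheel_filename("dedupe-1.6.4-cp27-cp27mu-manylinux1_x86_64.whl")
print(tags.python, tags.abi, tags.platform)
```

In practice pip reads these tags itself and picks a compatible file; if none of the wheels matches your interpreter and platform, the source distribution is the fallback.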

File details

Details for the file dedupe-1.6.4.tar.gz.

File metadata

  • Download URL: dedupe-1.6.4.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.6.4.tar.gz
Algorithm Hash digest
SHA256 16a781f00fa8a04b0aa81c9710c2a5386684b070b73d9b282fd5d48cc8101436
MD5 6d1c3bf55df9d49b3f8c54575aa3e519
BLAKE2b-256 1dc5b6ec2521914ea6f2a60b97984c726c9499484bf4f578fcf849c10383dcda

See more details on using hashes here.
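
As an illustration of how a published digest can be used, the following sketch computes the SHA256 of a downloaded file with the standard library and compares it with the value listed above. The function names `sha256_of_file` and `matches_published_hash` are hypothetical, and any local file path you pass in is your own; only the expected digest comes from this page.

```python
import hashlib

# SHA256 digest listed above for dedupe-1.6.4.tar.gz.
EXPECTED_SHA256 = "16a781f00fa8a04b0aa81c9710c2a5386684b070b73d9b282fd5d48cc8101436"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Assuming the archive was saved to the current directory, `matches_published_hash("dedupe-1.6.4.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download. SHA256 is the sensible digest to check here; MD5 is listed as well but is not collision resistant.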

File details

Details for the file dedupe-1.6.4-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 1550923e0333ec114b12f9a2bd9c24580d1f51a74eed1533270cd2b81aaf8348
MD5 208071786e798c807cce48b8d7c8121a
BLAKE2b-256 ffa70b75162e2f0fa1ce3140f4826b15058f9b82c62540aeae825a8ec644bed9

File details

Details for the file dedupe-1.6.4-cp35-cp35m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 06327589636e9673a7dba8ef51b2ad4b3eff89b84bf90fdc6c34eab517701172
MD5 7de59b0f2e747c54045020469ecd8ba1
BLAKE2b-256 d077f74c054738bc63309411d8478b8b577c623345db5a8b5c3419bfde0a8ac4

File details

Details for the file dedupe-1.6.4-cp35-cp35m-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 41dd31c1ee31af457da3145392bc94fd12c49a1ba1b6561db47f4506f635554a
MD5 e8fabe67fc03eb58ce064a7009ea412a
BLAKE2b-256 1c08c2cb3def21c06a191ea46e403387467ad69ca26b6ae0cf0303325fc061f2

File details

Details for the file dedupe-1.6.4-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 cbb185c906f86773e3bd7bb8086440782824a994a8e45181498a144ee3c1b8d1
MD5 f0ffe73537ea30472a05e8a5a110c075
BLAKE2b-256 ecb9d5c72f56329cf9015b8563433f2ab8c1463fd71e998aea1c69d67f81eb70

File details

Details for the file dedupe-1.6.4-cp34-cp34m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 e426087dbd74c0a94505aa77b508c89b50d4accc46b57f9d0b8fb35aa5abdc64
MD5 daaebfb51ca5712d7578ddc2bb5aa9d0
BLAKE2b-256 02016c7ea07bc9ac731f6f2325457a5b8798bf05a6001cfc39dac74365171a44

File details

Details for the file dedupe-1.6.4-cp27-cp27mu-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d513c00a5a45066e2c933bfedc2156d6e70e9e39752d51f1aff960f4895f9f1d
MD5 b6db74f544d137369f8da9b2fe9d51ac
BLAKE2b-256 8604f4152efd47d80768c1b8f72c10118897a3e641d9899573e182cd03c40989

File details

Details for the file dedupe-1.6.4-cp27-cp27mu-manylinux1_i686.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 4c86ea2822c66d6776a3868343e84618aabcba04992359946c646eb91e39e1ec
MD5 227c969afbe03080b24a281009e72209
BLAKE2b-256 dfd8b83e80041d4e3b4ecb558db76896a53495ead7e06872cc40edf4e1095255

File details

Details for the file dedupe-1.6.4-cp27-cp27m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 bfa9ff954ddf172a0df68323e7c2394b8001033a1c1c5e197556df2d31442286
MD5 560b7693343ac10098e03d3ea9a23992
BLAKE2b-256 ad81532bd5b01b14dcbd4eadbbb47771fd4a69bbc823b23dd5dae8f0ac8fc802

File details

Details for the file dedupe-1.6.4-cp27-cp27m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 aea32d25d9b18f09a207d6365c7d81ff0560ff5348411fae0ca188811c610452
MD5 1d709dce014d8571f98136a57a42f400
BLAKE2b-256 a99b6bd376c1217287703d0efa8d86e1de3c763e3202dd21ad7f42d253eadb2f

File details

Details for the file dedupe-1.6.4-cp27-cp27m-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for dedupe-1.6.4-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 1cdb19b296f294dce712960507fb51f75fa6f1214ac790af503905bcf8845c1a
MD5 d32ba2445e1f07777c55e6de9d55f2a0
BLAKE2b-256 a333de3c0734ac4917840023b6af910029cd5e19c63f179914850d2d898988d2
