A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
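
To make the idea of matching records that were "entered slightly differently" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API or algorithm: the records, the field names, the `similarity` and `likely_duplicates` helpers, and the 0.8 threshold are all assumptions chosen for illustration. dedupe learns its rules from training data, whereas this sketch uses a fixed, hand-picked threshold and compares every pair of records.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; the IDs and field names are illustrative only.
records = {
    1: {"name": "Jon Smith", "address": "12 Oak St"},
    2: {"name": "John Smith", "address": "12 Oak Street"},
    3: {"name": "Maria Garcia", "address": "98 Elm Ave"},
}


def similarity(a, b):
    """Average string similarity (0.0 to 1.0) across the fields of two records."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in a
    ]
    return sum(scores) / len(scores)


def likely_duplicates(records, threshold=0.8):
    """Return ID pairs whose average similarity meets a hand-picked threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(records), 2)
        if similarity(records[i], records[j]) >= threshold
    ]


print(likely_duplicates(records))
```

Comparing all pairs grows quadratically with the number of records, which is one reason a purpose-built library is preferable for large datasets.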

This version

1.6.5

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.6.5.tar.gz (47.3 kB), Source

Built Distributions

  • dedupe-1.6.5-cp35-cp35m-manylinux1_x86_64.whl (73.7 kB), CPython 3.5m
  • dedupe-1.6.5-cp35-cp35m-manylinux1_i686.whl (70.4 kB), CPython 3.5m
  • dedupe-1.6.5-cp35-cp35m-macosx_10_11_x86_64.whl (49.5 kB), CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.6.5-cp34-cp34m-win_amd64.whl (50.2 kB), CPython 3.4m, Windows x86-64
  • dedupe-1.6.5-cp34-cp34m-win32.whl (49.5 kB), CPython 3.4m, Windows x86
  • dedupe-1.6.5-cp34-cp34m-manylinux1_x86_64.whl (73.8 kB), CPython 3.4m
  • dedupe-1.6.5-cp34-cp34m-manylinux1_i686.whl (70.5 kB), CPython 3.4m
  • dedupe-1.6.5-cp27-cp27mu-manylinux1_x86_64.whl (71.5 kB), CPython 2.7mu
  • dedupe-1.6.5-cp27-cp27mu-manylinux1_i686.whl (68.9 kB), CPython 2.7mu
  • dedupe-1.6.5-cp27-cp27m-win_amd64.whl (50.3 kB), CPython 2.7m, Windows x86-64
  • dedupe-1.6.5-cp27-cp27m-win32.whl (49.5 kB), CPython 2.7m, Windows x86
  • dedupe-1.6.5-cp27-cp27m-manylinux1_x86_64.whl (71.5 kB), CPython 2.7m
  • dedupe-1.6.5-cp27-cp27m-manylinux1_i686.whl (68.9 kB), CPython 2.7m
  • dedupe-1.6.5-cp27-cp27m-macosx_10_11_x86_64.whl (49.1 kB), CPython 2.7m, macOS 10.11+ x86-64

File details

Details for the file dedupe-1.6.5.tar.gz.

File metadata

  • Download URL: dedupe-1.6.5.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.6.5.tar.gz
Algorithm Hash digest
SHA256 cae7fc92501c0884c85b69b9d1281c693c7268b3b9277b38aaf507dd5d72a51a
MD5 511fd0d275a1ec593f094c6a2616d2dd
BLAKE2b-256 64f6974787b9f5eb8f22fd887a400a24bb1d3ef522a8add8831560838cde9267

See more details on using hashes here.
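
As a rough sketch of how a published digest can be used, the snippet below computes a file's SHA256 with Python's standard `hashlib` module and compares it with an expected value. The helper names `sha256_of` and `matches_published_hash` are made up for this example, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved as dedupe-1.6.5.tar.gz in the current directory, `matches_published_hash("dedupe-1.6.5.tar.gz", "cae7fc92501c0884c85b69b9d1281c693c7268b3b9277b38aaf507dd5d72a51a")` should return True for an intact download.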

File details

Details for the file dedupe-1.6.5-cp35-cp35m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 cd0034ac455ee281a57a9028d58ee663d1e12ced0048eb4095caacef5c66d501
MD5 9fd9f009490632d234a5f6b4859d4fa8
BLAKE2b-256 5e66e391038a169ed5ba204172f0d7b2e52b3dbb791c5ae100a884f297e91458

File details

Details for the file dedupe-1.6.5-cp35-cp35m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.5-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 ce598a853a68442d323203b49c07f5bf12360107c0919c64ff52acde55a4155e
MD5 8dab257d3f82dad1da49e98d83f110e9
BLAKE2b-256 d05fe95ea1b77e2452d0007ba40da5bc8f078bb45832de87164f88b056b0a41a

File details

Details for the file dedupe-1.6.5-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 36d5eafd0535fdaa5f58ec579c6c9e57121dca8a4a6c661607aaaa67e55ccc86
MD5 66f28c453987e5430e43db18185d3546
BLAKE2b-256 86946eebf0c200d2cffe57563ae0a32ecbc5d30d43ffa027932cf9f54df9446d

File details

Details for the file dedupe-1.6.5-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.6.5-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 7eec6cf477a90f689920615f1fbc1962e3189897419d8f2ce86118090fba4389
MD5 97dd3b8ba6498e3d36807c9cc9002116
BLAKE2b-256 0d7b32a1de1bd98b2a333228f9b7a638aa4784aff1813ab68e1226165a8c6d06

File details

Details for the file dedupe-1.6.5-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.6.5-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 acb79d95bcbfbd9718ec3203a0438b24898b5de01d52e5df0215637c77a5f29d
MD5 7429d5ecdd72be0e5727a462b04b101a
BLAKE2b-256 f7a60768852dbf9e3e221484806fe378e15d848adc775bf7719bf1f3cc490d78

File details

Details for the file dedupe-1.6.5-cp34-cp34m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f6089318834610724fe959d4434ff17e5a27167bef84292321383149c06c546c
MD5 b0f24afc614c31d82b6c25597d034cf1
BLAKE2b-256 9737072b5c180789c37b72442c4d6ae5aa03ebd94f85d5d2221f64f0568bd899

File details

Details for the file dedupe-1.6.5-cp34-cp34m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.5-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 bdbbfebe758ef64ef1ba66900547e04e5d9e889bd9f817e86b55329cfbdf7e42
MD5 57d7a74f359db80d0122150c987055d8
BLAKE2b-256 86a85d118c43493c79e670f89a7ed38cc83de997fcf6316d229ce5bc3dc59d39

File details

Details for the file dedupe-1.6.5-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3603afd5653e776db290c4c9bb3b87de6ae07ff092dd3800582483a0e4b2f3ff
MD5 5c3e39b43cc8b10baa119a37cfc1b464
BLAKE2b-256 6cd663518c72a15b35a65955eccf84c56475e4753a94d029f0a1977ea84ad64c

File details

Details for the file dedupe-1.6.5-cp27-cp27mu-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 0410b4c0768cf22710d08985f783456f486b460633341022b65b6a13bb860986
MD5 a582c343e1dfb51a6a81b6d61de28460
BLAKE2b-256 2957d38a9deb24bfe8fc6e49b5d45ef84cd2d59ab2bc721ece1da84e106d3fef

File details

Details for the file dedupe-1.6.5-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 1b37d31efdf1f50ab5f454347d37fd013da4974ddcba8e91959002e9f1160ac2
MD5 f0b897be66640f07c0442bebf9646c54
BLAKE2b-256 93d7d61afc52ca21045118ea6a7a1ee12a91b8e7698e5bd92e922505481cc25b

File details

Details for the file dedupe-1.6.5-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 584cc3926c0936499f5afc7a4eb09b28f60a4e35a96cb74616ab235be4500949
MD5 f1c6419137c0a7757f42452ddef63f1d
BLAKE2b-256 d68d8dd63aba00ed2e82e6ca06cc1758ccc0af57e294d6380c777e2623a3b56d

File details

Details for the file dedupe-1.6.5-cp27-cp27m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d25c355373dc581c6f7317d9f943809b837e71ad1fbc3b4e0b72c6eb93e88b28
MD5 34055dd751efc26567a581007438dba1
BLAKE2b-256 b000dd6a9ada04ba2beec9661cee653fab34a9cbff923d522e1b744f37a67b61

File details

Details for the file dedupe-1.6.5-cp27-cp27m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 b432ae0e85ba5eaa5da6f374af54f1b62e568e4bf2cbc7fde7299f6c42d59966
MD5 2aeb5a2fe6664f544d96a96e2ac29f53
BLAKE2b-256 b707ff09360e4af827eef283bc54e275b85c8cd44a9f593c1e661adcf07d7937

File details

Details for the file dedupe-1.6.5-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.5-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 3e9aff6f86d693467ca2c5561b9b8a5073d46273e53ec260efb52baba84bdbb3
MD5 c7f86578b03abf26f0c24bc312d1e620
BLAKE2b-256 9235b1a87192fd4868ee0ff804776680dbbc83df2b643f625cfabcbdc9bcce04
