
A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
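The last use case can be illustrated with a toy sketch that uses only the standard library. It is not dedupe's API or its algorithm. In place of the rules dedupe learns from training data, it uses a fixed per-field string similarity from `difflib` and a hand-picked threshold. The record layout, the helper names (`normalize`, `similarity`, `cluster_records`) and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical scoring and threshold: dedupe would learn its matching rules
# from labeled examples instead of using a fixed cutoff like this one.
THRESHOLD = 0.8


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: dict, b: dict, fields=("name", "address")) -> float:
    """Average per-field string similarity in [0, 1]."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def cluster_records(records: dict, threshold: float = THRESHOLD) -> list:
    """Group record ids whose pairwise similarity reaches the threshold.

    Union-find merges transitive matches: if A~B and B~C, all three share a cluster.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Compare every pair: fine for a toy, far too slow for large data.
    for a, b in combinations(records, 2):
        if similarity(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(sorted(members) for members in clusters.values())


contributions = {
    1: {"name": "Jonathan Smith", "address": "12 Oak Street, Springfield"},
    2: {"name": "Jonathon Smith", "address": "12 Oak St., Springfield"},
    3: {"name": "Maria Garcia", "address": "401 Pine Avenue, Shelbyville"},
    4: {"name": "jonathan  smith", "address": "12 Oak Street, Springfield"},
}

print(cluster_records(contributions))  # [[1, 2, 4], [3]]
```

Union-find is used so that a chain of likely matches collapses into one cluster. That way, each person ends up with a single group of contributions rather than a list of overlapping pairs.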

dedupe learns from training data labeled by a person and works out the best rules for your dataset. It can then quickly and automatically find similar records, even in very large databases.
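The training idea can be shown in miniature. In the sketch below, a person's labels on a few hypothetical name pairs are used to choose the similarity cutoff that classifies the most pairs correctly. Records are also grouped by a cheap blocking key, so that only records sharing a key are compared rather than every pair. This is a deliberately simplified stand-in that assumes a single name field, `difflib` similarity and a first-letter-of-surname key. The function names are illustrative, and dedupe's own training learns considerably richer rules than a single cutoff.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (hypothetical scorer)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs) -> float:
    """Choose the cutoff that classifies the most labeled pairs correctly.

    labeled_pairs holds (name_a, name_b, is_match) tuples labeled by a person;
    the candidate cutoffs are simply the observed similarity scores.
    """
    scored = [(name_similarity(a, b), is_match) for a, b, is_match in labeled_pairs]

    def correct(cutoff: float) -> int:
        return sum((score >= cutoff) == is_match for score, is_match in scored)

    return max(sorted({score for score, _ in scored}), key=correct)


def surname_initial(name: str) -> str:
    """Hypothetical blocking key: first letter of the last word."""
    return name.split()[-1][0].lower()


def block(names, key=surname_initial) -> dict:
    """Group names by a cheap key so only names within a group are compared."""
    blocks = defaultdict(list)
    for name in names:
        blocks[key(name)].append(name)
    return dict(blocks)


def find_matches(names, cutoff: float) -> list:
    """Score pairs inside each block instead of every possible pair."""
    matches = []
    for group in block(names).values():
        for a, b in combinations(group, 2):
            if name_similarity(a, b) >= cutoff:
                matches.append((a, b))
    return matches


training = [
    ("Jonathan Smith", "Jonathon Smith", True),
    ("Maria Garcia", "Maria Garcya", True),
    ("Jonathan Smith", "Joan Smythe", False),
    ("Maria Garcia", "Mario Gomez", False),
]
cutoff = learn_threshold(training)

names = ["Jonathan Smith", "Jonathon Smith", "Joan Smythe", "Maria Garcia", "Mario Gomez"]
print(round(cutoff, 3), find_matches(names, cutoff))
```

Blocking is what keeps this tractable on large inputs. Comparing all pairs of n records takes n(n-1)/2 comparisons, whereas blocking only compares records within each group.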


This version: 2.0.5

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-2.0.5.tar.gz (67.5 kB): Source

Built Distributions

  • dedupe-2.0.5-cp38-cp38-win_amd64.whl (64.7 kB): CPython 3.8, Windows x86-64
  • dedupe-2.0.5-cp38-cp38-manylinux1_x86_64.whl (90.6 kB): CPython 3.8
  • dedupe-2.0.5-cp37-cp37m-win_amd64.whl (64.7 kB): CPython 3.7m, Windows x86-64
  • dedupe-2.0.5-cp37-cp37m-manylinux1_x86_64.whl (90.3 kB): CPython 3.7m
  • dedupe-2.0.5-cp37-cp37m-macosx_10_14_x86_64.whl (60.5 kB): CPython 3.7m, macOS 10.14+ x86-64
  • dedupe-2.0.5-cp36-cp36m-win_amd64.whl (64.6 kB): CPython 3.6m, Windows x86-64
  • dedupe-2.0.5-cp36-cp36m-manylinux1_x86_64.whl (89.1 kB): CPython 3.6m
  • dedupe-2.0.5-cp36-cp36m-macosx_10_14_x86_64.whl (60.4 kB): CPython 3.6m, macOS 10.14+ x86-64
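If you're unsure which of the built distributions listed above fits your interpreter, the wheel filename encodes it. Under the wheel naming convention (PEP 427), a filename has the form `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits these filenames and keeps the ones whose Python tag matches a given CPython version. The helpers `parse_wheel_name` and `candidate_wheels` are hypothetical and simplified. They assume there is no build tag and no compressed tag set, and they check only the Python tag, not the ABI or platform. In practice pip, or the `packaging.tags` module, performs the full compatibility check.

```python
import sys

WHEELS = [
    "dedupe-2.0.5-cp38-cp38-win_amd64.whl",
    "dedupe-2.0.5-cp38-cp38-manylinux1_x86_64.whl",
    "dedupe-2.0.5-cp37-cp37m-win_amd64.whl",
    "dedupe-2.0.5-cp37-cp37m-manylinux1_x86_64.whl",
    "dedupe-2.0.5-cp37-cp37m-macosx_10_14_x86_64.whl",
    "dedupe-2.0.5-cp36-cp36m-win_amd64.whl",
    "dedupe-2.0.5-cp36-cp36m-manylinux1_x86_64.whl",
    "dedupe-2.0.5-cp36-cp36m-macosx_10_14_x86_64.whl",
]


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename (assumed to have no build tag) into its five fields."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def candidate_wheels(filenames, major=None, minor=None) -> list:
    """Return wheels whose CPython tag matches the given (or running) version."""
    if major is None or minor is None:
        major, minor = sys.version_info[:2]
    wanted = f"cp{major}{minor}"
    return [f for f in filenames if parse_wheel_name(f)["python"] == wanted]


for wheel in candidate_wheels(WHEELS, 3, 7):
    print(wheel)
```

For a Python version with no matching wheel, the list comes back empty. In that case, pip falls back to the source distribution.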

File details

Details for the file dedupe-2.0.5.tar.gz.

File metadata

  • Download URL: dedupe-2.0.5.tar.gz
  • Size: 67.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.5.tar.gz
Algorithm Hash digest
SHA256 66e41e204fc99b5600e73421b1b7393a58ea72cd4fdde6ca8c737bcedf519473
MD5 d791aeaa38351134b6dbb534cc813efe
BLAKE2b-256 ec443b911a87f33e3f6147cb93b1cdd41df54906759063635dbf50322755d148

See more details on using hashes here.
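The digests listed above let you check that a downloaded file is intact. Below is a minimal sketch using the standard library's `hashlib`, in which "BLAKE2b-256" corresponds to `hashlib.blake2b` with a 32-byte digest. The `file_digests` and `verify_sha256` helpers and the local file path are hypothetical. pip's hash-checking mode does the same comparison automatically when requirements are pinned with `--hash=sha256:...` and installed with `--require-hashes`.

```python
import hashlib
from pathlib import Path

# SHA256 published above for the source distribution dedupe-2.0.5.tar.gz.
EXPECTED_SHA256 = "66e41e204fc99b5600e73421b1b7393a58ea72cd4fdde6ca8c737bcedf519473"


def file_digests(path, chunk_size: int = 1 << 16) -> dict:
    """Read the file once in chunks and compute the three digests listed above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the published digest."""
    return file_digests(path)["SHA256"] == expected.lower()


if __name__ == "__main__":
    archive = Path("dedupe-2.0.5.tar.gz")  # hypothetical local download path
    if archive.exists():
        print("OK" if verify_sha256(archive) else "MISMATCH")
```

MD5 is no longer collision-resistant, so SHA256 (or BLAKE2b-256) is the digest to rely on for integrity checks.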

File details

Details for the file dedupe-2.0.5-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp38-cp38-win_amd64.whl
  • Size: 64.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.5-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 532134903b3741bde1ae750370c42d991c7d9f6bb3676c8b63303982d0d7a67b
MD5 d7100289c2776dd1b8e10fe9b78cc56e
BLAKE2b-256 70d5f29a901bebf6540d9aef1b01f53fc5a074bb6a308d163deb5b8db9d2641f


File details

Details for the file dedupe-2.0.5-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp38-cp38-manylinux1_x86_64.whl
  • Size: 90.6 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.5-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 b8adb3a56b833ead66c8db4b170230f52aa3cecf2888073ecca2ab2d98b5702f
MD5 ac3320be1eba04e9ba706df7db046ad6
BLAKE2b-256 bc05c53a0aff20720c79a1ebb9d9c5e9d7aa94deb05bb8d26d6b27e42f93901e


File details

Details for the file dedupe-2.0.5-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp37-cp37m-win_amd64.whl
  • Size: 64.7 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.9

File hashes

Hashes for dedupe-2.0.5-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 37b530507d3c871b3065b5235812f6aa9431b4b07f48348a273f30b72fcb458c
MD5 3804fa5b522372c0d170a0b82812b72e
BLAKE2b-256 dd4650a95144ce0af75f11d556f849533b032d964561f005216b76f80f58e8be


File details

Details for the file dedupe-2.0.5-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp37-cp37m-manylinux1_x86_64.whl
  • Size: 90.3 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.5-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6efc5d4998004088bcdbb808463de768b00999d179ddc7f3adbc59ba2157b008
MD5 b240cebfcf2886549da505a8fb683923
BLAKE2b-256 1aeea3a2399ef2ea9f8906d005499946c1bf5c65fc5c6fc3a418c182f4927858


File details

Details for the file dedupe-2.0.5-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp37-cp37m-macosx_10_14_x86_64.whl
  • Size: 60.5 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.9

File hashes

Hashes for dedupe-2.0.5-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 2e818b34fa3547e6abdff770f21de013ba14555e5b467d9dadcedd8f20f457ae
MD5 3164eaaf80489d2cb437f46bdcb7cdab
BLAKE2b-256 6d5b19b99ac9724f9dbd0f8aee8ff51c639a13dcf65a120b7e4ffddbc39873e9


File details

Details for the file dedupe-2.0.5-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp36-cp36m-win_amd64.whl
  • Size: 64.6 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for dedupe-2.0.5-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 fa31046e41944fece64dd85f0124972c47579044def93aed7d9187975bf04a90
MD5 0306f52abff7c2d0646ada0dcc0cfc13
BLAKE2b-256 7c850eefeb9aa5b9f31949886578214fa162c83d657de2584b63d7f6f2fce9a2


File details

Details for the file dedupe-2.0.5-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp36-cp36m-manylinux1_x86_64.whl
  • Size: 89.1 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for dedupe-2.0.5-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 0d37cb9caa8e87b2ad8720610db5baeb6a599208aaa65e9deec570b88c5391b1
MD5 e445650fe2447c46c9019e8bfe00423f
BLAKE2b-256 a4c68e69a99a7e6dc3c5427fd33d8361236f57829ecc54129be0fceb4b3f993b


File details

Details for the file dedupe-2.0.5-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: dedupe-2.0.5-cp36-cp36m-macosx_10_14_x86_64.whl
  • Size: 60.4 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.12

File hashes

Hashes for dedupe-2.0.5-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 d24d97c288ff71e90c510ad866eb4d25d0bbed3428ca8a992276735047d04076
MD5 aed42fa0409264fb5ced65292d721f6f
BLAKE2b-256 400de1e2d380f3b7e893a6b690e23835b77a8f54686c493315c4ad3160571d7a

