A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to a list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best matching rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
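
To make the problem concrete, here is a minimal sketch of what "entered slightly differently" means in practice. It is not dedupe's API or its learned method: it uses only the standard library's difflib with a fixed, hand-picked threshold, and the names similarity, candidate_duplicates, and the sample records are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a ratio in [0, 1] describing how alike two strings are (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Compare every pair of records by name and return the id pairs at or above the threshold.

    This brute-force comparison is quadratic in the number of records and uses a
    fixed threshold chosen by hand; it only illustrates the matching problem.
    """
    pairs = []
    for id_a, id_b in combinations(sorted(records), 2):
        if similarity(records[id_a]["name"], records[id_b]["name"]) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical sample data: records 1 and 2 are the same person with a spelling variation.
records = {
    1: {"name": "Jonathan Smith"},
    2: {"name": "Jonathon Smith"},
    3: {"name": "Maria Garcia"},
}

print(candidate_duplicates(records))
```

A hand-tuned threshold and all-pairs comparison like this do not scale and do not adapt to a dataset, which is the gap a trained approach is meant to close.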

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.6.2.tar.gz (47.3 kB), Source

Built Distributions

  • dedupe-1.6.2-cp35-cp35m-manylinux1_x86_64.whl (73.7 kB), CPython 3.5m
  • dedupe-1.6.2-cp35-cp35m-manylinux1_i686.whl (70.4 kB), CPython 3.5m
  • dedupe-1.6.2-cp35-cp35m-macosx_10_11_x86_64.whl (49.5 kB), CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.6.2-cp34-cp34m-win_amd64.whl (50.3 kB), CPython 3.4m, Windows x86-64
  • dedupe-1.6.2-cp34-cp34m-win32.whl (49.5 kB), CPython 3.4m, Windows x86
  • dedupe-1.6.2-cp34-cp34m-manylinux1_x86_64.whl (73.8 kB), CPython 3.4m
  • dedupe-1.6.2-cp34-cp34m-manylinux1_i686.whl (70.6 kB), CPython 3.4m
  • dedupe-1.6.2-cp27-cp27mu-manylinux1_x86_64.whl (71.5 kB), CPython 2.7mu
  • dedupe-1.6.2-cp27-cp27mu-manylinux1_i686.whl (68.9 kB), CPython 2.7mu
  • dedupe-1.6.2-cp27-cp27m-win_amd64.whl (50.4 kB), CPython 2.7m, Windows x86-64
  • dedupe-1.6.2-cp27-cp27m-win32.whl (49.5 kB), CPython 2.7m, Windows x86
  • dedupe-1.6.2-cp27-cp27m-manylinux1_x86_64.whl (71.5 kB), CPython 2.7m
  • dedupe-1.6.2-cp27-cp27m-manylinux1_i686.whl (68.9 kB), CPython 2.7m
  • dedupe-1.6.2-cp27-cp27m-macosx_10_11_x86_64.whl (49.2 kB), CPython 2.7m, macOS 10.11+ x86-64

File details

Details for the file dedupe-1.6.2.tar.gz.

File metadata

  • Download URL: dedupe-1.6.2.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.6.2.tar.gz
Algorithm Hash digest
SHA256 094c53318a6f4d6929368370f37796c6302c71f3f4a0def6055d27626ecd952a
MD5 dc9ec33c51122ce686aabfec1f187cd7
BLAKE2b-256 ae2e4ab79457c67a53c8cac994a6e4cfef8840bbc597d8f1a463f16634676be6

See more details on using hashes here.
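
As a minimal sketch of how a published digest can be checked after downloading, the following uses only the standard library's hashlib. The helper names sha256_of_file and matches_published_hash, and the local file path in the commented usage, are assumptions for illustration; the expected value is the SHA256 digest listed above for dedupe-1.6.2.tar.gz.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest published on this page for dedupe-1.6.2.tar.gz.
EXPECTED_SHA256 = "094c53318a6f4d6929368370f37796c6302c71f3f4a0def6055d27626ecd952a"

# Example usage, assuming the archive was saved to the current directory:
# if not matches_published_hash("dedupe-1.6.2.tar.gz", EXPECTED_SHA256):
#     raise SystemExit("Hash mismatch: do not install this file.")
```

SHA256 is used here rather than MD5 because MD5 is no longer considered collision-resistant.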

File details

Details for the file dedupe-1.6.2-cp35-cp35m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 302f29205701f77806314776f492a4720b802580f7428c6459683651b873d12f
MD5 7acb0be9d259bdd5bd288755c69a002a
BLAKE2b-256 a911c22da1cb1140db4237897ae58a725048a1eeffd05ee8ea5f17e4cf476523

File details

Details for the file dedupe-1.6.2-cp35-cp35m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.2-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 ab0c98ba90700ed13f6eb13dfb280c4f18d6da38d292daea6d1550fd4e716922
MD5 369b4853c7849e740c8cde48b7d7f9d8
BLAKE2b-256 280064b37a27f9b01154428dd3a0e282dfa8dc7ce5f961d27b09964be4b47b87

File details

Details for the file dedupe-1.6.2-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 b342c3e5a08143c34a5b8d28058cf3d5ddcf9c0ba2503c949c65b39557db6d5f
MD5 fd62ecbb07e03b1a99c7708f33f2f357
BLAKE2b-256 9800a18571629142de9b9c95f45137d575371d66578ca3ff908e403c5207d1e0

File details

Details for the file dedupe-1.6.2-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.6.2-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 0c37beb32344bd7edbedc328968b71f778f0c99d288cfefdd096f7b872d733cf
MD5 b7e1547a7c49a31d5efe11f210471446
BLAKE2b-256 c532cd8f2f2ed0e6487f311d612b10a1228ec28110a88b2986286506ca534519

File details

Details for the file dedupe-1.6.2-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.6.2-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 f9327a7b4727b9d16fd4ca77145e441d816839c6d41ea40bc84fc355a7cc2af6
MD5 a5524e145d04e63de14fbb5128bbbfd4
BLAKE2b-256 36e14277637b492f44bcf3addc38502f0de1551357414f1daf2387e7877f8b6a

File details

Details for the file dedupe-1.6.2-cp34-cp34m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 fe57b830a7e710730b9a29a03d50d3a83d2983e21114f67c93b4293a01b5b8ed
MD5 69e8e3b7f8f6801a9105f94a934c69cb
BLAKE2b-256 02c2de25ec6017f1e2479601e551a71c189da0b6bdec7c7f79643fa679b0d523

File details

Details for the file dedupe-1.6.2-cp34-cp34m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.2-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 7747f7635ab22f9c122e4925e6c80877c5b1bb5445dd83927b29606ae4f140f9
MD5 c3e794a0a60c1db4053d207e38922f62
BLAKE2b-256 2ce66f06143e23c6d63666a67e867be7ef4787af0225b087128c43efc72fd0a4

File details

Details for the file dedupe-1.6.2-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 83c0a760b201d7231b04c85e0ff96b2b27759b368c8cbd792e0e3eda3e57ca61
MD5 4d40eeea66b2c727f56595b7cc5187c0
BLAKE2b-256 298977ea53cd60d967f06fd5fecff26a3fe1b811227e34b2eab8a41a11f84340

File details

Details for the file dedupe-1.6.2-cp27-cp27mu-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 2389f6c62bf1b9818e5a085e0bf924479959a3205ea2df5fbf94bfd79902b574
MD5 27bea12ce87f551e04a801fa27d6f31f
BLAKE2b-256 632b7f7f3344459b2c6fd9f15f297588226a17d6dc82c3adc3b994953ff111c8

File details

Details for the file dedupe-1.6.2-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 4b0b04acd1caa58eb77677b8cab3babe2002d7a93464d9d762aca70bb6e07183
MD5 dee5bfc4bec081b14f5f3f1804faa733
BLAKE2b-256 1898ccb0eef5c9266b037ec44f9a5376b5355ae57666293fb2e61ae3de8fcab0

File details

Details for the file dedupe-1.6.2-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 18d59128d16601ef2a44868e5d93a7ac4e4f149389e910dcfd53bf2b97c790f9
MD5 0bab42c3822ca7cafebf0e7c6640d4d5
BLAKE2b-256 387a4f1861f58bbca6f93d60a8c7e8c061b90a9bd5d8f6e1281e0573d0706999

File details

Details for the file dedupe-1.6.2-cp27-cp27m-manylinux1_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 ff9d0104206afe8b5869d47e4b1c91ea82f6be3987196c0d49613456d9a969cf
MD5 6d1ed069399d56a1aa9e6d8febd36937
BLAKE2b-256 203048404b6930d443bb15754414a846b824382503a5645a9d02f121d724ef80

File details

Details for the file dedupe-1.6.2-cp27-cp27m-manylinux1_i686.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 bc12203b3c7e1dcdd6fc6bb9ea49d8e85af2bacbeb3e2cb62d83b8e46d34c012
MD5 60637497e9ac6a68cffbdc386733728d
BLAKE2b-256 2eb71e0b678a23bdddb1246b151771f75a648cff872e9f6537b715244b483d71

File details

Details for the file dedupe-1.6.2-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.2-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 2bf238b1e4b86f8655115be51a949fba6def6230d9fc753386391fc5024ccdf8
MD5 78fc5248cba6fe9ad413b5899cf3b587
BLAKE2b-256 4371bbfee3b0687a327aa19a6b9bd2de2744287771fafb7b88e4ebf52f559543
