A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
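
To make the idea of "finding similar records" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API and not its method: dedupe learns its rules from human-labeled examples, whereas this toy version applies a single hand-picked similarity threshold to every pair. The helper names (`normalize`, `similarity`, `likely_duplicates`), the sample records, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(record.lower().replace(",", " ").split())


def similarity(a, b):
    """Return a 0..1 string similarity score for two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_duplicates(records, threshold=0.8):
    """Return index pairs whose similarity meets the (hand-picked) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical sample data: the first two rows describe the same person.
records = [
    "Jon Smith, 123 Main St",
    "John Smith, 123 Main Street",
    "Alice Brown, 9 Oak Ave",
]

pairs = likely_duplicates(records)  # [(0, 1)]
```

Comparing every pair costs time quadratic in the number of records, and a fixed threshold will not suit every dataset. Those are the two weaknesses that a trained approach with learned rules is meant to address.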

This version

1.3.7

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.3.7.tar.gz (46.0 kB)

Uploaded Source

Built Distributions

dedupe-1.3.7.linux-x86_64.tar.gz (104.3 kB)

Uploaded Source

dedupe-1.3.7-cp35-cp35m-macosx_10_9_x86_64.whl (47.9 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.3.7-cp34-cp34m-win32.whl (48.5 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.3.7-cp27-cp27m-win32.whl (48.5 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.3.7-cp27-cp27m-macosx_10_9_x86_64.whl (47.6 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file dedupe-1.3.7.tar.gz.

File metadata

  • Download URL: dedupe-1.3.7.tar.gz
  • Upload date:
  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.3.7.tar.gz
Algorithm Hash digest
SHA256 86aa6130477729267396a814b5139fe1c1fa6229e1af0228bf93120e8fe1aaa8
MD5 f7ca89d236591dd75aecda8801271f2c
BLAKE2b-256 5a5beee58fc24871543f650ca68fbd5d837733d7ef45f2caa18c1bb6de078817

See more details on using hashes here.
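
A published digest can be checked against a downloaded file before installing it. The sketch below uses only `hashlib` and `hmac` from the standard library. The function names and the chunk size are assumptions made for illustration, and the expected value is the SHA256 digest listed above for dedupe-1.3.7.tar.gz.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# SHA256 digest listed above for dedupe-1.3.7.tar.gz.
EXPECTED_SHA256 = "86aa6130477729267396a814b5139fe1c1fa6229e1af0228bf93120e8fe1aaa8"
```

With the archive saved in the current directory, `matches_published_hash("dedupe-1.3.7.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download. SHA256 is the digest to rely on here, since MD5 is no longer considered collision resistant.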

File details

Details for the file dedupe-1.3.7.linux-x86_64.tar.gz.

File hashes

Hashes for dedupe-1.3.7.linux-x86_64.tar.gz
Algorithm Hash digest
SHA256 63b90a6c2d5f471359899e2699a3241cb8300fdd590f0c92685e7439a9ce9b9e
MD5 07ed6fd47f35d846194df0ed54ccd0ce
BLAKE2b-256 cd33c478c5db8dff24fe009acdfc453160aa14e0535bc29f99619e4fc4712ee8

File details

Details for the file dedupe-1.3.7-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.3.7-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b57dae224e8095a5033d4c41f50ecb1c3622d7e993a194ba45d7e5c7706705c0
MD5 b6955c6a128c9be92a8167e9d2073d9b
BLAKE2b-256 d91ec3f826432af38308e5a222d436d18a41b89c975f534604aecbe511274af3

File details

Details for the file dedupe-1.3.7-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.3.7-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 bebb72645572fede909b99a08ed30aa07ea80759c392c0b797ecacdc9c8218d7
MD5 be55ad6bed26a4dfbf7fc7b18ef31e87
BLAKE2b-256 04baebfe8a48a6dd72ca4b5e4ea90eb5c4117707e7f6988654c4c0190471ef4e

File details

Details for the file dedupe-1.3.7-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.3.7-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 a0465969d71e45fd906ffae1c051cf1a1a830a715e97e77ec741ce6571a18943
MD5 a58baa8961d99280f2d97183a22affef
BLAKE2b-256 c529d3dc73e0a7689e6de612c6d1959ee49e7bf8093337ea748fbd8951f8be91

File details

Details for the file dedupe-1.3.7-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.3.7-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a0a0762230d0503b3c00538e9c6bdc74a86d5982b2ba324500e3cdffbc22186d
MD5 3687293bb54f387aadf7e63da3706407
BLAKE2b-256 019f2515768f0516a4bc85230e9acee6a1f425a82bce7b5e35284e4df2f393f3
