A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
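
To make the idea of "finding similar records" concrete, here is a minimal standard-library sketch of pairwise fuzzy comparison with a fixed threshold. It is not dedupe's API or its method: dedupe learns its comparison rules from labeled examples, whereas this toy version hard-codes them. The function names, the `name` and `address` fields, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 ratio of how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, threshold=0.85):
    """Return index pairs whose name and address both look alike.

    `records` is a list of dicts with hypothetical 'name' and 'address'
    keys; the fixed threshold stands in for rules that dedupe would learn.
    """
    pairs = []
    # Compare every record with every later record (quadratic, toy-sized only).
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        name_score = similarity(left["name"], right["name"])
        address_score = similarity(left["address"], right["address"])
        # Require both fields to clear the threshold.
        if min(name_score, address_score) >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up sample data: the first two rows are the same person entered twice.
records = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "Alice Jones", "address": "99 Oak Ave"},
]
print(likely_duplicates(records))
```

Comparing every pair grows quadratically with the number of records, which is why a hand-rolled approach like this does not extend to very large databases.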

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.5.6.tar.gz (49.0 kB)

Uploaded Source

Built Distributions

dedupe-1.5.6-cp35-cp35m-macosx_10_11_x86_64.whl (50.7 kB)

Uploaded CPython 3.5m macOS 10.11+ x86-64

dedupe-1.5.6-cp34-cp34m-win_amd64.whl (51.4 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.5.6-cp34-cp34m-win32.whl (50.7 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.5.6-cp27-cp27m-win_amd64.whl (51.5 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.5.6-cp27-cp27m-win32.whl (50.7 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.5.6-cp27-cp27m-macosx_10_11_x86_64.whl (50.3 kB)

Uploaded CPython 2.7m macOS 10.11+ x86-64

File details

Details for the file dedupe-1.5.6.tar.gz.

File metadata

  • Download URL: dedupe-1.5.6.tar.gz
  • Upload date:
  • Size: 49.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.6.tar.gz
Algorithm Hash digest
SHA256 08c7bbf825f4d3e4374a576c77804b474e66eacdbcf4bbb9fec4e376a31e37a5
MD5 af4e0c76f6aa42b8ff7d1fe9e86cf8f8
BLAKE2b-256 90f1853f75bff5ca83d7b0f96484b9791ec3def75df32d25d6ba2e78aad7ac91

See more details on using hashes here.
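
As a hedged illustration of how a published digest can be used, the sketch below computes the SHA256 of a local file with Python's `hashlib` and compares it to an expected value. The helper names are hypothetical, and the commented usage assumes the archive has already been downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes dedupe-1.5.6.tar.gz is in the working directory):
# matches_published_hash(
#     "dedupe-1.5.6.tar.gz",
#     "08c7bbf825f4d3e4374a576c77804b474e66eacdbcf4bbb9fec4e376a31e37a5",
# )
```

A mismatch means the local file differs from the one the digest was published for, so it should not be installed.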

File details

Details for the file dedupe-1.5.6-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.6-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 a70bb9dde4d8caafeac36c6bb9ff6a568f7391f492ef26f4fc8dd8d724a17982
MD5 3a2412155b3abc76de2a8ef44f5a2d68
BLAKE2b-256 69928c6bb409f166506b719db769f758cd2132d5f0e5a400424d3e3d7873eb15

File details

Details for the file dedupe-1.5.6-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.6-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 ae520887c5ce2b666d6020547c6f058a10de46e9265092456adcf20d7492171d
MD5 ab28b8101f828aefe398b9757e2e91d0
BLAKE2b-256 76fe9eb7c32e6d1e7a56881f5a2f4862173c19e274d7e7743bbd11fc592bf943

File details

Details for the file dedupe-1.5.6-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.6-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 b0d8887d735835fd0f554015f9c8539a66b00ee175da643ee29a69484a0685f8
MD5 2e1b8cdc0a316a13c58a812ffa011838
BLAKE2b-256 e9a6b63e71fc35eca68a680dd536f6148818ff71676ebc6d21a24e0616a19341

File details

Details for the file dedupe-1.5.6-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.6-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 9ad77f18424a8f6e594b0c3c3989fb32567d11b626c3846a7c939ceaaa3d6995
MD5 7795af547feda3f3510ae639c5e7a1d0
BLAKE2b-256 1a10db349aa91ce7b1e83472cc06298deb458dee8efd1e0b044474c99184f422

File details

Details for the file dedupe-1.5.6-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.6-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 4ac46eb20e006538ce50245a440b1aabd9f4fe78894866c4029f7d48474aa41a
MD5 f5b34fce91a75de227eda3534b7e9c7e
BLAKE2b-256 dc235909382fe17dea6d330533e8ccda9d26fe02364ae5d509f29e29cced0074

File details

Details for the file dedupe-1.5.6-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.6-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 9c82f7a0912c6f9c1677382e683454970189b78b10930c08ab1933e253f69a00
MD5 cbc8be4eb73f2db2f06974c244afda22
BLAKE2b-256 6f00e8c045e7ae74b1eb5fc4f641ec635882852bcce3fef927624866190f4052
