A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from training data labeled by a person and derives the best matching rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
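
To make the idea of matching slightly different records concrete, here is a minimal sketch in plain Python. It is not dedupe's algorithm or API: dedupe learns its rules from labeled examples, whereas this sketch assumes a fixed string-similarity threshold built on the standard library's difflib. The function names, the sample records, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, fields, threshold=0.85):
    """Return (id_a, id_b, score) for pairs whose mean field similarity
    meets the threshold.

    Illustration only: this compares every pair of records, which does
    not scale to large datasets, and the threshold is an assumption
    rather than something learned from training data.
    """
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(records.items()), 2):
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


# Hypothetical sample data: records 1 and 2 differ by one letter in the name.
records = {
    1: {"name": "Jon Smith", "address": "12 Main St"},
    2: {"name": "John Smith", "address": "12 Main St"},
    3: {"name": "Alice Wong", "address": "99 Oak Ave"},
}

for id_a, id_b, score in likely_duplicates(records, ["name", "address"]):
    print(id_a, id_b, round(score, 2))
```

With this sample data only records 1 and 2 are reported as a likely duplicate pair. A hand-picked threshold like this one tends to break down on real data, which is the gap a trained approach is meant to close.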


This version: 1.6.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.6.0.tar.gz (47.1 kB): Source

Built Distributions

  • dedupe-1.6.0-cp35-cp35m-macosx_10_11_x86_64.whl (49.3 kB): CPython 3.5m, macOS 10.11+, x86-64

  • dedupe-1.6.0-cp34-cp34m-win_amd64.whl (50.0 kB): CPython 3.4m, Windows, x86-64

  • dedupe-1.6.0-cp34-cp34m-win32.whl (49.3 kB): CPython 3.4m, Windows, x86

  • dedupe-1.6.0-cp27-cp27m-macosx_10_11_x86_64.whl (48.9 kB): CPython 2.7m, macOS 10.11+, x86-64

File details

Details for the file dedupe-1.6.0.tar.gz.

File metadata

  • Download URL: dedupe-1.6.0.tar.gz
  • Upload date:
  • Size: 47.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.6.0.tar.gz
Algorithm Hash digest
SHA256 078f8653f4e29f26a549bfccccd6bd4d95d3d1e9d7e009f60f3479ec6274b82d
MD5 05e54fd3e568627cdfc3efebd2771a02
BLAKE2b-256 90c511f2a3fbce6c9b104e061cc598556579c8046b4bc916635d6332ce165928

See more details on using hashes here.
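
The published digests can be used to check a downloaded file before installing it. A minimal sketch using only the standard library follows. The function names and the local file path are hypothetical; the expected digest is the SHA256 value listed above for dedupe-1.6.0.tar.gz.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed on this page for the source distribution.
EXPECTED_SHA256 = "078f8653f4e29f26a549bfccccd6bd4d95d3d1e9d7e009f60f3479ec6274b82d"

# Hypothetical local path; adjust to wherever the file was downloaded.
download_path = "dedupe-1.6.0.tar.gz"

if os.path.exists(download_path):
    ok = matches_published_hash(download_path, EXPECTED_SHA256)
    print("hash matches" if ok else "hash mismatch, do not install")
```

SHA256 is the digest to rely on here; MD5 is listed for completeness but is not considered collision resistant.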

File details

Details for the file dedupe-1.6.0-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.0-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 ea20c4902c9a81445acaa1fe0a44611771b22281ce2dc7ea7eccfdec2d8693f8
MD5 5762a644aacaf12334788f747a89ccb1
BLAKE2b-256 fc0934eda43993db5187c77ae54277f0ab1e69f6a4b432e0aafa03a7eadf9c55

File details

Details for the file dedupe-1.6.0-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.6.0-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 5e93ac56377c7c817ef79143aa2ba6dad26c1d879d2ed6b936113cff018c7614
MD5 ae4a11d61008a35bb9ab1c512b503098
BLAKE2b-256 ea26e5d46f51732385087ccb4fc582802c76561b1916bcb34e16f1753c959505

File details

Details for the file dedupe-1.6.0-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.6.0-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 8aaff1ad5462c2f0fa35513eb35f999e32b0d6910e2241f7ca9e860c925984c8
MD5 5d362acee9d3d976171f3dd4f2119a81
BLAKE2b-256 87bdb85b28621eac70e6c393b8b5de565b5c816c38e2b97a926943820742ebaf

File details

Details for the file dedupe-1.6.0-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.6.0-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 753bc3275310e1df43fe0b9c636d2b354818c41f225d6f65cbc117a79476cb20
MD5 d2e56085c577bf619b29d5e49588d99c
BLAKE2b-256 1e4c0e942fcd566941aae87f7bdafb9c426fd8070eca81174b7428e96fbda600