
A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
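The third task above, deciding which records belong to the same person, can be pictured as turning pairwise "these two match" decisions into groups. Below is a minimal sketch, assuming the pairwise matches are already known: it takes the transitive closure of those matches with a union-find structure. This is a simplification for illustration, not dedupe's own clustering algorithm, and `cluster_matches` and the record IDs are hypothetical.

```python
from collections import defaultdict


def cluster_matches(record_ids, matched_pairs):
    """Group records into entities via the transitive closure of pairwise
    matches (illustrative only; not dedupe's clustering algorithm)."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in matched_pairs:
        union(a, b)

    # Collect every record under its root, then sort for stable output.
    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical contribution records with already-decided pairwise matches.
records = ["c1", "c2", "c3", "c4", "c5"]
matches = [("c1", "c2"), ("c2", "c4")]
print(cluster_matches(records, matches))
# → [['c1', 'c2', 'c4'], ['c3'], ['c5']]
```

Note that c1 and c4 end up together even though they were never matched directly; plain transitive closure can chain records this way, which is one reason entity-resolution tools generally use more careful clustering.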

dedupe learns from human-labeled training data to find the best rules for your dataset, so it can quickly and automatically find similar records, even in very large databases.
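The idea of learning rules from human-labeled examples can be illustrated with a deliberately tiny sketch: score candidate pairs with a string similarity, then pick the cut-off that best separates the pairs a person labeled as matches from those labeled as distinct. This is not how dedupe works internally. It assumes a single text field and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the helper names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score in [0, 1] (difflib stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs):
    """Choose the similarity cut-off that classifies the most labeled pairs
    correctly. `labeled_pairs` is a list of ((text_a, text_b), is_match)."""
    scored = [(similarity(a, b), is_match) for (a, b), is_match in labeled_pairs]
    best_threshold, best_correct = 1.0, -1
    # Every observed score is a candidate cut-off; pairs scoring at or
    # above the cut-off are predicted to be matches.
    for candidate in sorted({score for score, _ in scored}):
        correct = sum((score >= candidate) == is_match for score, is_match in scored)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def is_duplicate(a, b, threshold):
    """Apply the learned cut-off to a new pair of strings."""
    return similarity(a, b) >= threshold


# Hypothetical pairs a person has labeled as match (True) or distinct (False).
training = [
    (("Jon Smith", "John Smith"), True),
    (("Jane Doe", "Jane M. Doe"), True),
    (("Jon Smith", "Jane Doe"), False),
    (("Acme Corp", "Apex Co"), False),
]

threshold = learn_threshold(training)
print(round(threshold, 3), is_duplicate("JOHN SMITH", "Jon Smith", threshold))
```

A single cut-off on one field is far cruder than the learned rules described above; the sketch only shows the general shape of learning a decision from labeled examples.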

Important links:


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • dedupe-1.5.5.tar.gz (49.0 kB): source

Built Distributions

  • dedupe-1.5.5-cp35-cp35m-macosx_10_11_x86_64.whl (50.7 kB): CPython 3.5m, macOS 10.11+ x86-64
  • dedupe-1.5.5-cp34-cp34m-win_amd64.whl (51.4 kB): CPython 3.4m, Windows x86-64
  • dedupe-1.5.5-cp34-cp34m-win32.whl (50.7 kB): CPython 3.4m, Windows x86
  • dedupe-1.5.5-cp27-cp27m-win_amd64.whl (51.5 kB): CPython 2.7m, Windows x86-64
  • dedupe-1.5.5-cp27-cp27m-win32.whl (50.7 kB): CPython 2.7m, Windows x86
  • dedupe-1.5.5-cp27-cp27m-macosx_10_11_x86_64.whl (50.3 kB): CPython 2.7m, macOS 10.11+ x86-64
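Each wheel's filename encodes which interpreter, ABI, and platform it was built for, following the layout defined in PEP 427: `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The sketch below splits a filename into those tags. `parse_wheel_filename` and `current_cpython_tag` are illustrative helpers, not part of dedupe or pip, and the comparison at the end only checks the Python tag.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def current_cpython_tag():
    """Return a tag such as 'cp27' or 'cp35' for the running interpreter."""
    return "cp{}{}".format(*sys.version_info[:2])


wheel = parse_wheel_filename("dedupe-1.5.5-cp35-cp35m-macosx_10_11_x86_64.whl")
print(wheel["python"], wheel["abi"], wheel["platform"])
# → cp35 cp35m macosx_10_11_x86_64
print(wheel["python"] == current_cpython_tag())
```

pip's real wheel selection also matches the ABI and platform tags (the `packaging` library's `packaging.tags` module implements that logic). Because the 1.5.5 wheels cover only a few CPython 2.7, 3.4, and 3.5 builds for Windows and macOS, other interpreters and platforms would typically install from the source distribution.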

File details

Details for the file dedupe-1.5.5.tar.gz.

File metadata

  • Download URL: dedupe-1.5.5.tar.gz
  • Upload date:
  • Size: 49.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.5.5.tar.gz
Algorithm Hash digest
SHA256 d2f85fb9e070e0e4195392f3601f678cb77b8efe841ca53c3aa735950727b3bd
MD5 87276486646c7936c4269b6463a10097
BLAKE2b-256 cbc1d506545f8a3b135ef0bc310491e16bf99e86dcb10c24fee3741d508b33fb

See more details on using hashes here.
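A minimal sketch of checking a downloaded archive against the SHA256 digest published above, using only the standard library's `hashlib`. The helper names `sha256_of_file` and `verify_download` are illustrative assumptions; only the digest value comes from this page.

```python
import hashlib

# SHA256 digest published above for dedupe-1.5.5.tar.gz.
EXPECTED_SHA256 = "d2f85fb9e070e0e4195392f3601f678cb77b8efe841ca53c3aa735950727b3bd"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `verify_download("dedupe-1.5.5.tar.gz")` on the downloaded archive should return `True`; a mismatch means the file is corrupt or is not the published one. pip can enforce the same check itself in hash-checking mode, for example with a requirements line `dedupe==1.5.5 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.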

File details

Details for the file dedupe-1.5.5-cp35-cp35m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.5-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 ee43cf9c5f47fe14df8af1bfb4a60e59390f5fcfdf1a4daffa357b07fbe8459a
MD5 c81733ba81d0fff2f39240b2c433fe87
BLAKE2b-256 4799f42b240a6ce4efe938190e1983b39d3706e844b14d4280e8a9a8fa167a34

File details

Details for the file dedupe-1.5.5-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.5-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 cedd0117c9c02866b48733131870ee8b66cd04f4d71634025a77a10884815ca4
MD5 e615a03ce849e051e68ab227a5a763b2
BLAKE2b-256 144511da3efe7634024f18957001364073869b9cf7b319716547e8a04c35b791

File details

Details for the file dedupe-1.5.5-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.5.5-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 7b6e6c3e459ce28ccd5591fe9c4dc685c98b6b76b5e16647509764c998818f1b
MD5 088b60c6bc1f036a38c84ee480448e15
BLAKE2b-256 447cf600561569985a734aa58d3e9dc3a46ff05bb9e68d302888cac039f32841

File details

Details for the file dedupe-1.5.5-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.5.5-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 c9dcd8314c34cdef13b76d7c83870c803bcb42e6de345c888879e5e143f44596
MD5 8ad682fe521117331299c41a93629ee7
BLAKE2b-256 972c6933c6fe4489b738c9eab1b17e9b826a0b2bca15fe89b1087ecd91ffe62a

File details

Details for the file dedupe-1.5.5-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.5.5-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 9ae83b5b4a6f3076d131f437080f2f6112b31e9c095f4c820fd123d2fa341914
MD5 c193d842206d8de5c1471a8461feb206
BLAKE2b-256 4bdc71e5e1389770e98cac78da8dff99792c02844135aa6cd884b67cba0350d9

File details

Details for the file dedupe-1.5.5-cp27-cp27m-macosx_10_11_x86_64.whl.

File hashes

Hashes for dedupe-1.5.5-cp27-cp27m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 552603baee0ed6b27a9ceadafb60fb9e5217ff4161a668f16e4831179f21ee01
MD5 e5c7995e1ebb9b2f83d296dd47e403d5
BLAKE2b-256 7f6c78e8a3593aaa7632034984ab95eeb57c213c1df9e61a313ffea97e690412