A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list of customer information to another list of order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from training data labeled by a person and derives the best matching rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
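To make "similar records" concrete, here is a toy sketch that uses only the Python standard library. It is not dedupe's API or its algorithm: the `similarity` and `find_duplicates` helpers, the sample records, and the 0.8 threshold are all assumptions chosen for illustration. dedupe learns its rules from labeled examples, whereas this sketch hard-codes a single rule.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Average string similarity over the fields both records share."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in shared
    ]
    return sum(scores) / len(scores)


def find_duplicates(records, threshold=0.8):
    """Return index pairs whose similarity meets a hand-picked threshold.

    Compares every pair, so this only suits small lists.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical sample data: the first two rows describe the same person.
records = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Alice Wong", "address": "98 Oak Ave"},
]

pairs = find_duplicates(records)
print(pairs)
```

With this sample data the two Smith rows pair up and the third row is left alone. A fixed threshold and an all-pairs comparison are exactly the parts that do not scale or generalize, which is the gap a trained library is meant to fill.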

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe-1.4.3.tar.gz (46.9 kB)

Uploaded Source

Built Distributions

dedupe-1.4.3-cp35-cp35m-macosx_10_9_x86_64.whl (49.7 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64

dedupe-1.4.3-cp34-cp34m-win_amd64.whl (50.5 kB)

Uploaded CPython 3.4m Windows x86-64

dedupe-1.4.3-cp34-cp34m-win32.whl (49.8 kB)

Uploaded CPython 3.4m Windows x86

dedupe-1.4.3-cp27-cp27m-win_amd64.whl (50.6 kB)

Uploaded CPython 2.7m Windows x86-64

dedupe-1.4.3-cp27-cp27m-win32.whl (49.7 kB)

Uploaded CPython 2.7m Windows x86

dedupe-1.4.3-cp27-cp27m-macosx_10_9_x86_64.whl (49.4 kB)

Uploaded CPython 2.7m macOS 10.9+ x86-64

File details

Details for the file dedupe-1.4.3.tar.gz.

File metadata

  • Download URL: dedupe-1.4.3.tar.gz
  • Upload date:
  • Size: 46.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dedupe-1.4.3.tar.gz
Algorithm Hash digest
SHA256 e3c7f968596d959e292ca7874e317daa4f813740ea4e99391bc25c206bef44d5
MD5 604f221ebfb21357547146a20a2ebf7a
BLAKE2b-256 bb041e81ead9ba7ef6dcd4ebdd92be3938af11014148d7f6f27138f8421f1c09

See more details on using hashes here.
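The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and it assumes the archive has already been saved locally under its original file name.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for dedupe-1.4.3.tar.gz
EXPECTED_SHA256 = "e3c7f968596d959e292ca7874e317daa4f813740ea4e99391bc25c206bef44d5"
```

Calling `matches_published_hash("dedupe-1.4.3.tar.gz", EXPECTED_SHA256)` from the download directory should return `True` for an intact file. A `False` result suggests a corrupted or altered download.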

File details

Details for the file dedupe-1.4.3-cp35-cp35m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.3-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1ea5283a24837da18ab7ada2fbb887ef91e5c70d8cea18ba3bde6e30a316956c
MD5 be48c50d0f8ae3c2649c8992c59252ea
BLAKE2b-256 b905d8c48adf0844d8043e7494d4d966697fbd1e0b6026acc326f3a0cf913377

File details

Details for the file dedupe-1.4.3-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.3-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 9db666cc4c8159f234a3def2b39e8e1a62a411d810c889fba09c4d32a1fa273a
MD5 670cc55af2c4901a35f39d132ed42475
BLAKE2b-256 00569c7ad57b9f4440f14002aa1984f9a32571d94ef1bf0bc8fd442fd4dffa03

File details

Details for the file dedupe-1.4.3-cp34-cp34m-win32.whl.

File hashes

Hashes for dedupe-1.4.3-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 3b836fd6eac41a7698535d172ad8ee50890e568552c4b6ddaddc54e73bfa410b
MD5 fcd161819b1a4d0d9fa8f012fea79b70
BLAKE2b-256 3cc605c5f48e1987deac64de63958c3e24c065d630d22fd199da2bbdf6d22962

File details

Details for the file dedupe-1.4.3-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for dedupe-1.4.3-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 bcb42c095f7f1b9a2ee73286e9a4f65abe2622be2bea392eaab5f8ece9d1d841
MD5 cce43f8d3fcf69e6245dfb139f2528a7
BLAKE2b-256 f874b8f622cda721bf22cd3ff4c025a32972924f8ea1f2680e555340efa4f0ac

File details

Details for the file dedupe-1.4.3-cp27-cp27m-win32.whl.

File hashes

Hashes for dedupe-1.4.3-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 1bee21a5eace3b333d5f83c0ed960a866eda067c9414a53628b9f31e28a48829
MD5 dd17c6ec9e100facc304e1aef99ece46
BLAKE2b-256 2288a85e695c095d0c8148e13b98b480e7433f7f4ae1bb773fd9638867608cb8

File details

Details for the file dedupe-1.4.3-cp27-cp27m-macosx_10_9_x86_64.whl.

File hashes

Hashes for dedupe-1.4.3-cp27-cp27m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 caac8337018a182e4867f7ed277617ac970a2427cc951ba164dfded68a8e44ae
MD5 e8aa1de1ee8476a1b20f1563278e3bf1
BLAKE2b-256 a24a2ee1714a4e371744929d488b9bbce3a48c88bc15942e354f644b8a481cc5
