
Make record linkages in followthemoney data.


nomenklatura

Nomenklatura de-duplicates and integrates entities expressed in the Follow the Money (FtM) data model. It is meant to clean up messy data and to find links between different datasets.

Design

This package will offer an implementation of an in-memory data deduplication framework centered around the FtM data model. The planned workflow is as follows:

  • Accept FtM-shaped entities from a given loader (e.g. a JSON file, or a database)
  • Build an in-memory inverted index of the entities for blocking
  • Generate merge candidates using the blocking index and FtM compare
  • Provide a file-based storage format for merge challenges and decisions
  • Provide a text-based user interface to let users make merge decisions
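The blocking and candidate-generation steps can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not nomenklatura's actual implementation: entities are plain dicts in the FtM JSON shape (`id`, `schema`, `properties`), and the helper names `tokenize`, `build_index` and `candidates` are hypothetical. In place of FtM compare, candidates are scored with a simple token Jaccard similarity as a stand-in.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, Iterable, List, Set, Tuple

# An FtM-shaped entity as plain JSON: {"id": ..., "schema": ..., "properties": {...}}
Entity = Dict[str, Any]


def tokenize(entity: Entity) -> Set[str]:
    """Hypothetical helper: lower-cased word tokens from all `name` values."""
    tokens: Set[str] = set()
    for name in entity.get("properties", {}).get("name", []):
        tokens.update(re.findall(r"\w+", name.lower()))
    return tokens


def build_index(entities: Iterable[Entity]) -> Dict[str, Set[str]]:
    """Inverted index used for blocking: token -> ids of entities containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity in entities:
        for token in tokenize(entity):
            index[token].add(entity["id"])
    return index


def candidates(entities: List[Entity], max_block: int = 100) -> List[Tuple[str, str, float]]:
    """Return (id, id, score) pairs that share at least one token, best first."""
    by_id = {entity["id"]: entity for entity in entities}
    pairs: Set[Tuple[str, str]] = set()
    for ids in build_index(entities).values():
        if len(ids) > max_block:  # very common tokens make blocks too large to compare
            continue
        pairs.update(combinations(sorted(ids), 2))

    scored: List[Tuple[str, str, float]] = []
    for left, right in pairs:
        # Simplification (assumption): only compare entities of the same schema.
        if by_id[left].get("schema") != by_id[right].get("schema"):
            continue
        a, b = tokenize(by_id[left]), tokenize(by_id[right])
        # Jaccard similarity as a stand-in for FtM compare; the union is never
        # empty because every pair shares at least one token.
        scored.append((left, right, len(a & b) / len(a | b)))
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


if __name__ == "__main__":
    sample = [
        {"id": "a1", "schema": "Person", "properties": {"name": ["John Smith"]}},
        {"id": "b7", "schema": "Person", "properties": {"name": ["Smith, John"]}},
        {"id": "c3", "schema": "Company", "properties": {"name": ["John Smith Holdings Ltd"]}},
    ]
    print(candidates(sample))  # [('a1', 'b7', 1.0)]
```

Skipping oversized blocks via `max_block` trades some recall for speed: without it, a frequent token such as "ltd" would put almost every company into one block and produce a quadratic number of pairs.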

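For the file-based storage of merge decisions, one plausible format is an append-only JSON Lines file in which the latest line for a given pair wins. The file layout, the judgement labels and the function names below are assumptions chosen for illustration; they are not nomenklatura's own storage format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical judgement labels, not taken from the nomenklatura code base.
JUDGEMENTS = {"positive", "negative", "unsure"}


def pair_key(left: str, right: str) -> Tuple[str, str]:
    """Order-independent key, so (a, b) and (b, a) refer to the same decision."""
    return (left, right) if left <= right else (right, left)


def record_decision(path: Path, left: str, right: str, judgement: str) -> None:
    """Append one decision as a JSON line; earlier lines are never rewritten."""
    if judgement not in JUDGEMENTS:
        raise ValueError(f"unknown judgement: {judgement!r}")
    a, b = pair_key(left, right)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"left": a, "right": b, "judgement": judgement}) + "\n")


def load_decisions(path: Path) -> Dict[Tuple[str, str], str]:
    """Replay the log; a later line for the same pair overrides an earlier one."""
    decisions: Dict[Tuple[str, str], str] = {}
    if not path.exists():
        return decisions
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                row = json.loads(line)
                decisions[(row["left"], row["right"])] = row["judgement"]
    return decisions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "decisions.jsonl"
        record_decision(path, "b7", "a1", "unsure")
        record_decision(path, "a1", "b7", "positive")  # overrides the earlier decision
        print(load_decisions(path))  # {('a1', 'b7'): 'positive'}
```

An append-only log keeps the full history of decisions and is easy to diff or keep under version control; compacting it into one line per pair would be a separate step.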
Later on, the following might be added:

  • A web application to let users make merge decisions on the web
  • An implementation of the OpenRefine Reconciliation API based on the blocking index

This will be done in typed Python 3.

Reading

Contact, contributions etc.

This codebase is licensed under the terms of an MIT license (see LICENSE).

We welcome contributions, bug fixes and feature suggestions of any kind; please use the GitHub issue tracker for this repository.

Nomenklatura is currently developed thanks to a Prototypefund grant for OpenSanctions. Previous iterations of the package were developed with support from Knight-Mozilla OpenNews and the Open Knowledge Foundation Labs.


Download files


Source Distribution

nomenklatura-0.1.0.tar.gz (13.6 kB)

Uploaded Source

Built Distribution

nomenklatura-0.1.0-py3-none-any.whl (16.7 kB)

Uploaded Python 3

File details

Details for the file nomenklatura-0.1.0.tar.gz.

File metadata

  • Download URL: nomenklatura-0.1.0.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for nomenklatura-0.1.0.tar.gz
  • SHA256: 7b9e89494f91eb005debb82c0758d34c1cf6e17819b8141eee888a964a57458d
  • MD5: d59b289531e5ba07e92b4fa4ae6a635a
  • BLAKE2b-256: e5192e7d3c5a3b84b4c199f1e818c3167ef83ca2c8b9d7360b760b5b657016e9


File details

Details for the file nomenklatura-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nomenklatura-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for nomenklatura-0.1.0-py3-none-any.whl
  • SHA256: f9ab31b96820fe9731017814c3a2501dad130b99c249210a2b998780806d4d81
  • MD5: 640f01cb9bdc168e953de433cc89af00
  • BLAKE2b-256: 1982c2b1318668252388947ea6acdc86b78521b89ab6d626c92934ec9fe3f8dd

