A library to generate entity fingerprints.
fingerprints
This library helps with the generation of fingerprints for entity data. A fingerprint in this context is understood as a simplified entity identifier, derived from its name or address and used to cross-reference entities across different datasets.
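To illustrate the cross-referencing idea, here is a minimal, self-contained sketch that links records from two datasets whenever their names produce the same key. The `simple_fingerprint` and `cross_reference` functions are hypothetical stand-ins (case-fold, strip punctuation, sort unique tokens), not the library's own algorithm. In practice you would call `fingerprints.generate` in place of `simple_fingerprint`. The sample names are illustrative only.

```python
import re
import unicodedata
from collections import defaultdict


def simple_fingerprint(name: str) -> str:
    """Hypothetical stand-in for fingerprints.generate (not the library's algorithm)."""
    # Normalize Unicode and case so equivalent strings compare equal.
    text = unicodedata.normalize("NFKC", name).casefold()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Sort the unique tokens so word order and repetition do not matter.
    return " ".join(sorted(set(text.split())))


def cross_reference(left, right):
    """Pair names from two datasets that share a fingerprint."""
    # Index the right-hand dataset by fingerprint.
    index = defaultdict(list)
    for name in right:
        index[simple_fingerprint(name)].append(name)
    # Look up each left-hand name and collect every colliding pair.
    matches = []
    for name in left:
        for candidate in index.get(simple_fingerprint(name), []):
            matches.append((name, candidate))
    return matches


registry = ["Siemens AG", "Acme Holdings Ltd"]
sanctions = ["AG Siemens", "SIEMENS, AG", "Globex Corp"]
print(cross_reference(registry, sanctions))
```

Indexing one side by key keeps the join linear in the size of both datasets rather than comparing every pair of names.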
Usage
```python
import fingerprints

fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```
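The outputs above suggest a normalization pipeline along these lines: case-fold, drop punctuation, remove honorifics such as "Mr.", shorten legal-form names, then sort and de-duplicate the tokens. The sketch below reproduces the three examples under those assumptions. `sketch_generate`, `TITLES` and `LEGAL_FORMS` are hypothetical names, and the tables are tiny samples; the library's real rules and data are more extensive.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables for illustration only.
TITLES = {"mr", "mrs", "ms", "dr"}
LEGAL_FORMS = {"aktiengesellschaft": "ag"}


def sketch_generate(text: str) -> str:
    """Approximate the fingerprint examples above; not the library's implementation."""
    # Normalize Unicode and case.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Turn punctuation into spaces, then split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", text).split()
    # Drop honorifics and abbreviate known single-word legal-form names.
    tokens = [LEGAL_FORMS.get(t, t) for t in tokens if t not in TITLES]
    # Sort unique tokens so order and repetition do not affect the key.
    return " ".join(sorted(set(tokens)))


assert sketch_generate("Mr. Sherlock Holmes") == "holmes sherlock"
assert sketch_generate("Siemens Aktiengesellschaft") == "ag siemens"
assert sketch_generate("New York, New York") == "new york"
```

Sorting and de-duplicating is what makes "New York, New York" collapse to the same key as "New York".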
Company type names
A significant part of what fingerprints does is to recognize company legal form names. For example, fingerprints can simplify Общество с ограниченной ответственностью (Russian for "limited liability company") to ООО, or Aktiengesellschaft to AG. The required database is based on two different sources:
- A Google Spreadsheet created by OCCRP.
- The ISO 20275: Entity Legal Forms Code List
Wikipedia also maintains an index of types of business entity.
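Legal-form names often span several words, so a simple token lookup is not enough. One way to compact them is a regular expression that tries longer names first and respects word boundaries, as in the sketch below. The `LEGAL_FORMS` dict and `compact_legal_forms` function are hypothetical; the library's actual database is built from the sources listed above, not from this sample, and "Ромашка" is just a placeholder company name.

```python
import re

# Hypothetical sample of a legal-form table (keys are already case-folded).
LEGAL_FORMS = {
    "общество с ограниченной ответственностью": "ооо",
    "gesellschaft mit beschränkter haftung": "gmbh",
    "aktiengesellschaft": "ag",
}

# Longer names go first in the alternation so they win over shorter ones they contain.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(LEGAL_FORMS, key=len, reverse=True))
    + r")\b"
)


def compact_legal_forms(name: str) -> str:
    """Replace known legal-form names with their abbreviations (case-folded output)."""
    # Case-fold and collapse runs of whitespace so multi-word keys can match.
    text = " ".join(name.casefold().split())
    # \b and the table keys are Unicode-aware, so Cyrillic names match too.
    return _PATTERN.sub(lambda m: LEGAL_FORMS[m.group(1)], text)


print(compact_legal_forms("Общество с ограниченной ответственностью Ромашка"))
print(compact_legal_forms("Siemens Aktiengesellschaft"))
```

Doing this replacement before tokenizing keeps a multi-word form from being split into generic words such as "общество" or "gesellschaft".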
See also
- Clustering in Depth, part of the OpenRefine documentation, which discusses key-collision methods for clustering data.
- probablepeople, a parser for Western names made by the brilliant folks at datamade.us.