Presidio Anonymizer package - replaces analyzed text with desired values.
Presidio anonymizer
Description
The Presidio anonymizer is a Python-based module that anonymizes detected PII text entities by replacing them with desired values.
Anonymizer
Presidio anonymizer comes with the following anonymizers by default:
- Replace - replaces the PII with the desired value.
  Parameters:
  - `new_value` - replaces the existing text with the given value. If `new_value` is not supplied or is empty, the default behavior is to use <entity_type>, e.g. <PHONE_NUMBER>.
- Redact - removes the PII completely from the text.
  Parameters: None
- Hash - hashes the PII using sha256, sha512 or md5.
  Parameters:
  - `hash_type` - sets the type of hashing. Can be sha256, sha512 or md5. The default hash type is sha256.
- Mask - replaces the PII with a given character.
  Parameters:
  - `chars_to_mask` - the number of characters of the PII that should be replaced.
  - `masking_char` - the character to replace with.
  - `from_end` - whether to mask the PII from its end.
- Encrypt - replaces the PII with encrypted text. Anonymizer currently uses Advanced Encryption Standard (AES), also known as Rijndael, as the encryption algorithm.
  Parameters:
  - `key` - a cryptographic key used for the encryption. The key must be 128, 192 or 256 bits long, in string format.
- Custom - replaces the PII with the result of a function executed on the PII.
  Parameters:
  - `lambda` - the lambda to execute on the PII data. The lambda return type must be a string.
Note: if no default is specified in the anonymizers object, the default anonymizer is "replace" for all entities. The replacement value is the entity type, e.g. <PHONE_NUMBER>.
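The behavior of the simpler operators described above can be approximated in plain Python. This is a minimal sketch using only the standard library, not the package's implementation: the function names and signatures (`replace`, `redact`, `hash_pii`, `mask`) are hypothetical, and details such as clamping `chars_to_mask` to the PII length are assumptions.

```python
import hashlib


def replace(text, start, end, entity_type, new_value=""):
    """Replace text[start:end] with new_value, or <ENTITY_TYPE> if none is given."""
    value = new_value if new_value else f"<{entity_type}>"
    return text[:start] + value + text[end:]


def redact(text, start, end):
    """Remove text[start:end] completely."""
    return text[:start] + text[end:]


def hash_pii(text, start, end, hash_type="sha256"):
    """Replace text[start:end] with its hex digest (sha256, sha512 or md5)."""
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type}")
    digest = hashlib.new(hash_type, text[start:end].encode("utf-8")).hexdigest()
    return text[:start] + digest + text[end:]


def mask(text, start, end, masking_char="*", chars_to_mask=0, from_end=False):
    """Overwrite chars_to_mask characters of text[start:end] with masking_char."""
    pii = text[start:end]
    # Assumption: never mask more characters than the PII contains.
    n = max(0, min(chars_to_mask, len(pii)))
    if from_end:
        masked = pii[: len(pii) - n] + masking_char * n
    else:
        masked = masking_char * n + pii[n:]
    return text[:start] + masked + text[end:]


sample = "Call 03-232323 now"  # the number occupies characters 5..14
print(replace(sample, 5, 14, "PHONE_NUMBER"))
print(mask(sample, 5, 14, "*", 4, from_end=True))
```

Each helper takes the span as `start` and `end` offsets, mirroring how analyzer results identify a PII entity in the text.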
As the input text could potentially have overlapping PII entities, there are different anonymization scenarios:
- No overlap (single PII) - a single PII covers the text entity; the given or default anonymizer is used to anonymize and replace it.
- Full overlap of PIIs - when the same text is detected as several PIIs, the PII with the higher score is taken. Between PIIs with identical scores, the selection is arbitrary.
- One PII is contained in another - the anonymizer uses the PII with the larger text.
- Partial intersection - both PIIs are anonymized and the results are returned concatenated.
The following examples show how each scenario works. Our text will be:
My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
- No overlaps - only Inigo was recognized as NAME: My name is <NAME> Montoya. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
- Full overlap - the number was recognized as PHONE_NUMBER with a score of 0.7 and as SSN with a score of 0.6, so we take the higher score: My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: <PHONE_NUMBER>.
- One PII is contained in another - Inigo was recognized as FIRST_NAME and Inigo Montoya was recognized as NAME, so we take the larger one: My name is <NAME>. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
- Partial intersection - the number 03-2323 is recognized as a PHONE_NUMBER but 232323 is recognized as SSN: My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: <PHONE_NUMBER><SSN>.
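The four scenarios above can be sketched as a small conflict-resolution routine. This is an illustrative sketch under stated assumptions, not the package's code: `Span`, `resolve_overlaps` and `anonymize` are hypothetical names, ties between equal scores are broken by list order (the text only says the choice is arbitrary), and a partial intersection is handled by trimming the later span so that it starts where the earlier one ends.

```python
from dataclasses import dataclass


@dataclass
class Span:
    entity_type: str
    start: int
    end: int
    score: float


def resolve_overlaps(spans):
    """Drop spans that lose a full-overlap or containment conflict,
    then trim partially intersecting spans so they no longer overlap."""
    kept = []
    for i, s in enumerate(spans):
        loses = False
        for j, o in enumerate(spans):
            if i == j:
                continue
            if (o.start, o.end) == (s.start, s.end):
                # Full overlap: higher score wins; on a tie keep the first listed.
                if o.score > s.score or (o.score == s.score and j < i):
                    loses = True
            elif o.start <= s.start and s.end <= o.end:
                # s is contained in the larger span o.
                loses = True
        if not loses:
            # Copy so the caller's spans are not mutated below.
            kept.append(Span(s.entity_type, s.start, s.end, s.score))
    kept.sort(key=lambda s: (s.start, s.end))
    # Partial intersection: keep both, moving the later span's start forward.
    for prev, cur in zip(kept, kept[1:]):
        if cur.start < prev.end:
            cur.start = prev.end
    return kept


def anonymize(text, spans):
    """Apply the default "replace" behavior (<ENTITY_TYPE>) to resolved spans."""
    out = text
    # Work from the end of the text so earlier offsets stay valid.
    for s in sorted(resolve_overlaps(spans), key=lambda s: s.start, reverse=True):
        out = out[: s.start] + f"<{s.entity_type}>" + out[s.end :]
    return out


TEXT = ("My name is Inigo Montoya. You Killed my Father. "
        "Prepare to die. BTW my number is: 03-232323.")
n = TEXT.index("03-232323")
print(anonymize(TEXT, [Span("PHONE_NUMBER", n, n + 7, 0.7),
                       Span("SSN", n + 3, n + 9, 0.6)]))
```

Replacing from the last span backwards avoids recomputing offsets after each substitution changes the text length.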
Deanonymizer
The Presidio deanonymizer currently contains one operator:
- Decrypt - replaces the encrypted text with the decrypted text. Uses Advanced Encryption Standard (AES), also known as Rijndael, as the encryption algorithm.
  Parameters:
  - `key` - a cryptographic key used for the encryption. The key must be 128, 192 or 256 bits long, in string format.
Note: you can use "DEFAULT" as an operator key to define an operator over all entities.
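The key-length requirement can be checked before a key is handed to the encrypt or decrypt operator. The helper below is a hypothetical sketch, not part of the package, and it assumes the string key is converted to bytes with UTF-8 encoding.

```python
def is_valid_aes_key(key: str) -> bool:
    """Return True if the UTF-8 encoded key is 128, 192 or 256 bits long."""
    # Assumption: the key string is encoded as UTF-8 before use.
    return len(key.encode("utf-8")) * 8 in (128, 192, 256)


# A 16-character ASCII key is 128 bits long.
print(is_valid_aes_key("WmZq4t7w!z%C&F)J"))
```

Note that a key containing non-ASCII characters may encode to more bytes than it has characters, which is why the check measures the encoded length.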
Installation
As package:
To get started with Presidio-anonymizer, run the following:
pip install presidio-anonymizer
Getting started
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities.engine import RecognizerResult, OperatorConfig

# Initialize the engine.
engine = AnonymizerEngine()

# Invoke the anonymize function with the text, the analyzer results and
# the operators that define the anonymization type.
result = engine.anonymize(
    text="My name is Bond, James Bond",
    analyzer_results=[
        RecognizerResult("PERSON", 11, 15, 0.8),
        RecognizerResult("PERSON", 17, 27, 0.8),
    ],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
)

print(result)
This example takes the output of the AnonymizerEngine with encrypted PII entities and decrypts it back to the original text:
from presidio_anonymizer import DeanonymizeEngine
from presidio_anonymizer.entities.engine import AnonymizerResult, OperatorConfig

# Initialize the engine.
engine = DeanonymizeEngine()

# Invoke the deanonymize function with the text, the anonymizer results and
# the operators that define the deanonymization type.
result = engine.deanonymize(
    text="My name is S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0=",
    entities=[AnonymizerResult(start=11, end=55, entity_type="PERSON")],
    operators={"DEFAULT": OperatorConfig("decrypt", {"key": "WmZq4t7w!z%C&F)J"})},
)

print(result)
As docker service:
In folder presidio/presidio-anonymizer run:
docker-compose up -d
HTTP API
Follow the API Spec for the Anonymizer REST API reference details