
Presidio Anonymizer package - replaces analyzed text with desired values.

Project description

Presidio anonymizer

Description

The Presidio anonymizer is a Python-based module for anonymizing detected PII text entities with desired values.

Deploy Presidio anonymizer to Azure

Use the following button to deploy Presidio anonymizer to your Azure subscription.

Deploy to Azure


Anonymizer

Presidio anonymizer comes by default with the following anonymizers:

  • Replace - replaces the PII with a desired value.
    Parameters: "new_value" - the value that replaces the existing text.
    If "new_value" is not supplied or is empty, the default replacement is <entity_type>, e.g. <PHONE_NUMBER>.
  • Redact - removes the PII completely from the text.
    Parameters: None
  • Hash - hashes the PII using sha256, sha512, or md5.
    Parameters:
    • "hash_type" - the hashing algorithm: sha256, sha512, or md5. The default is sha256.
  • FPE - applies Format-Preserving Encryption to the PII using the FF1 algorithm.
  • Mask - replaces characters of the PII with a given character.
    Parameters:
    • "chars_to_mask" - the number of characters of the PII to replace.
    • "masking_char" - the character that replaces them.
    • "from_end" - whether to mask from the end of the PII instead of the start.
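To make the parameters above concrete, here is a minimal pure-Python sketch of what the replace, redact, hash, and mask anonymizers do to a single PII string. It is not Presidio's implementation: the function names are hypothetical, returning the hash as a hex digest is an assumption, and FPE is left out because it needs an FF1 cipher implementation.

```python
import hashlib

# Hypothetical stand-ins for the built-in anonymizers; not Presidio's own code.


def replace(pii: str, entity_type: str, new_value: str = "") -> str:
    # A missing or empty new_value falls back to "<ENTITY_TYPE>".
    return new_value if new_value else f"<{entity_type}>"


def redact(pii: str) -> str:
    # The PII is removed entirely, whatever its content.
    return ""


def hash_pii(pii: str, hash_type: str = "sha256") -> str:
    # Only the three documented algorithms are accepted; sha256 is the default.
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type}")
    # Assumption: the result is rendered as a hex digest.
    return hashlib.new(hash_type, pii.encode("utf-8")).hexdigest()


def mask(pii: str, masking_char: str, chars_to_mask: int, from_end: bool) -> str:
    # Mask at most len(pii) characters, counting from the end or the start.
    n = min(max(chars_to_mask, 0), len(pii))
    if from_end:
        return pii[: len(pii) - n] + masking_char * n
    return masking_char * n + pii[n:]


print(mask("034453334", "*", 4, from_end=True))  # 03445****
```

The masked value matches the phone number in the /anonymize example further down, where "chars_to_mask" is 4 and "from_end" is true.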

Please note: if no default is stated in the transformations object, the default anonymizer for every entity without its own transformation is "replace", and the replacement value is the entity type, e.g. <PHONE_NUMBER>.
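The lookup order described in the note can be sketched as a small helper. This is an illustration under assumptions, not the package's code: the function name is hypothetical, and so is the use of a "DEFAULT" key as the way a default is stated in the transformations object.

```python
from typing import Dict

# Assumption: a catch-all transformation is stored under this key.
DEFAULT_KEY = "DEFAULT"


def resolve_transformation(entity_type: str, transformations: Dict[str, dict]) -> dict:
    """Return the transformation config that applies to one entity type."""
    if entity_type in transformations:
        # A per-entity transformation always wins.
        return transformations[entity_type]
    # Otherwise use the stated default, or fall back to "replace".
    return transformations.get(DEFAULT_KEY, {"type": "replace"})


transformations = {
    "PHONE_NUMBER": {"type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": True}
}
print(resolve_transformation("PHONE_NUMBER", transformations)["type"])  # mask
print(resolve_transformation("NAME", transformations))  # {'type': 'replace'}
```

With these transformations, NAME has no entry of its own, so it is replaced with <NAME>, which is what the /anonymize example returns.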

Because the input text may contain overlapping PII entities, the anonymizer handles several scenarios:

  • No overlap (single PII) - a single PII entity covers the text; the given or default transformation anonymizes and replaces it.
  • Full overlap of PIIs - when the same text has several PII entities, the one with the higher score is used. Among entities with identical scores, the selection is arbitrary.
  • One PII is contained in another - the anonymizer uses the PII with the longer text.
  • Partial intersection - both entities are anonymized and returned concatenated.
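The overlap rules can be expressed as a filter over analyzer results. The sketch below is a hypothetical illustration of those rules, not Presidio's algorithm: the `Span` type and `resolve_overlaps` function are invented names, and because the rules leave ties arbitrary, this sketch simply keeps the earlier span in input order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A hypothetical analyzer result: character offsets, score and type."""
    start: int
    end: int  # exclusive
    score: float
    entity_type: str


def contains(outer: Span, inner: Span) -> bool:
    """True if inner lies within outer's character range."""
    return outer.start <= inner.start and inner.end <= outer.end


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Keep the spans that survive the overlap rules, sorted by position."""
    kept: List[Span] = []
    for i, span in enumerate(spans):
        dropped = False
        for j, other in enumerate(spans):
            if i == j:
                continue
            if (span.start, span.end) == (other.start, other.end):
                # Full overlap: the lower score loses; on a tie the
                # earlier span in input order is kept (arbitrary choice).
                if other.score > span.score or (other.score == span.score and j < i):
                    dropped = True
                    break
            elif contains(other, span):
                # Containment: the span with the longer text wins.
                dropped = True
                break
            # Partial intersection: neither rule applies, so both are kept.
        if not dropped:
            kept.append(span)
    return sorted(kept, key=lambda s: (s.start, s.end))


# Offsets taken from the /anonymize payload example.
results = [
    Span(24, 32, 0.8, "NAME"),
    Span(24, 28, 0.8, "FIRST_NAME"),
    Span(29, 32, 0.6, "LAST_NAME"),
    Span(48, 57, 0.95, "PHONE_NUMBER"),
]
print([s.entity_type for s in resolve_overlaps(results)])  # ['NAME', 'PHONE_NUMBER']
```

The pairwise comparison is quadratic in the number of results, which is acceptable for the handful of entities a typical text produces.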

The following examples show how each scenario works, using this text:

My name is Inigo Montoya. You killed my father. Prepare to die. BTW my number is: 03-232323.

  • No overlap - only Inigo was recognized as NAME: My name is <NAME> Montoya. You killed my father. Prepare to die. BTW my number is: 03-232323.
  • Full overlap - the number was recognized as PHONE_NUMBER with a score of 0.7 and as SSN with a score of 0.6, so the higher score is used: My name is Inigo Montoya. You killed my father. Prepare to die. BTW my number is: <PHONE_NUMBER>.
  • One PII is contained in another - Inigo was recognized as FIRST_NAME and Inigo Montoya was recognized as NAME, so the longer one is used: My name is <NAME>. You killed my father. Prepare to die. BTW my number is: 03-232323.
  • Partial intersection - 03-2323 was recognized as PHONE_NUMBER and 232323 as SSN: My name is Inigo Montoya. You killed my father. Prepare to die. BTW my number is: <PHONE_NUMBER><SSN>.

Installation

As a package:

To get started with Presidio-anonymizer, run the following:

pip install presidio-anonymizer

Getting started

As a service:

In the presidio/presidio-anonymizer folder, run:

pipenv sync

Start the server with Flask (this is a test server; do not use it in production):

pipenv run python app.py

Example request:

POST /anonymize

Payload:

{
    "text": "hello world, my name is Jane Doe. My number is: 034453334",
    "transformations": {
        "PHONE_NUMBER": {
            "type": "mask",
            "masking_char": "*",
            "chars_to_mask": 4,
            "from_end": true
        }
    },
    "analyzer_results": [
        {
            "start": 24,
            "end": 32,
            "score": 0.8,
            "entity_type": "NAME"
        },
        {
            "start": 24,
            "end": 28,
            "score": 0.8,
            "entity_type": "FIRST_NAME"
        },
        {
            "start": 29,
            "end": 32,
            "score": 0.6,
            "entity_type": "LAST_NAME"
        },
        {
            "start": 48,
            "end": 57,
            "score": 0.95,
            "entity_type": "PHONE_NUMBER"
        }
    ]
}

Result:

200 OK
hello world, my name is <NAME>. My number is: 03445****
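The same request can be built from Python with only the standard library. This is a minimal sketch: the server address is an assumption (the text does not say which host or port app.py listens on), `build_anonymize_request` is a hypothetical helper, and the request is only built here; the commented-out lines show how it could be sent to a running server. The loop prints the substring each analyzer result points at, which shows that "start" and "end" are character offsets with an exclusive end.

```python
import json
import urllib.request

# Assumption: adjust to wherever app.py is actually listening.
BASE_URL = "http://localhost:5000"

payload = {
    "text": "hello world, my name is Jane Doe. My number is: 034453334",
    "transformations": {
        "PHONE_NUMBER": {"type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": True}
    },
    "analyzer_results": [
        {"start": 24, "end": 32, "score": 0.8, "entity_type": "NAME"},
        {"start": 24, "end": 28, "score": 0.8, "entity_type": "FIRST_NAME"},
        {"start": 29, "end": 32, "score": 0.6, "entity_type": "LAST_NAME"},
        {"start": 48, "end": 57, "score": 0.95, "entity_type": "PHONE_NUMBER"},
    ],
}


def build_anonymize_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /anonymize request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/anonymize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Each result covers text[start:end]; e.g. NAME -> 'Jane Doe'.
for result in payload["analyzer_results"]:
    print(result["entity_type"], repr(payload["text"][result["start"]:result["end"]]))

# To send it once the server is running:
# with urllib.request.urlopen(build_anonymize_request(payload)) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```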

HTTP API

/anonymizers

Returns a list of supported anonymizers.

Method: GET

No parameters are required.

Response sample:

["fpe", "hash", "mask", "redact", "replace"]
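A client for this endpoint only needs to decode a JSON array of names. The sketch below uses the standard library; `parse_anonymizers` and `fetch_anonymizers` are hypothetical helpers, and the server address is an assumption because the text does not state it. Parsing is kept separate from the network call so it can be checked against the response sample without a running server.

```python
import json
import urllib.request
from typing import List

# Assumption: adjust to wherever the anonymizer service is listening.
BASE_URL = "http://localhost:5000"


def parse_anonymizers(body: bytes) -> List[str]:
    """Decode the JSON array of anonymizer names returned by GET /anonymizers."""
    names = json.loads(body)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a JSON array of strings")
    return names


def fetch_anonymizers(base_url: str = BASE_URL) -> List[str]:
    """Call GET /anonymizers on a running server (requires network access)."""
    with urllib.request.urlopen(f"{base_url}/anonymizers") as resp:
        return parse_anonymizers(resp.read())


sample = b'["fpe", "hash", "mask", "redact", "replace"]'
print(parse_anonymizers(sample))  # ['fpe', 'hash', 'mask', 'redact', 'replace']
```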



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distribution

presidio_anonymizer-1.11.0-py3-none-any.whl (14.8 kB)

Uploaded Python 3

File details

Details for the file presidio_anonymizer-1.11.0-py3-none-any.whl.

File metadata

  • Download URL: presidio_anonymizer-1.11.0-py3-none-any.whl
  • Upload date:
  • Size: 14.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.7

File hashes

Hashes for presidio_anonymizer-1.11.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e0e5b876b704964e3a349c23f456aefdfe61e5686c5d2eb74769b978236d497b
MD5           a2b0d77e16743a88e78d7fa6d2755743
BLAKE2b-256   40d1473fe040fbc8366bfb841ad0d55c0ae8b005f1fdf82a3c592adeffcb6bdb

