
Presidio structured package - analyzes and anonymizes structured and semi-structured data.


Presidio structured

Status

Alpha: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.

Description

The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured form. It extends Presidio to structured formats such as tables and to semi-structured formats such as JSON. It uses Presidio-Analyzer to identify the columns or keys that contain personally identifiable information (PII) and builds a mapping from each column or key name to the detected PII entity type. Presidio-Anonymizer then applies de-identification operators to every value in the columns or keys identified as containing PII.
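The two-stage flow described above (first map columns to entity types, then transform every value in the mapped columns) can be illustrated with a minimal standard-library sketch. It does not use Presidio: the regex `DETECTORS`, the helper names `detect_entity`, `build_entity_mapping` and `anonymize_table`, and the majority-vote rule for picking one entity per column are all assumptions made for illustration, not the package's implementation.

```python
import re
from collections import Counter

# Hypothetical regex detectors standing in for Presidio-Analyzer recognizers.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "PHONE_NUMBER": re.compile(r"^\+?\d[\d\s().-]{6,}\d$"),
}


def detect_entity(value):
    """Return the first entity type whose pattern matches the value, else None."""
    for entity, pattern in DETECTORS.items():
        if pattern.match(value):
            return entity
    return None


def build_entity_mapping(table):
    """Stage 1: map each column name to the entity type detected most often in it."""
    mapping = {}
    for column, values in table.items():
        counts = Counter(e for e in map(detect_entity, values) if e is not None)
        if counts:
            mapping[column] = counts.most_common(1)[0][0]
    return mapping


def anonymize_table(table, mapping, operators):
    """Stage 2: apply the operator for a column's entity type to every value in it."""
    result = {}
    for column, values in table.items():
        operator = operators.get(mapping.get(column))
        result[column] = [operator(v) for v in values] if operator else list(values)
    return result


# A table represented as column name -> list of string values.
table = {
    "email": ["john.doe@example.com", "jane.smith@example.com"],
    "phone": ["+1 555 010 0199", "555-010-0123"],
    "city": ["Lisbon", "Oslo"],
}
mapping = build_entity_mapping(table)
operators = {
    "EMAIL_ADDRESS": lambda v: "<EMAIL_ADDRESS>",
    "PHONE_NUMBER": lambda v: "<PHONE_NUMBER>",
}
anonymized = anonymize_table(table, mapping, operators)
print(mapping)
print(anonymized)
```

Columns with no detected entity (here `city`) are passed through unchanged, which is the behavior one would expect when only PII columns are targeted.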

Installation

As a Python package

To install the presidio-structured package, run the following command:

pip install presidio-structured

Getting started

Anonymizing DataFrames:

import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker # optionally using faker as an example

# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()

# Create a sample DataFrame
sample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})

# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)

# Define anonymization operators
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()})
}

# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
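For the semi-structured (JSON) case mentioned in the description, the same idea carries over if the mapping is keyed by the path to each nested key. The sketch below is a standard-library illustration only: the helper `anonymize_json` and the dotted-path convention (`"user.name"`) are assumptions for this example and should not be read as the package's API.

```python
import copy


def anonymize_json(data, entity_mapping, operators):
    """Apply an operator to each value addressed by a dotted key path.

    entity_mapping maps a dotted path (e.g. "user.email") to an entity type;
    operators maps an entity type to a callable that transforms one value.
    Both conventions are hypothetical and chosen for illustration.
    """
    result = copy.deepcopy(data)  # leave the caller's data untouched
    for path, entity in entity_mapping.items():
        operator = operators.get(entity)
        if operator is None:
            continue
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        # Only rewrite the value if the full path exists in the document.
        if isinstance(node, dict) and leaf in node:
            node[leaf] = operator(node[leaf])
    return result


sample_json = {"user": {"name": "John Doe", "email": "john.doe@example.com"}, "id": 1}
entity_mapping = {"user.name": "PERSON", "user.email": "EMAIL_ADDRESS"}
operators = {
    "PERSON": lambda value: "REDACTED",
    "EMAIL_ADDRESS": lambda value: "<EMAIL_ADDRESS>",
}
anonymized_json = anonymize_json(sample_json, entity_mapping, operators)
print(anonymized_json)
```

Paths that do not exist in the input are skipped silently; a stricter variant could raise an error instead, depending on how much the input schema is trusted.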
