Presidio structured package - analyzes and anonymizes structured and semi-structured data.
Presidio structured
Status
Alpha: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.
Description
The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured sources. It extends Presidio to tabular formats and to semi-structured formats (JSON). The package uses Presidio-Analyzer to identify columns or keys that contain personally identifiable information (PII), and it builds a mapping between those column or key names and the detected PII entities. Presidio-Anonymizer then applies de-identification operators to each value in the columns or keys identified as containing PII.
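The detect, map, then anonymize flow described above can be sketched in plain Python. This is a toy illustration only, not the package's implementation: `detect_entity`, `build_entity_mapping`, and `anonymize_rows` are hypothetical names, and a single regular expression stands in for Presidio-Analyzer.

```python
import re
from collections import Counter

# Toy stand-in detector: NOT Presidio-Analyzer, just a regex for illustration.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")


def detect_entity(value):
    """Return a hypothetical entity label for a single value, or None."""
    if EMAIL_RE.match(value):
        return "EMAIL_ADDRESS"
    return None


def build_entity_mapping(rows):
    """Map each column name to the most common entity detected in its values."""
    votes = {}
    for row in rows:
        for column, value in row.items():
            entity = detect_entity(str(value))
            if entity is not None:
                votes.setdefault(column, Counter())[entity] += 1
    return {column: counts.most_common(1)[0][0] for column, counts in votes.items()}


def anonymize_rows(rows, entity_mapping, operators):
    """Apply the operator for each mapped column to every value in that column."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for column, entity in entity_mapping.items():
            if column in new_row and entity in operators:
                new_row[column] = operators[entity](new_row[column])
        result.append(new_row)
    return result


rows = [
    {"id": 1, "email": "john.doe@example.com"},
    {"id": 2, "email": "jane.smith@example.com"},
]
mapping = build_entity_mapping(rows)
anonymized = anonymize_rows(rows, mapping, {"EMAIL_ADDRESS": lambda value: "<EMAIL>"})
print(mapping)
print(anonymized)
```

The point of the sketch is the separation of concerns: detection runs once per column to produce a name-to-entity mapping, and anonymization then works from that mapping alone, so the mapping can be inspected or corrected before any value is changed.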
Installation
As a Python package
To install the presidio-structured package, run the following command:
pip install presidio-structured
Getting started
Anonymizing pandas DataFrames:
import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker  # Optional: Faker is used here only to generate example replacement values
# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()
# Create a sample DataFrame
sample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})
# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)
# Define anonymization operators
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()}),
}
# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
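For semi-structured data (JSON), the PII sits under keys rather than columns, and those keys may be nested. The sketch below shows the same mapping idea applied to a nested document in plain Python. It is an illustration under stated assumptions, not the package's API: `set_by_path` and `anonymize_json` are hypothetical helpers, and addressing nested keys with dotted paths such as `address.city` is an assumption made for this example.

```python
import copy


def set_by_path(document, dotted_key, transform):
    """Apply transform to the value at dotted_key (e.g. "address.city"), if present."""
    *parents, leaf = dotted_key.split(".")
    node = document
    for part in parents:
        if not isinstance(node, dict) or part not in node:
            return  # path does not exist in this document; nothing to do
        node = node[part]
    if isinstance(node, dict) and leaf in node:
        node[leaf] = transform(node[leaf])


def anonymize_json(document, entity_mapping, operators):
    """Return a deep copy of document with each mapped key's value transformed."""
    result = copy.deepcopy(document)
    for dotted_key, entity in entity_mapping.items():
        operator = operators.get(entity)
        if operator is not None:
            set_by_path(result, dotted_key, operator)
    return result


sample_json = {"name": "John Doe", "address": {"city": "Seattle", "zip": "98101"}}
# Hypothetical key-to-entity mapping, written by hand for this example.
entity_mapping = {"name": "PERSON", "address.city": "LOCATION"}
operators = {
    "PERSON": lambda value: "REDACTED",
    "LOCATION": lambda value: "<LOCATION>",
}
anonymized_json = anonymize_json(sample_json, entity_mapping, operators)
print(anonymized_json)
```

Keys that are not in the mapping, such as `address.zip` here, pass through unchanged, which is why reviewing the mapping before anonymizing matters.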
File details
Details for the file presidio_structured-0.0.2a0-py3-none-any.whl.
File metadata
- Download URL: presidio_structured-0.0.2a0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | 54cbf024d153806d06b81940269a09c45ce6154492048deb5f1dfb2e780a2558
MD5 | 389d8ac200695f5e087144ced1c68e6e
BLAKE2b-256 | 16fc3c70c7d177711584b6af497de1036c77199612f8ec11dc3a277a51a62dd1