ANJANA is an open source framework for applying different anonymity techniques.
ANJANA
Anonymity as major assurance of personal data privacy
ANJANA is a Python library for anonymizing sensitive data.
The following anonymity techniques are implemented, based on the Python library pyCANON:
- k-anonymity.
- (α,k)-anonymity.
- ℓ-diversity.
- Entropy ℓ-diversity.
- Recursive (c,ℓ)-diversity.
- t-closeness.
- Basic β-likeness.
- Enhanced β-likeness.
- δ-disclosure privacy.
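To make the first and third properties concrete, here is a minimal sketch of how k and (distinct) ℓ can be measured on a table using only pandas. The helper names `k_of` and `l_of` and the toy data are illustrative assumptions and are not part of ANJANA or pyCANON: k is the size of the smallest group of records sharing the same quasi-identifier values, and ℓ is the smallest number of distinct sensitive values found in any such group.

```python
import pandas as pd


def k_of(df: pd.DataFrame, quasi_ident: list[str]) -> int:
    """Size of the smallest equivalence class (hypothetical helper)."""
    return int(df.groupby(quasi_ident).size().min())


def l_of(df: pd.DataFrame, quasi_ident: list[str], sens_att: str) -> int:
    """Fewest distinct sensitive values in any equivalence class
    (distinct l-diversity only; hypothetical helper)."""
    return int(df.groupby(quasi_ident)[sens_att].nunique().min())


# Toy table with already generalized quasi-identifiers (made-up values)
toy = pd.DataFrame(
    {
        "age": ["30-39", "30-39", "30-39", "40-49", "40-49"],
        "sex": ["F", "F", "F", "M", "M"],
        "salary-class": ["<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
    }
)

k_value = k_of(toy, ["age", "sex"])
l_value = l_of(toy, ["age", "sex"], "salary-class")
print(k_value, l_value)
```

In this toy table the second group holds a single sensitive value, so the table would need further generalization or suppression before it could satisfy ℓ-diversity with ℓ = 2.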
Getting started
To anonymize your data you need to provide:
- The pandas dataframe with the data to be anonymized. Each column can contain identifiers, quasi-identifiers or sensitive attributes.
- The list with the names of the identifiers in the dataframe, in order to suppress them.
- The list with the names of the quasi-identifiers in the dataframe.
- The sensitive attribute (only one), when applying techniques other than k-anonymity.
- The level of anonymity to be applied, e.g. k (for k-anonymity), ℓ (for ℓ-diversity), t (for t-closeness), β (for basic or enhanced β-likeness), etc.
- Maximum level of record suppression allowed (from 0 to 100).
- A dictionary containing, for each quasi-identifier, one dictionary with its hierarchy and levels.
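The shape of one hierarchy dictionary can be sketched as follows. The file contents are an assumption (one row per original value, one column per generalization level), chosen so that `dict(pd.read_csv(..., header=None))` yields a dictionary keyed by level number. The `generalize` helper is hypothetical and only illustrates the idea of replacing values by their level-n generalization; it is not necessarily how ANJANA applies hierarchies internally.

```python
import io

import pandas as pd

# Assumed layout of a hierarchy file: original value, then one column
# per increasingly coarse generalization level (made-up values).
age_csv = io.StringIO("34,30-39,*\n37,30-39,*\n45,40-49,*\n")

# dict() over a DataFrame maps each column label to its Series,
# so the keys are the level numbers 0, 1, 2.
age_hierarchy = dict(pd.read_csv(age_csv, header=None))


def generalize(column: pd.Series, hierarchy: dict, level: int) -> pd.Series:
    """Replace each value by its generalization at `level` (hypothetical)."""
    mapping = dict(zip(hierarchy[0], hierarchy[level]))
    return column.map(mapping)


ages = pd.Series([34, 45, 37])
level_1 = generalize(ages, age_hierarchy, 1)
level_2 = generalize(ages, age_hierarchy, 2)
print(level_1.tolist(), level_2.tolist())
```

The full `hierarchies` argument is then a dictionary with one such entry per quasi-identifier, keyed by column name.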
Example: apply k-anonymity, ℓ-diversity and t-closeness to the adult dataset with some predefined hierarchies:
import pandas as pd
from anonymity import k_anonymity, l_diversity, t_closeness
# Read and process the data
data = pd.read_csv("adult.csv")
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()
# Define the identifiers, quasi-identifiers and the sensitive attribute
quasi_ident = [
    "age",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
ident = ["race"]
sens_att = "salary-class"
# Select the desired level of k, l and t
k = 10
l_div = 2
t = 0.5
# Select the suppression limit allowed
supp_level = 50
# Import the hierarchies for each quasi-identifier and collect them in a dictionary
hierarchies = {
    "age": dict(pd.read_csv("hierarchies/age.csv", header=None)),
    "education": dict(pd.read_csv("hierarchies/education.csv", header=None)),
    "marital-status": dict(pd.read_csv("hierarchies/marital.csv", header=None)),
    "occupation": dict(pd.read_csv("hierarchies/occupation.csv", header=None)),
    "sex": dict(pd.read_csv("hierarchies/sex.csv", header=None)),
    "native-country": dict(pd.read_csv("hierarchies/country.csv", header=None)),
}
# Apply the three functions: k-anonymity, l-diversity and t-closeness
data_anon = k_anonymity(data, ident, quasi_ident, k, supp_level, hierarchies)
data_anon = l_diversity(
    data_anon, ident, quasi_ident, sens_att, k, l_div, supp_level, hierarchies
)
data_anon = t_closeness(
    data_anon, ident, quasi_ident, sens_att, k, t, supp_level, hierarchies
)
The code above runs in less than 4 seconds on the original dataset, which contains more than 30,000 records.
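After anonymizing, it can be useful to check how many records were lost to suppression. The sketch below assumes that suppressed records are removed from the returned dataframe and that the suppression limit is expressed as a percentage of the original records; both are assumptions rather than documented behavior. The `suppression_rate` helper and the stand-in dataframes are hypothetical.

```python
import pandas as pd


def suppression_rate(original: pd.DataFrame, anonymized: pd.DataFrame) -> float:
    """Percentage of the original records missing from the anonymized
    dataframe (hypothetical helper; assumes suppression drops rows)."""
    return 100 * (len(original) - len(anonymized)) / len(original)


# Stand-ins for `data` and `data_anon` from the example above
original = pd.DataFrame({"age": range(200)})
anonymized = original.iloc[:170]

rate = suppression_rate(original, anonymized)
print(f"{rate:.1f}% of records suppressed")
```

With the real data, the same call would be `suppression_rate(data, data_anon)`, and under the assumptions above the result should not exceed `supp_level`.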
License
This project is licensed under the Apache 2.0 license.
Project status
This project is under active development.
Funding and acknowledgments
This work is funded by the European Union through the SIESTA project (Horizon Europe) under grant number 101131957.
Note: Anjana and the mythology of Cantabria
"La Anjana" is a character from the mythology of Cantabria. Known as the good fairy of Cantabria, generous and protective of all people, she helps the poor, the suffering and those who stray in the forest.
- Partially extracted from: Cotera, Gustavo. Mitología de Cantabria. Ed. Tantin, Santander, 1998.