Skip to main content

ANJANA is an open source framework for applying different anonymity techniques.

Project description

ANJANA

License: Apache 2.0 Pipeline Status

Python version

Anonymity as major assurance of personal data privacy

ANJANA is a Python library for anonymizing sensitive data.

The following anonymity techniques are implemented, based on the Python library pyCANON:

  • k-anonymity.
  • (α,k)-anonymity.
  • ℓ-diversity.
  • Entropy ℓ-diversity.
  • Recursive (c,ℓ)-diversity.
  • t-closeness.
  • Basic β-likeness.
  • Enhanced β-likeness.
  • δ-disclosure privacy.

Getting started

For anonymizing your data you need to introduce:

  • The pandas dataframe with the data to be anonymized. Each column can contain: indentifiers, quasi-indentifiers or sensitive attributes.
  • The list with the names of the identifiers in the dataframe, in order to suppress them.
  • The list with the names of the quasi-identifiers in the dataframe.
  • The sentive attribute (only one) in case of applying other techniques than k-anonymity.
  • The level of anonymity to be applied, e.g. k (for k-anonymity), (for ℓ-diversity), t (for t-closeness), β (for basic or enhanced β-likeness), etc.
  • Maximum level of record suppression allowed (from 0 to 100).
  • Dictionary containing one dictionary for each quasi-identifier with the hierarchies and the levels.

Example: apply k-anonymity, ℓ-diversity and t-closeness to the adult dataset with some predefined hierarchies:

import pandas as pd
from anonymity import k_anonymity, l_diversity, t_closeness

# Read and process the data
data = pd.read_csv("adult.csv") 
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()

# Define the identifiers, quasi-identifiers and the sensitive attribute
quasi_ident = [
    "age",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
ident = ["race"]
sens_att = "salary-class"

# Select the desired level of k, l and t
k = 10
l_div = 2
t = 0.5

# Select the suppression limit allowed
supp_level = 50

# Import the hierarquies for each quasi-identifier. Define a dictionary containing them
hierarchies = {
    "age": dict(pd.read_csv("hierarchies/age.csv", header=None)),
    "education": dict(pd.read_csv("hierarchies/education.csv", header=None)),
    "marital-status": dict(pd.read_csv("hierarchies/marital.csv", header=None)),
    "occupation": dict(pd.read_csv("hierarchies/occupation.csv", header=None)),
    "sex": dict(pd.read_csv("hierarchies/sex.csv", header=None)),
    "native-country": dict(pd.read_csv("hierarchies/country.csv", header=None)),
}

# Apply the three functions: k-anonymity, l-diversity and t-closeness
data_anon = k_anonymity(data, ident, quasi_ident, k, supp_level, hierarchies)
data_anon = l_diversity(
    data_anon, ident, quasi_ident, sens_att, k, l_div, supp_level, hierarchies
)
data_anon = t_closeness(
    data_anon, ident, quasi_ident, sens_att, k, t, supp_level, hierarchies
)

The previous code can be executed in less than 4 seconds for the more than 30,000 records of the original dataset.

License

This project is licensed under the Apache 2.0 license.

Project status

This project is under active development.

Funding and acknowledgments

This work is funded by European Union through the SIESTA project (Horizon Europe) under Grant number 101131957.


Note: Anjana and the mythology of Cantabria

"La Anjana" is a character from the mythology of Cantabria. Known as the good fairy of Cantabria, generous and protective of all people, she helps the poor, the suffering and those who stray in the forest.

- Partially extracted from: Cotera, Gustavo. Mitología de Cantabria. Ed. Tantin, Santander, 1998.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anjana-0.0.1.tar.gz (13.0 kB view details)

Uploaded Source

Built Distribution

anjana-0.0.1-py3-none-any.whl (20.2 kB view details)

Uploaded Python 3

File details

Details for the file anjana-0.0.1.tar.gz.

File metadata

  • Download URL: anjana-0.0.1.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for anjana-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f3def1804325e060c3ea569370e9f930d4bcca6e2b276a1ca4cae813e15cde03
MD5 a1677f89e8aa870b17427560402d1a3a
BLAKE2b-256 5a534000670873afcb4df3543af436be41839a38fba319a3f24d65958629fbc5

See more details on using hashes here.

File details

Details for the file anjana-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: anjana-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for anjana-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8a0bad41251c9ad188112ee18deaf4593845bc44e9ed0a6b4b628fa1fc16d594
MD5 02d3f2abb1c8ae66f757fded2c66c96d
BLAKE2b-256 29d132d8f868390d27e51ea958b12f55534fa95f3ec04077b3c94b00653b7b7f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page