Useful functions, classes and tools for handling and interacting with dataframes.

Motivation

The dataframing package provides useful functions for working with dataframes.

Data transformation

The main goal is to let you transform one dataframe structure into another in a way that is easy to use, makes clear how the structures are connected, and lets you work with typing.

It assumes that you have a Protocol defining your dataframes (which by the way is a convenient thing to do!). For example:

>>> from typing import Protocol
>>> import dataframing as dfr
>>>
>>> class Original(Protocol):
...     last_name: str
...     first_name: str
>>>
>>> class Modified(Protocol):
...    full_name: str
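
One reason a Protocol is convenient is that its annotations can be inspected at runtime. The following is a minimal sketch using only the standard library; `missing_fields` is a hypothetical helper written for this illustration and is not part of dataframing.

```python
from typing import Any, Protocol, get_type_hints


class Original(Protocol):
    last_name: str
    first_name: str


def missing_fields(record: dict[str, Any], schema: type) -> set[str]:
    """Return the fields declared on `schema` that `record` does not provide.

    Hypothetical helper for illustration only, not a dataframing function.
    """
    return set(get_type_hints(schema)) - set(record)


# A record that lacks one of the declared fields.
print(missing_fields({"last_name": "Cleese"}, Original))
```

The same declaration therefore serves both the type checker and simple runtime checks.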

Now we build a transformer that connects Original and Modified:

>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...    target.full_name = dfr.wrap("{}, {}".format, source.last_name, source.first_name)

And now it is ready to use!

>>> row = dict(last_name="Cleese", first_name="John")
>>> ori2mod.transform_record(row)
{'full_name': 'Cleese, John'}

Notice that we are demonstrating this with a dictionary, but it also works with a dataframe row or a full dataframe (or an iterable of dicts).

>>> data = [
...   dict(last_name="Cleese", first_name="John"),
...   dict(last_name="Gilliam", first_name="Terry")
...   ]
>>> ori2mod.transform_collection(data)
[{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
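
To make the idea behind such a transformer concrete, here is a plain-Python sketch that does not use dataframing at all. It assumes a rule table that maps each target field to a function and the source fields it reads; `FIELD_RULES` and both helper functions are hypothetical names chosen for this illustration, and the library's internals may well differ.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]

# Hypothetical rule table (not part of dataframing):
# target field -> (function, names of the source fields passed to it).
FIELD_RULES: dict[str, tuple[Callable[..., Any], tuple[str, ...]]] = {
    "full_name": ("{}, {}".format, ("last_name", "first_name")),
}


def transform_record(record: Record) -> Record:
    """Build one target record by applying every rule to the source record."""
    return {
        target: func(*(record[name] for name in sources))
        for target, (func, sources) in FIELD_RULES.items()
    }


def transform_collection(records: Iterable[Record]) -> list[Record]:
    """Apply transform_record to each record in turn."""
    return [transform_record(record) for record in records]


data = [
    dict(last_name="Cleese", first_name="John"),
    dict(last_name="Gilliam", first_name="Terry"),
]
print(transform_collection(data))
```

Keeping the rules as data is what makes the connection between the two structures easy to read at a glance.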

If you are going to use a particular function a lot, you can wrap it once and use it multiple times. This also helps to keep the converter visually clean.

>>> fullnamer = dfr.wrap("{}, {}".format)
>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...    target.full_name = fullnamer(source.last_name, source.first_name)

To showcase how to create two columns from one, we are going to build the reverse transformer.

>>> def splitter(s: str) -> tuple[str, str]:
...     part1, part2 = s.split(",")
...     return part1.strip(), part2.strip()
>>> namesplitter = dfr.wrap(splitter)
>>> with dfr.morph(Modified, Original) as (mod2ori, source, target):
...    target.last_name, target.first_name = namesplitter(source.full_name)
>>>
>>> row = dict(full_name="Cleese, John")
>>> mod2ori.transform_record(row)
{'last_name': 'Cleese', 'first_name': 'John'}
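
Note that the splitter above unpacks exactly two parts, so it raises ValueError when the full name contains no comma or more than one. If your data may contain extra commas, a more tolerant variant could split on the first comma only. This is a sketch under that assumption; `split_full_name` is a hypothetical name, not a dataframing function.

```python
def split_full_name(s: str) -> tuple[str, str]:
    """Split 'last, first' on the first comma only.

    Hypothetical alternative to the splitter in the example above.
    """
    last, sep, first = s.partition(",")
    if not sep:
        # No comma at all: there is no sensible way to split the name.
        raise ValueError(f"expected 'last, first', got {s!r}")
    return last.strip(), first.strip()


print(split_full_name("Cleese, John"))
```

A function like this can be wrapped with dfr.wrap in the same way as splitter.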

Input/Output

You can also use it to save and load data.

>>> dfr.save(my_dataframe, "example.xlsx") # doctest: +SKIP
>>> df = dfr.load("example.xlsx") # doctest: +SKIP

Why use this instead of the standard pandas to_excel? save does two extra things:

  1. Stores the metadata held in my_dataframe.attrs in a separate sheet (load reads it back from there).
  2. Calculates a hash of the data and metadata and stores it in the metadata sheet.

load compares the data content with the stored hash. This behaviour is useful for data validation, but can be disabled with the use_hash keyword argument.
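
The general idea of hash-based validation can be sketched with the standard library alone. This is not how dataframing computes its hash (the scheme is not described here); `content_hash` and `check_hash` are hypothetical helpers, and SHA-256 over a canonical JSON dump is simply one reasonable choice.

```python
import hashlib
import json
from typing import Any


def content_hash(records: list[dict[str, Any]], metadata: dict[str, Any]) -> str:
    """Hash data and metadata together (hypothetical scheme, for illustration)."""
    # sort_keys gives a canonical text form, so equal content hashes equally.
    payload = json.dumps({"data": records, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_hash(
    records: list[dict[str, Any]],
    metadata: dict[str, Any],
    stored_hash: str,
    use_hash: bool = True,
) -> None:
    """Raise ValueError if the content no longer matches the stored hash."""
    if use_hash and content_hash(records, metadata) != stored_hash:
        raise ValueError("stored hash does not match the loaded content")


records = [
    dict(last_name="Cleese", first_name="John"),
    dict(last_name="Gilliam", first_name="Terry"),
]
metadata = {"source": "example"}

stored = content_hash(records, metadata)  # computed at save time
check_hash(records, metadata, stored)     # at load time: passes silently
```

If the file is edited by hand after saving, the recomputed hash differs and the check fails, which is the situation the use_hash option lets you opt out of.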

Another useful pair of functions is save_many and load_many:

>>> dfr.save_many(dict(raw_data=raw_data, processed_data=processed_data), "example.xlsx") # doctest: +SKIP
>>> dfdict = dfr.load_many("example.xlsx") # doctest: +SKIP

Here the input and output are dictionaries, which lets you group multiple dataframes into a single Excel file.

Installation

Just install it using:

pip install dataframing
