Useful functions, classes and tools for handling and interacting with dataframes.
Motivation
The dataframing package provides useful functions for working with dataframes.
Data transformation
The main goal is to let you transform one dataframe structure into another in a way that is easy to use, makes it clear how the structures are connected, and works well with typing.
It assumes that you have a Protocol defining your dataframes (which, by the way, is a convenient thing to do!). For example:
>>> from typing import Protocol
>>> import dataframing as dfr
>>>
>>> class Original(Protocol):
...     last_name: str
...     first_name: str
>>>
>>> class Modified(Protocol):
...     full_name: str
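One reason a Protocol is convenient here is that its annotations double as a machine-readable list of expected columns. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper written for illustration and is not part of dataframing.

```python
from typing import Protocol, get_type_hints


class Original(Protocol):
    last_name: str
    first_name: str


def missing_columns(record: dict, schema: type) -> list[str]:
    """Return the schema columns absent from the record (hypothetical helper)."""
    return [name for name in get_type_hints(schema) if name not in record]


# The annotations of the Protocol, as a mapping of column name to type.
columns = get_type_hints(Original)

# A complete record has nothing missing; an incomplete one reports the gap.
complete = missing_columns({"last_name": "Cleese", "first_name": "John"}, Original)
incomplete = missing_columns({"last_name": "Cleese"}, Original)
```

Here `columns` maps each column name to its declared type, so the same class serves both static type checkers and simple runtime checks.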
Now we build a transformer that connects Original and Modified:
>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...     target.full_name = dfr.wrap("{}, {}".format, source.last_name, source.first_name)
And now it is ready to use!
>>> row = dict(last_name="Cleese", first_name="John")
>>> ori2mod.transform_record(row)
{'full_name': 'Cleese, John'}
Notice that we are demonstrating this with a dictionary, but it also works with a dataframe row, a full dataframe, or an iterable of dicts.
>>> data = [
...     dict(last_name="Cleese", first_name="John"),
...     dict(last_name="Gilliam", first_name="Terry"),
... ]
>>> ori2mod.transform_collection(data)
[{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
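For comparison, the same mapping can be written by hand in plain Python. This sketch only mirrors what the transformer above computes for dicts; it says nothing about how dataframing is implemented, and `to_modified` is a hypothetical name.

```python
def to_modified(row: dict) -> dict:
    """Hand-written equivalent of the Original -> Modified mapping (illustrative only)."""
    return {"full_name": "{}, {}".format(row["last_name"], row["first_name"])}


data = [
    dict(last_name="Cleese", first_name="John"),
    dict(last_name="Gilliam", first_name="Terry"),
]

# Apply the mapping to each record in the collection.
result = [to_modified(row) for row in data]
print(result)
# [{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
```

The hand-written version hides the link between source and target columns inside a function body, whereas the transformer states it as explicit assignments between the two Protocols.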
If you are going to use a particular function a lot, you can wrap it once and use it multiple times. This also helps to keep the converter visually clean.
>>> fullnamer = dfr.wrap("{}, {}".format)
>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...     target.full_name = fullnamer(source.last_name, source.first_name)
To showcase how to create two columns from one, we are going to build the reverse transformer.
>>> def splitter(s: str) -> tuple[str, str]:
...     part1, part2 = s.split(",")
...     return part1.strip(), part2.strip()
>>> namesplitter = dfr.wrap(splitter)
>>> with dfr.morph(Modified, Original) as (mod2ori, source, target):
...     target.last_name, target.first_name = namesplitter(source.full_name)
>>>
>>> row = dict(full_name="Cleese, John")
>>> mod2ori.transform_record(row)
{'last_name': 'Cleese', 'first_name': 'John'}
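Note that splitter as written expects exactly one comma and raises ValueError otherwise. If your data might contain names with no comma or with several, a more tolerant variant could be sketched as follows; `split_once` is a hypothetical alternative, not something provided by dataframing.

```python
def splitter(s: str) -> tuple[str, str]:
    """The splitter from the example above: requires exactly one comma."""
    part1, part2 = s.split(",")
    return part1.strip(), part2.strip()


def split_once(s: str) -> tuple[str, str]:
    """Split at the first comma only; a missing comma yields an empty second part."""
    part1, _, part2 = s.partition(",")
    return part1.strip(), part2.strip()


# The strict version fails on input without a comma.
try:
    splitter("Cleese")
    strict_failed = False
except ValueError:
    strict_failed = True

# The tolerant version handles zero, one, or several commas.
one_comma = split_once("Cleese, John")
no_comma = split_once("Cleese")
two_commas = split_once("Smith, Jr., John")
```

Whether failing loudly or tolerating odd input is preferable depends on how much you trust the incoming column.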
Input/Output
You can also use it to save and load data.
>>> dfr.save(my_dataframe, "example.xlsx") # doctest: +SKIP
>>> df = dfr.load("example.xlsx") # doctest: +SKIP
Why use this instead of the standard pandas DataFrame.to_excel?
save does two extra things:
- Stores the metadata held in my_dataframe.attrs in a separate sheet (load reads it back from there).
- Calculates a hash for the data and metadata and stores it in the metadata sheet.
On load, the data content is compared with the stored hash. This behaviour is useful for data validation, but can be disabled with the use_hash keyword argument.
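The idea behind hash-based validation can be sketched with the standard library. The hash algorithm and serialization that dataframing actually uses are not stated here, so SHA-256 over canonical JSON is an assumption made purely for illustration, and `content_hash` is a hypothetical helper.

```python
import hashlib
import json


def content_hash(records: list, metadata: dict) -> str:
    """Hash data and metadata together (assumed scheme: SHA-256 over sorted-key JSON)."""
    payload = json.dumps(
        {"data": records, "metadata": metadata},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


records = [
    {"last_name": "Cleese", "first_name": "John"},
    {"last_name": "Gilliam", "first_name": "Terry"},
]
metadata = {"source": "example"}

# At save time: compute the hash and store it next to the metadata.
stored_hash = content_hash(records, metadata)

# At load time: recompute and compare. Unchanged content matches.
unchanged_ok = content_hash(records, metadata) == stored_hash

# Any edit to the data made outside the tool changes the hash.
tampered = [dict(records[0], first_name="Eric")] + records[1:]
tampered_ok = content_hash(tampered, metadata) == stored_hash
```

A mismatch on load signals that the file was modified after it was saved, which is the case the validation is meant to catch.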
Another useful pair of functions is load_many and save_many:
>>> dfr.save_many(dict(raw_data=raw_data, processed_data=processed_data), "example.xlsx") # doctest: +SKIP
>>> dfdict = dfr.load_many("example.xlsx") # doctest: +SKIP
Here the input and output are dictionaries, which lets you group multiple dataframes into a single Excel file.
Installation
Just install it using:
pip install dataframing