Python classes that represent different file formats, for use in type hinting in data workflows
Project description
Fileformats provides a library of file-format types implemented as Python classes. The file-format types are designed to be used in type validation during the construction of data workflows (e.g. Pydra, Fastr), and also provide some basic data handling methods (e.g. loading data to dictionaries) and conversions between some equivalent types.
Unlike other file-type Python packages, FileFormats supports the multi-file data formats (“file sets”) often found in scientific workflows, e.g. formats with separate header/data files or directories containing certain file types. It also provides mechanisms to peek at metadata fields in order to define complex data formats or specific sub-types (e.g. a functional MRI DICOM file set).
File-format types are typically identified by a combination of file extension and, where applicable, “magic numbers”. However, FileFormats provides a flexible framework for adding custom identification routines for exotic file formats, e.g. formats that require inspection of headers to locate data files, or directories containing certain file types. See the extension template for instructions on how to design FileFormats extension modules that augment the standard file types implemented in the main repository with custom domain/vendor-specific file-format types.
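To make the idea of a custom identification routine concrete, the following is a minimal sketch in plain Python, independent of the FileFormats API, of locating the data file that accompanies a header file. The function name `companion_file_set` and the `.hdr`/`.img` extension pair are assumptions chosen for illustration; they are not part of the library.

```python
from pathlib import Path


def companion_file_set(header_path, header_ext=".hdr", data_ext=".img"):
    """Return (header, data) paths if both files of the pair exist.

    Hypothetical helper: assumes the data file sits next to the header
    and shares its stem, differing only in extension.
    """
    header = Path(header_path)
    if header.suffix != header_ext:
        raise ValueError(f"{header} does not have the extension {header_ext!r}")
    data = header.with_suffix(data_ext)
    # Report every missing member of the set, not just the first one
    missing = [str(p) for p in (header, data) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"incomplete file set, missing: {missing}")
    return header, data
```

A real extension would implement the equivalent logic inside a format class; the sketch only shows the kind of check involved.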
Installation
FileFormats can be installed for Python >= 3.7 from PyPI with
$ python3 -m pip install fileformats
Support for converter methods between a few select formats can be installed by passing the ‘converters’ install extra, e.g.
$ python3 -m pip install fileformats[converters]
Examples
Using the WithMagicNumber mixin class, the Png format can be defined concisely as
from fileformats.generic import File
from fileformats.core.mixin import WithMagicNumber

class Png(WithMagicNumber, File):
    binary = True
    ext = ".png"
    iana_mime = "image/png"
    magic_number = b"\x89PNG"  # first four bytes of the PNG signature (0x89 is not ".")
Files can then be checked to see whether they are of PNG format by
png = Png("/path/to/image/file.png") # Checks the extension (i.e. path-only validation)
png.validate() # Checks the magic number (i.e. deeper file-contents validation)
which will raise a FormatMismatchError if initialisation or validation fails. For a boolean check instead, use the matches method
if Png.matches(a_path_to_a_file):
    ...  # handle case
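The two levels of checking described above (extension first, then file contents) can be sketched in plain Python without FileFormats. This is an illustration of the idea only, not the library's implementation: the function names are hypothetical, the case-insensitive extension comparison is a choice made for this sketch, and the magic number used is the first four bytes of the standard PNG signature.

```python
from pathlib import Path

PNG_EXT = ".png"
PNG_MAGIC = b"\x89PNG"  # first four bytes of the PNG signature


def has_extension(path, ext):
    """Path-only check: compare the file extension (case-insensitive here)."""
    return Path(path).suffix.lower() == ext


def has_magic_number(path, magic):
    """Contents check: compare the leading bytes of the file with `magic`."""
    with open(path, "rb") as f:
        return f.read(len(magic)) == magic


def looks_like_png(path):
    """Run the cheap extension check first, then the magic-number check."""
    return has_extension(path, PNG_EXT) and has_magic_number(path, PNG_MAGIC)
```

Because the extension check short-circuits, a file with the wrong extension is rejected without being opened.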
A few converters between standard file-format types are provided, perhaps most usefully between archive types and generic files/directories
from fileformats.archive import Zip
from fileformats.generic import Directory
zip_file = Zip.convert(Directory("/path/to/a/directory"))
extracted = Directory.convert(zip_file)
copied = extracted.copy_to("/path/to/output")
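For readers unfamiliar with what such a round trip involves, here is a rough standard-library equivalent of archiving a directory and extracting it again. It is a sketch under stated assumptions, not how the FileFormats converters are implemented: `zip_directory` and `unzip_to_directory` are hypothetical helper names, and the archive is assumed to keep the directory itself as its top-level entry.

```python
import shutil
from pathlib import Path


def zip_directory(directory, out_dir):
    """Archive `directory` as `<out_dir>/<directory name>.zip`; return its path."""
    directory = Path(directory).resolve()
    base = Path(out_dir).resolve() / directory.name
    # root_dir/base_dir make the archive contain "<directory name>/..." entries
    archive = shutil.make_archive(
        str(base), "zip", root_dir=str(directory.parent), base_dir=directory.name
    )
    return Path(archive)


def unzip_to_directory(zip_path, out_dir):
    """Extract `zip_path` under `out_dir`; return the extraction root."""
    out_dir = Path(out_dir)
    shutil.unpack_archive(str(zip_path), str(out_dir), "zip")
    return out_dir
```

The converter classes wrap this kind of operation behind typed inputs and outputs, so the result of a conversion is itself a validated format object rather than a bare path.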
The converters are implemented in the Pydra dataflow framework, and can be linked into wider Pydra workflows by creating a converter task
import pydra
from pydra.tasks.mypackage import MyTask
from fileformats.serialization import Json, Yaml

wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])

wf.add(
    Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json)
)
wf.add(
    MyTask(
        name="my_task",
        in_file=wf.json2yaml.lzout.out_file,
    )
)
...
Alternatively, the conversion can be executed outside of a Pydra workflow with
json_file = Json("/path/to/file.json")
yaml_file = Yaml.convert(json_file)
License
This work is licensed under a Creative Commons Attribution 4.0 International License