Python classes for representing different file formats, for use in type hinting in data workflows

Project description

Fileformats provides a library of file-format types implemented as Python classes. The file-format types are designed to be used in type validation during the construction of data workflows (e.g. Pydra), and also provide some basic data handling methods (e.g. loading data to dictionaries) and conversions between some equivalent types.

Unlike other file-type Python packages, FileFormats supports multi-file formats (e.g. formats with separate header/data files, or nested directories) and provides mechanisms to peek at metadata fields in order to define complex data formats or specific sub-types (e.g. a functional MRI DICOM file set).
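
As a rough illustration of what a "separate header/data files" format involves, the stand-alone sketch below locates the data file that sits beside a header file. It uses only the standard library and is not the FileFormats API: the helper name find_companion and the ".hdr"/".img" extensions are assumptions chosen for the example.

```python
from pathlib import Path


def find_companion(header_path, data_ext=".img"):
    """Return the data file sitting beside a header file, or None if absent.

    Hypothetical helper: the companion is assumed to share the header's
    stem and differ only in its extension.
    """
    header = Path(header_path)
    candidate = header.with_suffix(data_ext)
    return candidate if candidate.is_file() else None
```

A multi-file format class would typically treat the pair as one unit, so that copying or validating the header also covers the data file.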

File-format types are typically identified by a combination of file extension and “magic numbers” where applicable. However, FileFormats provides a flexible framework for adding custom identification routines for exotic file formats, e.g. formats that require inspection of headers to locate data files, or directories that must contain certain file types. See the extension template for instructions on how to design FileFormats extension modules, which augment the standard file types implemented in the main repository with custom domain- or vendor-specific file-format types.
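
The general idea of combining an extension check with a magic-number check can be sketched with the standard library alone. This is not how FileFormats implements it internally; the helper name looks_like_png is hypothetical, and the constant is the standard eight-byte PNG signature.

```python
from pathlib import Path

# Standard 8-byte PNG file signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if `path` ends in .png and starts with the PNG signature."""
    path = Path(path)
    # Cheap path-only check first
    if path.suffix.lower() != ".png":
        return False
    if not path.is_file():
        return False
    # Deeper check: compare the first bytes of the file with the signature
    with path.open("rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

The extension test is done first because it needs no file access; the file is only opened when the name already looks right.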

Installation

FileFormats can be installed for Python >= 3.7 from PyPI with

$ python3 -m pip install fileformats

Support for converter methods between a few select formats can be installed by passing the ‘converters’ install extra, e.g.

$ python3 -m pip install fileformats[converters]

Examples

Using the WithMagicNumber mixin class, the Png format can be defined concisely as

from fileformats.generic import File
from fileformats.core.mixin import WithMagicNumber

class Png(File, WithMagicNumber):
    binary = True
    ext = ".png"
    iana_mime = "image/png"
    magic_number = b"\x89PNG"  # PNG files start with byte 0x89 followed by "PNG"

Files can then be checked to see whether they are of PNG format by

png = Png("/path/to/image/file.png")  # Checks the extension (i.e. path-only validation)
png.validate()  # Checks the magic number (i.e. deeper file-contents validation)

Either step raises a FormatMismatchError if it fails. For a boolean check of the same validation, use the matches method

if Png.matches(a_path_to_a_file):
    ...  # handle the matching case
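
The relationship between a raising validate and a boolean matches can be sketched with a toy class. Everything here is a stand-in written for illustration: the Script class, its ".sh" extension and "#!" prefix, and the locally defined FormatMismatchError are assumptions, not code taken from FileFormats.

```python
class FormatMismatchError(Exception):
    """Stand-in exception for this sketch (not imported from FileFormats)."""


class Script:
    """Toy format: a .sh file whose contents start with '#!'."""

    ext = ".sh"
    magic_number = b"#!"

    def __init__(self, path):
        # Path-only validation happens at construction time
        self.path = str(path)
        if not self.path.endswith(self.ext):
            raise FormatMismatchError(f"{self.path} does not end in {self.ext}")

    def validate(self):
        # Deeper validation reads the start of the file
        with open(self.path, "rb") as f:
            if f.read(len(self.magic_number)) != self.magic_number:
                raise FormatMismatchError(f"{self.path} lacks the expected prefix")

    @classmethod
    def matches(cls, path):
        # Boolean wrapper: run both checks and convert a mismatch to False
        try:
            cls(path).validate()
        except FormatMismatchError:
            return False
        return True
```

In this sketch matches only swallows the mismatch error, so an unrelated failure such as a missing file still surfaces as an exception.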

A small set of converters between standard file-format types is included, perhaps most usefully between archive types and generic files/directories

from fileformats.archive import Zip
from fileformats.generic import Directory

zip_file = Zip.convert(Directory("/path/to/a/directory"))
extracted = Directory.convert(zip_file)
copied = extracted.copy_to("/path/to/output")
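
For comparison, the same directory-to-zip round trip can be written directly against the standard library. This sketch does not show what the FileFormats converters do internally; the helper names zip_directory and unzip_to are hypothetical.

```python
import shutil
from pathlib import Path


def zip_directory(directory, archive_stem):
    """Zip the contents of `directory`; return the path of the new .zip file."""
    # make_archive appends ".zip" to the stem and returns the full file name
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(directory)))


def unzip_to(archive, out_dir):
    """Extract a zip archive into `out_dir` and return that directory."""
    shutil.unpack_archive(str(archive), str(out_dir), format="zip")
    return Path(out_dir)
```

The archive stem should point outside the directory being zipped so that the archive does not end up inside its own input.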

The converters are implemented in the Pydra dataflow framework, and can be linked into wider Pydra workflows by creating a converter task

import pydra
from pydra.tasks.mypackage import MyTask
from fileformats.serialization import Json, Yaml

wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])
wf.add(
    Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json)
)
wf.add(
    MyTask(
        name="my_task",
        in_file=wf.json2yaml.lzout.out_file,
    )
)
...

Alternatively, the conversion can be executed outside of a Pydra workflow with

json_file = Json("/path/to/file.json")
yaml_file = Yaml.convert(json_file)

License

This work is licensed under a Creative Commons Attribution 4.0 International License
