Python classes representing different file formats, for use in type hinting in data workflows

Project description

FileFormats provides a library of file-format types implemented as Python classes. The file-format types are designed to be used in type validation during the construction of data workflows (e.g. Pydra, Fastr). They also provide some basic data-handling methods (e.g. loading data to dictionaries) and, when the “extended” install option is provided, conversions between some equivalent types.

File-format types are typically identified by a combination of file extension and, where applicable, “magic numbers”. However, unlike many other file-type Python packages, FileFormats supports multi-file data formats (“file sets”) often found in scientific workflows, e.g. with separate header/data files. FileFormats also provides a flexible framework for adding custom identification routines for exotic file formats, e.g. formats that require inspecting headers to locate data files, directories containing certain file types, or peeking at metadata fields to define specific sub-types (e.g. functional MRI DICOM file set).
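To make the idea of a “file set” concrete, the following is a minimal standard-library sketch of two of the identification routines described above: pairing a header file with its data file, and listing the files of a given type inside a directory. The helper names `find_file_set` and `files_with_ext` and the `.hdr`/`.img` extensions are assumptions chosen for illustration; they are not part of the FileFormats API and do not show how the package implements these checks.

```python
from pathlib import Path
from typing import List, Tuple


def find_file_set(header: Path, data_ext: str = ".img") -> Tuple[Path, Path]:
    """Pair a header file with the data file sharing its stem (hypothetical helper)."""
    data = header.with_suffix(data_ext)
    if not data.is_file():
        raise FileNotFoundError(f"no data file {data.name} next to {header.name}")
    return header, data


def files_with_ext(directory: Path, ext: str) -> List[Path]:
    """List files directly inside `directory` that carry the extension `ext`."""
    return sorted(p for p in directory.iterdir() if p.is_file() and p.suffix == ext)
```

A format class could call helpers of this kind during validation, so that a path to the header alone is enough to locate the whole set.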

See the extension template for instructions on how to design FileFormats extension modules that augment the standard file types implemented in the main repository with custom domain/vendor-specific file-format types. Note that FileFormats is a new package and only has limited support for standard formats at this stage, although the aim is to include all the official IANA MIME types (hopefully by scraping that site if anyone wants to have a go 😊).

Installation

FileFormats can be installed for Python >= 3.7 from PyPI with

$ python3 -m pip install fileformats

Support for converter methods between a few select formats can be installed by passing the “extended” install extra, e.g.

$ python3 -m pip install fileformats[extended]

Examples

Using the WithMagicNumber mixin class, the Png format can be defined concisely as

from fileformats.generic import File
from fileformats.core.mixin import WithMagicNumber

class Png(WithMagicNumber, File):
    binary = True
    ext = ".png"
    iana_mime = "image/png"
    magic_number = b"\x89PNG"  # PNG files start with byte 0x89 followed by ASCII "PNG"

Files can then be checked to see whether they are of PNG format by

png = Png("/path/to/image/file.png")  # Checks the extension and magic number

which will raise a FormatMismatchError if initialisation or validation fails. Alternatively, for a boolean check of whether a file validates, use the matches method

if Png.matches(a_path_to_a_file):
    ...  # handle the matching case
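The relationship between the raising constructor and the boolean matches check can be illustrated with a small stand-in written against the standard library only. Everything here (`FormatMismatch`, `MagicFile`, `PngLike`) is a hypothetical name invented for this sketch; it mimics the behaviour described above and is not the FileFormats implementation.

```python
from pathlib import Path


class FormatMismatch(Exception):
    """Hypothetical stand-in for the library's FormatMismatchError."""


class MagicFile:
    """Validates a path by extension and leading bytes on construction."""

    ext = ""
    magic_number = b""

    def __init__(self, path):
        self.path = Path(path)
        self.validate()

    def validate(self):
        if self.path.suffix != self.ext:
            raise FormatMismatch(f"{self.path} does not end with {self.ext!r}")
        with open(self.path, "rb") as f:
            found = f.read(len(self.magic_number))
        if found != self.magic_number:
            raise FormatMismatch(f"{self.path} starts with {found!r}")

    @classmethod
    def matches(cls, path) -> bool:
        """Boolean variant: True if construction would succeed."""
        try:
            cls(path)
        except (FormatMismatch, OSError):
            return False
        return True


class PngLike(MagicFile):
    ext = ".png"
    magic_number = b"\x89PNG"  # first four bytes of the PNG signature
```

With this layout the boolean check is a thin wrapper around the validating constructor, so both code paths apply the same rules.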

There are a few selected converters between standard file-format types, perhaps most usefully between archive types and generic files/directories

from fileformats.archive import Zip
from fileformats.generic import Directory

zip_file = Zip.convert(Directory("/path/to/a/directory"))
extracted = Directory.convert(zip_file)
copied = extracted.copy_to("/path/to/output")
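For readers who want to see what a directory-to-archive round trip involves, here is a rough standard-library equivalent. The functions `zip_directory` and `unzip_to_directory` are hypothetical names for this sketch; the converters shipped with FileFormats are implemented with Pydra and may behave differently (e.g. in where outputs are written).

```python
import zipfile
from pathlib import Path


def zip_directory(directory: Path, archive: Path) -> Path:
    """Write every file under `directory` into a new zip file at `archive`.

    `archive` should live outside `directory`, otherwise it would be
    picked up while walking the tree.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in sorted(directory.rglob("*")):
            if item.is_file():
                # Store paths relative to the directory root
                zf.write(item, item.relative_to(directory).as_posix())
    return archive


def unzip_to_directory(archive: Path, out_dir: Path) -> Path:
    """Extract all members of `archive` into `out_dir`, creating it if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    return out_dir
```

Note that this sketch only stores files, so empty sub-directories are not preserved in the round trip.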

The converters are implemented in the Pydra dataflow framework, and can be linked into wider Pydra workflows by creating a converter task

import pydra
from pydra.tasks.mypackage import MyTask
from fileformats.serialization import Json, Yaml

wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])
wf.add(
    Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json)
)
wf.add(
    MyTask(
        name="my_task",
        in_file=wf.json2yaml.lzout.out_file,
    )
)
...

Alternatively, the conversion can be executed outside of a Pydra workflow with

json_file = Json("/path/to/file.json")
yaml_file = Yaml.convert(json_file)

License

This work is licensed under a Creative Commons Attribution 4.0 International License
