
Classes for representing different file formats as Python classes, for use in type hinting in data workflows

Project description


Fileformats provides a library of file-format types implemented as Python classes. The file-format types are designed to be used in type validation during the construction of data workflows (e.g. Pydra, Fastr), and also provide some basic data handling methods (e.g. loading data to dictionaries) and conversions between some equivalent types.

Unlike other file-type Python packages, FileFormats supports multi-file data formats (“file sets”) often found in scientific workflows, e.g. formats with separate header/data files or directories containing certain file types. It also provides mechanisms to peek at metadata fields in order to define complex data formats or specific sub-types (e.g. a functional MRI DICOM file set).
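As a rough illustration of the header/data case, the sketch below pairs a header file with its companion data file using only the standard library. It does not use the FileFormats API: the function name find_header_data_pair and the ".hdr"/".img" extensions are assumptions chosen for the example.

```python
from pathlib import Path


def find_header_data_pair(header_path, data_ext=".img"):
    """Return (header, data) paths for a two-file data set.

    The data file is assumed to sit next to the header and share its stem,
    e.g. scan.hdr + scan.img. Raises FileNotFoundError if either is missing.
    """
    header = Path(header_path)
    if not header.is_file():
        raise FileNotFoundError(f"Header file not found: {header}")
    data = header.with_suffix(data_ext)  # same stem, different extension
    if not data.is_file():
        raise FileNotFoundError(f"No companion data file {data} for {header}")
    return header, data
```

Treating the pair as a single unit means a workflow can pass one object around and still be sure both files travel together.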

File-format types are typically identified by a combination of file extension and “magic numbers” where applicable. However, FileFormats provides a flexible framework to add custom identification routines for exotic file formats, e.g. formats that require inspection of headers to locate data files, or directories containing certain file types. See the extension template for instructions on how to design FileFormats extension modules, which augment the standard file types implemented in the main repository with custom domain/vendor-specific file-format types.
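To make the extension-plus-magic-number idea concrete, here is a minimal standalone sketch that performs both checks for PNG files. It is not the FileFormats implementation: looks_like_png is a hypothetical helper, and the only external fact it relies on is the 8-byte signature defined by the PNG specification.

```python
from pathlib import Path

# 8-byte signature that starts every PNG file (from the PNG specification)
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if the path has a .png extension and starts with the PNG signature."""
    path = Path(path)
    if path.suffix.lower() != ".png":
        return False  # cheap path-only check failed, no need to open the file
    with path.open("rb") as f:
        # Deeper check: compare the leading bytes against the magic number
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

Checking the extension first keeps the common rejection path cheap, since the file is only opened when the name already looks right.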

Installation

FileFormats can be installed for Python >= 3.7 from PyPI with

$ python3 -m pip install fileformats

Support for converter methods between a few select formats can be installed by passing the ‘converters’ install extra, e.g.

$ python3 -m pip install fileformats[converters]

Examples

Using the WithMagicNumber mixin class, the Png format can be defined concisely as

from fileformats.generic import File
from fileformats.core.mixin import WithMagicNumber

class Png(WithMagicNumber, File):
    binary = True
    ext = ".png"
    iana_mime = "image/png"
    magic_number = b"\x89PNG"  # PNG files start with byte 0x89 followed by "PNG"

Files can then be checked to see whether they are of PNG format by

png = Png("/path/to/image/file.png")  # Checks the extension (i.e. path-only validation)
png.validate()  # Checks the magic number (i.e. deeper file-contents validation)

Initialisation or validation raises a FormatMismatchError on failure. For a boolean check instead of an exception, use the matches method

if Png.matches(a_path_to_a_file):
    ...  # handle the matching case

There are a few selected converters between standard file-format types, perhaps most usefully between archive types and generic files/directories

from fileformats.archive import Zip
from fileformats.generic import Directory

zip_file = Zip.convert(Directory("/path/to/a/directory"))
extracted = Directory.convert(zip_file)
copied = extracted.copy_to("/path/to/output")
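For comparison, roughly the same directory-to-zip round trip can be sketched with only the standard library. This is not how FileFormats implements its converters; zip_directory and unzip_to are hypothetical helper names used purely for illustration.

```python
import shutil
from pathlib import Path


def zip_directory(src_dir, out_dir):
    """Archive the contents of src_dir into out_dir/<name>.zip and return its path."""
    src_dir = Path(src_dir)
    base = Path(out_dir) / src_dir.name  # make_archive appends ".zip" itself
    archive = shutil.make_archive(str(base), "zip", root_dir=str(src_dir))
    return Path(archive)


def unzip_to(archive, dest_dir):
    """Extract a zip archive into dest_dir and return dest_dir as a Path."""
    dest_dir = Path(dest_dir)
    shutil.unpack_archive(str(archive), str(dest_dir), "zip")
    return dest_dir
```

The typed converters add what this sketch lacks: the result of each step is a validated format object rather than a bare path.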

The converters are implemented in the Pydra dataflow framework, and can be linked into wider Pydra workflows by creating a converter task

import pydra
from pydra.tasks.mypackage import MyTask
from fileformats.serialization import Json, Yaml

wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])
wf.add(
    Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json)
)
wf.add(
    MyTask(
        name="my_task",
        in_file=wf.json2yaml.lzout.out_file,
    )
)
...

Alternatively, the conversion can be executed outside of a Pydra workflow with

json_file = Json("/path/to/file.json")
yaml_file = Yaml.convert(json_file)

License

This work is licensed under a Creative Commons Attribution 4.0 International License


