data-path-utils

Management of scripts that produce/consume data with specific labels

Development Status
- 4 - Beta
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3 :: Only
- Python :: 3.6
Topic
- Software Development

Project description

Overview

Over the past few years, I’ve organically standardized on a structure for the code I write for my research. I prefer to implement each step of an analysis pipeline as a standalone script, though usually with functions and classes that other modules can import. Such scripts often load some data, process it, save the processed data, save plots or figures, and so on.

This package provides utilities for creating and finding labeled paths, which are suitable for storing data and plots. It’s often important to be able to compare results between different versions of some analysis step, so these paths are timestamped to prevent repeated runs of a script from overwriting previous results.
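The idea of timestamped, labeled paths can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the directory layout `<base>/<label>_<timestamp>`, the timestamp format, and the helper names `make_timestamped_path` and `newest_path` are hypothetical and are not taken from this package's implementation.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical layout: <base>/<label>_<timestamp>. The package may use a different one.
TIMESTAMP_FORMAT = "%Y%m%d-%H%M%S"


def make_timestamped_path(base: Path, label: str, now: Optional[datetime] = None) -> Path:
    """Create and return a directory named after the label and the given (or current) time."""
    now = now or datetime.now()
    path = base / f"{label}_{now.strftime(TIMESTAMP_FORMAT)}"
    path.mkdir(parents=True, exist_ok=True)
    return path


def newest_path(base: Path, label: str) -> Path:
    """Return the most recent directory for a label, judged by its timestamp suffix."""
    prefix = f"{label}_"
    candidates = []
    for p in base.iterdir():
        if not (p.is_dir() and p.name.startswith(prefix)):
            continue
        try:
            stamp = datetime.strptime(p.name[len(prefix):], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # a longer label that happens to share this prefix
        candidates.append((stamp, p))
    if not candidates:
        raise FileNotFoundError(f"no paths found for label {label!r}")
    return max(candidates, key=lambda pair: pair[0])[1]
```

Because every run gets its own directory, older results stay on disk and can be compared with newer ones.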

This package differentiates between “data” paths, which hold things that another script may load at a later stage of an analysis pipeline, and “output” paths, which hold plots and similar files intended only for people to examine.

The main interface to this code is through the complementary functions create_data_path and find_newest_data_path, which each take a single “label” string argument and return a pathlib.Path. These can be used as follows:

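# Assumes the package is importable as data_path_utils.
from data_path_utils import create_data_path, find_newest_data_path

# load, save, and do_something_with are placeholders for your own code.
# Find the most recent data written under the label 'previous_script'.
input_path = find_newest_data_path('previous_script')
with open(input_path / 'filename') as f:
    data = load(f)

processed_data = do_something_with(data)

# Create a fresh timestamped path for this script's own results.
data_path = create_data_path('name_of_this_script')
with open(data_path / 'whatever_filename', 'w') as f:
    save(processed_data, f)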

Output paths are likewise created by create_output_path. It is recommended that scripts which call create_data_path use the name of the script as the “label” argument, but this is not enforced – one can include parameter values or anything else relevant.
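One possible way to build such a label from the script name plus parameter values is sketched below. The helper `make_label` and its `name-value` formatting are hypothetical and not part of this package; any string that identifies the run would serve equally well.

```python
from pathlib import Path


def make_label(script_file: str, **params) -> str:
    """Build a label from a script's file name plus optional parameter values."""
    parts = [Path(script_file).stem]
    # Sort the parameters so the same settings always yield the same label.
    parts.extend(f"{name}-{value}" for name, value in sorted(params.items()))
    return "_".join(parts)
```

A script could then pass something like `make_label(__file__, k=5)` as the label argument to create_data_path or create_output_path.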

Additional functionality

With these calls to create_data_path and find_newest_data_path, one can model a set of such scripts as a directed graph, with nodes representing both scripts and data paths, and edges denoting a “requires” relationship, e.g. “script X requires data label Y, which is produced by script Z”. This package also contains three standalone executable scripts (which require the NetworkX package) that parse the Python files in a project, construct this graph, and use it to provide other useful functionality:

- dependency_graph: plots this graph, using the pydotplus package and a call to the dot GraphViz executable.
- list_script_dependencies: takes a script filename as a command-line argument, and produces an ordered list of the data/script dependencies of that script by performing a topological sort on the subset of the graph reachable from that script. Useful for answering questions like “what should I run, in what order, to have everything in place to run this script of interest?” Note that this requires that the subgraph reachable from a script node be acyclic (which it should be anyway).
- archive_script_data_dependencies: takes a script filename as a command-line argument, and identifies all data dependencies of that script. It archives all files under those data paths to a zip file which can easily be transported between machines.
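The graph construction and dependency ordering described above can be sketched with the standard library alone. This is not the package's code: it swaps in `graphlib.TopologicalSorter` (Python 3.9+) for NetworkX, only recognizes labels written as string literals, and the helper names `find_labels`, `build_requires`, and `ordered_dependencies`, as well as the `data:` prefix for label nodes, are hypothetical.

```python
import ast
from graphlib import TopologicalSorter
from typing import Dict, List, Set

PRODUCERS = {"create_data_path"}
CONSUMERS = {"find_newest_data_path"}


def find_labels(source: str) -> Dict[str, Set[str]]:
    """Return the string-literal labels that a script produces and consumes."""
    found: Dict[str, Set[str]] = {"produces": set(), "consumes": set()}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and node.args):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        arg = node.args[0]
        if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
            continue  # labels computed at run time cannot be resolved statically
        if name in PRODUCERS:
            found["produces"].add(arg.value)
        elif name in CONSUMERS:
            found["consumes"].add(arg.value)
    return found


def build_requires(scripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each node to what it requires: script -> data label -> producing script."""
    requires: Dict[str, Set[str]] = {}
    for script, source in scripts.items():
        labels = find_labels(source)
        requires.setdefault(script, set()).update(f"data:{l}" for l in labels["consumes"])
        for label in labels["produces"]:
            requires.setdefault(f"data:{label}", set()).add(script)
    return requires


def ordered_dependencies(requires: Dict[str, Set[str]], target: str) -> List[str]:
    """List everything reachable from target, dependencies first and target last."""
    reachable: Dict[str, Set[str]] = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable[node] = set(requires.get(node, ()))
        stack.extend(reachable[node])
    # static_order() raises graphlib.CycleError if the reachable subgraph has a cycle.
    return list(TopologicalSorter(reachable).static_order())
```

Here `scripts` maps a script name to its source text. The returned order answers the “what should I run, in what order” question for one target script, and the `data:` entries in it are the data labels whose paths would need to be archived.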

Requirements

Python 3.6 or newer.

The scripts listed under “Additional functionality” require NetworkX and pydotplus; dependency_graph also calls the GraphViz dot executable.

Release history

- 0.8.1: Jul 24, 2018
- 0.8: Jul 24, 2018
- 0.7: Feb 2, 2018
- 0.6: Sep 6, 2017
- 0.5: Aug 10, 2017
- 0.4: Aug 10, 2017 (this version)
- 0.3: Aug 2, 2017
- 0.2: Aug 1, 2017
- 0.1: Jul 31, 2017

Download files

Source Distribution

data-path-utils-0.4.tar.gz (7.5 kB)

Uploaded Aug 10, 2017 Source

Built Distribution

data_path_utils-0.4-py3-none-any.whl (12.1 kB)

Uploaded Aug 10, 2017 Python 3

File details

Details for the file data-path-utils-0.4.tar.gz.

File metadata

Download URL: data-path-utils-0.4.tar.gz
Upload date: Aug 10, 2017
Size: 7.5 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for data-path-utils-0.4.tar.gz
Algorithm	Hash digest
SHA256	`b3f8a376d67226680348eed5f09a19269512ca362305a1a55ad7df29b6ecab62`
MD5	`2d5aba5638e4b4bb3b63867b3ffbd5b8`
BLAKE2b-256	`5306d4fc2eec94b93d5549fed7990e2b7a5bfea006aa73fa2073a2b2bd3fa2a8`

File details

Details for the file data_path_utils-0.4-py3-none-any.whl.

File metadata

Download URL: data_path_utils-0.4-py3-none-any.whl
Upload date: Aug 10, 2017
Size: 12.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for data_path_utils-0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f59cd77d8e060554db555ce8f47dbce7895edae50ca2daa486802ea89830500`
MD5	`8d41b5c68fdad6a7899ca4bcd4232500`
BLAKE2b-256	`039b3b526ce4f700e8fbd76d59f5a4b21eb15f7b71f2e960d51ee094d6878760`
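
A downloaded file can be checked against the SHA256 digests listed above with a few lines of standard-library Python. This is a minimal sketch; the helper names `sha256_of` and `matches` are hypothetical, and the file is assumed to already be on local disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path('data-path-utils-0.4.tar.gz'), expected)` should be true when `expected` holds the SHA256 value listed for that file and the download is intact.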