from hansel import Crumb to find your file path.
hansel
Flexible parametric file paths to make queries, build folder trees and smart folder structure access.
Usage
Quick Intro
Imagine this folder tree:
data
└── raw
    ├── 0040000
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040001
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040002
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040003
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040004
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
from hansel import Crumb
# create the crumb
crumb = Crumb("{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}")
# set the base_dir path
crumb = crumb.replace('base_dir', '/home/hansel')
assert str(crumb) == "/home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}"
# get the ids of the subjects
subj_ids = crumb['subject_id']
assert subj_ids == ['0040000', '0040001', '0040002', '0040003', '0040004', ....]
# get the paths to the subject folders, the output can be strings or crumbs, you choose with the make_crumbs boolean argument
subj_paths = crumb.ls('subject_id', make_crumbs=True)
# set the image_type
anat_crumb = crumb.replace(image_type='anat_1')
# get the paths to the anat_1 folders
anat_paths = anat_crumb.ls('image')
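To try the snippet above against real folders, the example tree can be created with the standard library alone. This is a minimal sketch that does not use hansel; the helper name make_example_tree and the temporary base directory are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def make_example_tree(base_dir):
    """Create data/raw/<subject_id>/session_1/<image_type> folders under base_dir."""
    raw_dir = Path(base_dir) / "data" / "raw"
    for i in range(5):
        subject_id = "{:07d}".format(40000 + i)  # '0040000' ... '0040004'
        for image_type in ("anat_1", "rest_1"):
            folder = raw_dir / subject_id / "session_1" / image_type
            folder.mkdir(parents=True, exist_ok=True)
    return raw_dir


# Build the tree in a throwaway directory and list the subject folders.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = make_example_tree(tmp)
    subject_dirs = sorted(p.name for p in raw_dir.iterdir())

print(subject_dirs)
```

Passing the same base directory as base_dir to the crumb should then let the queries above run against these folders.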
Long Intro
I often find myself working with structured folder paths, such as the one shown above.
I have tried many ways of handling them: loops, dictionaries, configuration files, etc. I always end up solving the same problem in a different way, over and over again.
This week I grew tired of it and decided to represent a structured folder tree as a string and access it in the easiest way possible.
If you look at the folder structure above I have:
the root directory from where it is hanging: ...data/raw,
many identifiers (in this case a subject identification), e.g., 0040000,
session identification, session_1 and
a data type (in this case an image type), anat_1 and rest_1.
With hansel I can represent this folder structure like this:
from hansel import Crumb
crumb = Crumb("{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}")
Let’s say we have the structure above hanging from a base directory like /home/hansel/.
I can use the replace function to set the base_dir parameter:
crumb = crumb.replace('base_dir', '/home/hansel')
assert str(crumb) == "/home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}"
If you don’t need a copy of crumb, you can use the [] operator:
crumb['base_dir'] = '/home/hansel'
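Conceptually, setting one argument fills in a single placeholder of the path template and leaves the others open. The sketch below illustrates that idea with string.Formatter from the standard library. It is not how hansel is implemented, and the function names template_fields and partial_replace are hypothetical.

```python
from string import Formatter


def template_fields(template):
    """Return the placeholder names of template in order of appearance."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def partial_replace(template, **values):
    """Fill in only the placeholders given in values; keep the rest as {name}."""
    parts = []
    for literal, name, _spec, _conversion in Formatter().parse(template):
        parts.append(literal)
        if name is not None:
            # Unset placeholders are written back unchanged.
            parts.append(str(values.get(name, "{" + name + "}")))
    return "".join(parts)


template = "{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}"
fields = template_fields(template)
with_base = partial_replace(template, base_dir="/home/hansel")

print(fields)
print(with_base)
```

Returning a new string mirrors the copy semantics of replace; the [] operator shown above changes the crumb in place instead.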
Now that the root path of my dataset is set, I can start querying my crumb path.
If I want to know the paths to the existing subject_id folders:
subject_paths = crumb.ls('subject_id')
The output of ls can be str, Crumb or pathlib.Path. They will be Path if there are no crumb arguments left in the crumb path. You can choose this using the make_crumbs argument:
subject_paths = crumb.ls('subject_id', make_crumbs=True)
If I want to know which subject_ids exist:
subject_ids = crumb.ls('subject_id', fullpath=False)
or
subject_ids = crumb['subject_id']
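One way to picture this query is as a wildcard search: every open argument up to the requested one becomes a `*`, and the last component of each match is a value. The following sketch does this with glob from the standard library. It only illustrates the idea and is not hansel's implementation; list_values is a hypothetical name, and the sketch assumes each argument fills a whole path component.

```python
import glob
import os
import re
import tempfile

ARGUMENT = re.compile(r"\{[^{}]+\}")


def list_values(template, arg_name):
    """Return the sorted, unique values found on disk for arg_name."""
    components = template.split("/")
    depth = components.index("{" + arg_name + "}")
    # Turn every open argument up to and including arg_name into a wildcard.
    pattern = "/".join(ARGUMENT.sub("*", c) for c in components[: depth + 1])
    return sorted({os.path.basename(path) for path in glob.glob(pattern)})


# Try it on a small tree created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for subject_id in ("0040000", "0040001"):
        for image_type in ("anat_1", "rest_1"):
            os.makedirs(os.path.join(tmp, "data", "raw", subject_id, "session_1", image_type))
    template = tmp.replace(os.sep, "/") + "/data/raw/{subject_id}/{session_id}/{image_type}"
    subject_ids = list_values(template, "subject_id")
    image_types = list_values(template, "image_type")

print(subject_ids)  # ['0040000', '0040001']
print(image_types)  # ['anat_1', 'rest_1']
```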
Now, if I wanted to get the path to all the anat_1 images, I could do this:
anat_crumb = crumb.replace(image_type='anat_1')
anat_paths = anat_crumb.ls('image')
or
crumb['image_type'] = 'anat_1'
anat_paths = crumb.ls('image')
More features
There are more possibilities such as:
creating folder trees with a map of values for the crumbs:
from hansel import Crumb, mktree, ParameterGrid
crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
values_map = {'session_id': ['session_' + str(i) for i in range(2)],
'subject_id': ['subj_' + str(i) for i in range(3)]}
mktree(crumb, list(ParameterGrid(values_map)))
check the feasibility of a crumb path:
crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
# ask if there is any subject with the image 'lollipop.png'.
crumb['image'] = 'lollipop.png'
assert crumb.exists()
check which subjects have ‘jujube.png’ and ‘toffee.png’ files:
crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
toffee_crumb = crumb.replace(image='toffee.png')
jujube_crumb = crumb.replace(image='jujube.png')
# using sets functionality
set(toffee_crumb['subject_id']).intersection(set(jujube_crumb['subject_id']))
unfold the whole crumb path to get the whole filetree in a list of paths:
crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
crumbs = crumb.unfold()
# and you can ask for the value of the crumb argument in each element
crumbs[0]['subject_id']
More functionalities, ideas and comments are welcome.
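The mktree example above builds folders from a map of values. A standard-library sketch of that idea follows. It assumes that the value map expands to the Cartesian product of its lists (as the scikit-learn class named ParameterGrid does) and that folder creation stops at the first argument without a value. Both are assumptions about behaviour, not statements about hansel's code, and expand_grid and make_tree are hypothetical names.

```python
import itertools
import tempfile
from pathlib import Path


def expand_grid(values_map):
    """Return one dict per combination of the listed values."""
    names = sorted(values_map)
    value_lists = [values_map[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def make_tree(template, combinations):
    """Create one folder per combination and return the created paths."""
    created = []
    for values in combinations:
        parts = []
        for component in template.split("/"):
            if component.startswith("{") and component.endswith("}"):
                name = component[1:-1]
                if name not in values:
                    break  # assumption: stop at the first argument without a value
                component = values[name]
            parts.append(component)
        path = Path("/".join(parts))
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


values_map = {'session_id': ['session_' + str(i) for i in range(2)],
              'subject_id': ['subj_' + str(i) for i in range(3)]}
grid = expand_grid(values_map)

with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp).as_posix() + "/raw/{subject_id}/{session_id}/{modality}/{image}"
    make_tree(template, grid)
    created_dirs = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp, "raw").glob("*/*"))

print(len(grid))
print(created_dirs[0])
```

With three subjects and two sessions, the grid holds six combinations, so six session folders are created.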
Dependencies
Please see the requirements.txt file. Before installing this package, install its dependencies with:
pip install -r requirements.txt
Install
I am only testing this tool on Python 3.4 and 3.5. It may also work on Python 2.7 if six and pathlib2 are installed.
This package uses setuptools. You can install it by running:
python setup.py install
If you already have the dependencies listed in requirements.txt installed, to install in your home directory, use:
python setup.py install --user
To install for all users on Unix/Linux:
python setup.py build
sudo python setup.py install
You can also install it in development mode with:
python setup.py develop
Development
Code
Github
You can check out the latest sources with the command:
git clone https://github.com/alexsavio/hansel.git
or if you have write privileges:
git clone git@github.com:alexsavio/hansel.git
If you are going to create patches for this project, create a branch for it from the master branch.
We tag stable releases in the repository with the version number.
Testing
We are using py.test to help us with the testing.
You can run the tests by executing:
python setup.py test
or
py.test
Changelog
Version 0.4.1
Fix CHANGES.rst to correct restview in PyPI
Thanks to restview: https://pypi-hypernode.com/pypi/restview
Version 0.4.0
Fill CHANGES.rst
All outputs from Crumb.ls function will be sorted.
Add regular expressions or fnmatch option for crumb arguments.
Change exists behaviour. Now the empty crumb arguments will return False when exist().
Code clean up.
Fix bugs
Version 0.3.1
Fix README
Code clean up.
Version 0.3.0
Add _argval member, a dict which stores crumb arguments replacements.
Add tests.
Remove rm_dups option in Crumb.ls function.
Remove conversion to Paths when Crumb has no crumb arguments in Crumb.ls.
Version 0.2.0
Add ignore_list parameter in Crumb constructor.
Version 0.1.1
Add Crumb.unfold function.
Move mktree out of Crumb class.
Version 0.1.0
Simplify code.
Increase test coverage.
Add exist_check to Crumb.ls function.
Fix bugs.
File details
Details for the file hansel-0.4.1.tar.gz.
File metadata
- Download URL: hansel-0.4.1.tar.gz
- Upload date:
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ab44600b4fb051b61c3772f5837f585024041a6070eac9d0ea782bf300796ecc
MD5 | c687895b2142e6b3ffcdd6be6f4e2e99
BLAKE2b-256 | 9103b1a500d00e68ed06484240824e0c7a48f7403b58866ee84210a68770bc54
File details
Details for the file hansel-0.4.1-py3-none-any.whl.
File metadata
- Download URL: hansel-0.4.1-py3-none-any.whl
- Upload date:
- Size: 19.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 68e9c1cc3c789e2461b3a9926efa8247920440b0a46c14040323df67bc0d73e4
MD5 | dbbfed6bb170d8f0f5ebc8f4d5ad5fa6
BLAKE2b-256 | 3bc19df9fa1d3e3e709a15ee44fb5195420c8e419424b1dd32c1a10ba918e90e