from hansel import Crumb to find your file path.
hansel
Parametric file paths to access and build structured folder trees.
It has almost no dependencies; see the Dependencies and Install sections below.
Github repository: https://github.com/alexsavio/hansel
Usage
Quick Intro
Imagine this folder tree:
data
└── raw
    ├── 0040000
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040001
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040002
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040003
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
    ├── 0040004
    │   └── session_1
    │       ├── anat_1
    │       └── rest_1
>>> from hansel import Crumb
# create the crumb
>>> crumb = Crumb("{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}")
# set the base_dir path
>>> crumb = crumb.replace(base_dir='/home/hansel')
>>> print(str(crumb))
/home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}
# get the ids of the subjects
>>> subj_ids = crumb['subject_id']
>>> print(subj_ids)
['0040000', '0040001', '0040002', '0040003', '0040004', '0040005', ...
# get the paths to the subject folders, the output can be strings or crumbs,
# you choose with the ``make_crumbs`` boolean argument. Default: True.
>>> subj_paths = crumb.ls('subject_id', make_crumbs=True)
>>> print(subj_paths)
[Crumb("/home/hansel/data/raw/0040000/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040001/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040002/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040003/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040004/{session_id}/{image_type}/{image}"),
...
# set the image_type
>>> anat_crumb = crumb.replace(image_type='anat_1')
>>> print(anat_crumb)
/home/hansel/data/raw/{subject_id}/{session_id}/anat_1/{image}
# get the paths to the images inside the anat_1 folders
>>> anat_paths = anat_crumb.ls('image')
>>> print(anat_paths)
[Crumb("/home/hansel/data/raw/0040000/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040002/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040003/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040004/session_1/anat_1/mprage.nii.gz"),
...
# get the ``session_id`` of each of these ``anat_paths``
>>> sessions = [cr['session_id'][0] for cr in anat_paths]
>>> print(sessions)
['session_1', 'session_1', 'session_1', 'session_1', 'session_1', ...
# if you don't want the output to be ``Crumbs`` but string paths:
>>> anat_paths = anat_crumb.ls('image', make_crumbs=False)
>>> print(anat_paths)
["/home/hansel/data/raw/0040000/session_1/anat_1/mprage.nii.gz",
"/home/hansel/data/raw/0040001/session_1/anat_1/mprage.nii.gz",
"/home/hansel/data/raw/0040002/session_1/anat_1/mprage.nii.gz",
"/home/hansel/data/raw/0040003/session_1/anat_1/mprage.nii.gz",
"/home/hansel/data/raw/0040004/session_1/anat_1/mprage.nii.gz",
...
# you can also use a list of ``fnmatch`` expressions to ignore certain file patterns
# using the ``ignore_list`` argument in the constructor.
# For example, the files that start with '.'.
>>> crumb = Crumb("{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}",
...               ignore_list=['.*'])
For more quick examples, see More features and tricks, after the Long Intro.
Long Intro
I often find myself working with structured folder paths, such as the one shown above.
I have tried many ways of solving these situations: loops, dictionaries, configuration files, etc. I always end up doing a different thing for the same problem over and over again.
This week I grew tired of it and decided to represent a structured folder tree as a string and access it in the easiest way possible.
If you look at the folder structure above I have:
the root directory from where it is hanging: ...data/raw,
many identifiers (in this case a subject identification), e.g., 0040000,
session identification, session_1 and
a data type (in this case an image type), anat_1 and rest_1.
With hansel I can represent this folder structure like this:
>>> from hansel import Crumb
>>> crumb = Crumb("{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}")
Let’s say we have the structure above hanging from a base directory like /home/hansel/.
I can use the replace function to set the base_dir parameter:
>>> crumb = crumb.replace(base_dir='/home/hansel')
>>> print(str(crumb))
/home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}
If I don’t need a copy of crumb, I can use the [] operator:
>>> crumb['base_dir'] = '/home/hansel'
>>> print(str(crumb))
/home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}
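The idea of filling in only some arguments of a path template can be sketched with the standard library alone. This is a minimal illustration, not hansel's actual implementation; the helper name partial_replace is hypothetical.

```python
from string import Formatter


def partial_replace(template, **values):
    """Fill the given arguments and leave every other `{arg}` untouched."""
    parts = []
    for literal, name, spec, _conversion in Formatter().parse(template):
        parts.append(literal)
        if name is None:  # trailing literal text, no argument follows it
            continue
        if name in values:
            parts.append(str(values[name]))
        else:  # keep the open argument (and its pattern, if it has one)
            parts.append("{" + name + (":" + spec if spec else "") + "}")
    return "".join(parts)


template = "{base_dir}/data/raw/{subject_id}/{session_id}/{image_type}/{image}"
print(partial_replace(template, base_dir="/home/hansel"))
# prints /home/hansel/data/raw/{subject_id}/{session_id}/{image_type}/{image}
```

The arguments that were not given stay in the string as placeholders, which is what allows the path to be queried later.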
Now that the root path of my dataset is set, I can start querying my crumb path.
To get the paths to the existing subject_id folders, I can use the ls function. Its output can be str or Crumb objects; I choose with the make_crumbs argument (default: True):
>>> subj_crumbs = crumb.ls('subject_id')
>>> print(subj_crumbs)
[Crumb("/home/hansel/data/raw/0040000/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040001/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040002/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040003/{session_id}/{image_type}/{image}"),
Crumb("/home/hansel/data/raw/0040004/{session_id}/{image_type}/{image}"),
...
>>> subj_paths = crumb.ls('subject_id', make_crumbs=False)
>>> print(subj_paths)
["/home/hansel/data/raw/0040000/{session_id}/{image_type}/{image}",
"/home/hansel/data/raw/0040001/{session_id}/{image_type}/{image}",
"/home/hansel/data/raw/0040002/{session_id}/{image_type}/{image}",
"/home/hansel/data/raw/0040003/{session_id}/{image_type}/{image}",
"/home/hansel/data/raw/0040004/{session_id}/{image_type}/{image}",
...
If I want to know which subject_id values exist:
>>> subj_ids = crumb.ls('subject_id', fullpath=False)
>>> print(subj_ids)
['0040000', '0040001', '0040002', '0040003', '0040004', '0040005', ...
or
>>> subj_ids = crumb['subject_id']
>>> print(subj_ids)
['0040000', '0040001', '0040002', '0040003', '0040004', '0040005', ...
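Conceptually, the values of the first open argument are the folder names found at that level of the tree. The sketch below shows this with the standard library on a throwaway directory; it does not use hansel, and list_values is a hypothetical helper.

```python
import os
import tempfile


def list_values(parent_dir):
    """Return the sorted names of the folders directly under `parent_dir`."""
    return sorted(
        name for name in os.listdir(parent_dir)
        if os.path.isdir(os.path.join(parent_dir, name))
    )


# Build a small temporary tree that mimics the layout described above.
base_dir = tempfile.mkdtemp()
for subject_id in ("0040000", "0040001", "0040002"):
    os.makedirs(os.path.join(base_dir, "data", "raw", subject_id, "session_1", "anat_1"))

subj_ids = list_values(os.path.join(base_dir, "data", "raw"))
print(subj_ids)
# prints ['0040000', '0040001', '0040002']
```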
Now, if I wanted to get the path to all the images inside the anat_1 folders, I could do this:
>>> anat_crumb = crumb.replace(image_type='anat_1')
>>> print(anat_crumb)
/home/hansel/data/raw/{subject_id}/{session_id}/anat_1/{image}
or if I don’t need to keep a copy of crumb:
>>> crumb['image_type'] = 'anat_1'
# get the paths to the images inside the anat_1 folders
>>> anat_paths = crumb.ls('image')
>>> print(anat_paths)
[Crumb("/home/hansel/data/raw/0040000/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040002/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040003/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040004/session_1/anat_1/mprage.nii.gz"),
...
Remember that I can still access the replaced crumb arguments in each of the previous crumbs in anat_paths.
>>> subj_ids = [cr['subject_id'][0] for cr in anat_paths]
>>> print(subj_ids)
['0040000', '0040001', '0040002', '0040003', '0040004', '0040005', ...
>>> files = [cr['image'][0] for cr in anat_paths]
>>> print(files)
['mprage.nii.gz', 'mprage.nii.gz', 'mprage.nii.gz', 'mprage.nii.gz', ...
More features and tricks
There are more possibilities such as:
Creating folder trees
Use mktree and ParameterGrid to create a tree of folders.
>>> from hansel import Crumb, mktree, ParameterGrid
>>> crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
>>> values_map = {'session_id': ['session_' + str(i) for i in range(2)],
...               'subject_id': ['subj_' + str(i) for i in range(3)]}
>>> mktree(crumb, list(ParameterGrid(values_map)))
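To make the idea concrete, here is a rough standard-library sketch of expanding a map of values into every combination and creating one folder per combination. It is an illustration under assumptions, not how mktree or ParameterGrid are implemented; parameter_grid and make_tree are hypothetical names, and the tree is created in a temporary directory.

```python
import itertools
import os
import tempfile


def parameter_grid(values_map):
    """Yield one dict per combination of the values in `values_map`."""
    names = sorted(values_map)
    for combo in itertools.product(*(values_map[name] for name in names)):
        yield dict(zip(names, combo))


def make_tree(template, combinations):
    """Create one folder per combination and return the created paths."""
    paths = [template.format(**combo) for combo in combinations]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths


# The temporary root is assumed not to contain '{' or '}' characters.
root = tempfile.mkdtemp()
template = os.path.join(root, "raw", "{subject_id}", "{session_id}")
values_map = {"session_id": ["session_" + str(i) for i in range(2)],
              "subject_id": ["subj_" + str(i) for i in range(3)]}
created = make_tree(template, parameter_grid(values_map))
print(len(created))
# prints 6
```

Two sessions times three subjects give six folders.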
Check the feasibility of a crumb path
>>> crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
# ask if there is any subject with the image 'lollipop.png'.
>>> crumb['image'] = 'lollipop.png'
>>> assert crumb.exists()
Check which subjects have ‘jujube.png’ and ‘toffee.png’ files
>>> crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
>>> toffee_crumb = crumb.replace(image='toffee.png')
>>> jujube_crumb = crumb.replace(image='jujube.png')
# using sets functionality
>>> gluttons = set(toffee_crumb['subject_id']).intersection(set(jujube_crumb['subject_id']))
>>> print(sorted(gluttons))
['gretel', 'hansel']
Use the intersection function
Use it for comparisons on more than one crumb argument. This can be used to compare datasets with the same structure in different folders.
One argument
Imagine that we have two working folders of subjects for two different projects: proj1 and proj2. If I want to check which subjects are common to both projects:
>>> from hansel import intersection
# using one argument
>>> cr_proj1 = Crumb("/home/hansel/proj1/{subject_id}/{session_id}/{modality}/{image}")
>>> cr_proj2 = Crumb("/home/hansel/proj2/{subject_id}/{session_id}/{modality}/{image}")
# set the `on` argument in `intersection` to specify which crumb arguments to merge.
>>> merged = intersection(cr_proj1, cr_proj2, on=['subject_id'])
>>> print(merged)
[(('subject_id', '0040000'),), (('subject_id', '0040001'),), (('subject_id', '0040001'),)]
# I can pick these subject crumbs from this result using the `build_paths` function.
>>> cr_proj1.build_paths(merged, make_crumbs=True)
[Crumb("/home/hansel/proj1/0040010/{session_id}/{modality}/{image}"),
Crumb("/home/hansel/proj1/0040110/{session_id}/{modality}/{image}")]
>>> cr_proj2.build_paths(merged, make_crumbs=True)
[Crumb("/home/hansel/proj2/0040010/{session_id}/{modality}/{image}"),
Crumb("/home/hansel/proj2/0040110/{session_id}/{modality}/{image}")]
Two arguments
Now, imagine that I have different sets of {image} for these subjects. I want to check which of those subjects have exactly the same images. Let’s say that the subject 0040001 has an anatomical.nii.gz instead of mprage.nii.gz.
>>> from hansel import intersection
# using two arguments
>>> cr_proj1 = Crumb("/home/hansel/proj1/{subject_id}/{session_id}/{modality}/{image}")
>>> cr_proj2 = Crumb("/home/hansel/proj2/{subject_id}/{session_id}/{modality}/{image}")
# set the `on` argument in `intersection` to specify which crumb arguments to merge.
>>> merged = intersection(cr_proj1, cr_proj2, on=['subject_id', 'image'])
>>> print(merged)
[(('subject_id', '0040000'), ('image', 'mprage.nii.gz')),
(('subject_id', '0040000'), ('image', 'rest.nii.gz')),
(('subject_id', '0040001'), ('image', 'rest.nii.gz')),
(('subject_id', '0040002'), ('image', 'mprage.nii.gz')),
(('subject_id', '0040002'), ('image', 'rest.nii.gz'))]
# I can pick these image crumbs from this result using the `build_paths` function.
>>> cr_proj1.build_paths(merged, make_crumbs=True)
[Crumb("/home/hansel/proj1/0040000/{session_id}/{modality}/mprage.nii.gz"),
Crumb("/home/hansel/proj1/0040000/{session_id}/{modality}/rest.nii.gz"),
Crumb("/home/hansel/proj1/0040001/{session_id}/{modality}/rest.nii.gz"),
Crumb("/home/hansel/proj1/0040002/{session_id}/{modality}/mprage.nii.gz"),
Crumb("/home/hansel/proj1/0040002/{session_id}/{modality}/rest.nii.gz")]
>>> cr_proj2.build_paths(merged, make_crumbs=True)
[Crumb("/home/hansel/proj2/0040000/{session_id}/{modality}/mprage.nii.gz"),
Crumb("/home/hansel/proj2/0040000/{session_id}/{modality}/rest.nii.gz"),
Crumb("/home/hansel/proj2/0040001/{session_id}/{modality}/rest.nii.gz"),
Crumb("/home/hansel/proj2/0040002/{session_id}/{modality}/mprage.nii.gz"),
Crumb("/home/hansel/proj2/0040002/{session_id}/{modality}/rest.nii.gz")]
# adding 'modality' to the intersection would be:
>>> intersection(cr_proj1, cr_proj2, on=['subject_id', 'modality', 'image'])
[(('subject_id', '0040000'), ('modality', 'anat_1'), ('image', 'mprage.nii.gz')),
(('subject_id', '0040000'), ('modality', 'rest_1'), ('image', 'rest.nii.gz')),
(('subject_id', '0040001'), ('modality', 'rest_1'), ('image', 'rest.nii.gz')),
(('subject_id', '0040002'), ('modality', 'anat_1'), ('image', 'mprage.nii.gz')),
(('subject_id', '0040002'), ('modality', 'rest_1'), ('image', 'rest.nii.gz'))]
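The comparison on one or several arguments can be pictured as a set intersection over tuples of (argument, value) pairs. The following is a small sketch with plain dictionaries standing in for the files of each project; intersect_on and the sample records are hypothetical and only mirror the shape of the results printed above.

```python
def intersect_on(records_a, records_b, on):
    """Return the sorted value combinations for `on` present in both record lists."""
    def keys(records):
        return {tuple((name, rec[name]) for name in on) for rec in records}
    return sorted(keys(records_a) & keys(records_b))


# Hypothetical contents of two projects; subject 0040001 differs in one image.
proj1 = [
    {"subject_id": "0040000", "image": "mprage.nii.gz"},
    {"subject_id": "0040001", "image": "mprage.nii.gz"},
    {"subject_id": "0040001", "image": "rest.nii.gz"},
]
proj2 = [
    {"subject_id": "0040000", "image": "mprage.nii.gz"},
    {"subject_id": "0040001", "image": "anatomical.nii.gz"},
    {"subject_id": "0040001", "image": "rest.nii.gz"},
]

print(intersect_on(proj1, proj2, on=["subject_id"]))
print(intersect_on(proj1, proj2, on=["subject_id", "image"]))
```

With one argument both subjects are reported as common; with two arguments only the (subject, image) pairs present in both projects remain.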
The unfold function
Unfold the whole crumb path to get the whole file tree in a list of paths:
>>> crumb = Crumb("/home/hansel/raw/{subject_id}/{session_id}/{modality}/{image}")
>>> all_images = crumb.unfold()
>>> print(all_images)
[Crumb("/home/hansel/data/raw/0040000/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040000/session_1/rest_1/rest.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_1/rest_1/rest.nii.gz"),
Crumb("/home/hansel/data/raw/0040002/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040002/session_1/rest_1/rest.nii.gz"),
Crumb("/home/hansel/data/raw/0040003/session_1/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040003/session_1/rest_1/rest.nii.gz"),
...
# and you can ask for the value of the crumb argument in each element
>>> print(all_images[0]['subject_id'])
['0040000']
Note that unfold is the same as calling the ls function without arguments.
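One way to picture unfolding is to treat every open argument as a wildcard and list whatever exists on disk. The sketch below does that with glob on a temporary tree; it is an assumption-laden illustration, not hansel's algorithm, and the function name unfold here is a local stand-in that returns plain strings.

```python
import glob
import os
import re
import tempfile


def unfold(template):
    """List every existing path matching `template`, treating each `{arg}` as `*`."""
    return sorted(glob.glob(re.sub(r"\{[^{}]*\}", "*", template)))


# Create a tiny temporary dataset: two subjects, two images each.
root = tempfile.mkdtemp()
for subject_id in ("0040000", "0040001"):
    for modality, image in (("anat_1", "mprage.nii.gz"), ("rest_1", "rest.nii.gz")):
        folder = os.path.join(root, "raw", subject_id, "session_1", modality)
        os.makedirs(folder)
        open(os.path.join(folder, image), "w").close()

all_images = unfold(
    os.path.join(root, "raw", "{subject_id}", "{session_id}", "{modality}", "{image}")
)
print(len(all_images))
# prints 4
```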
Use regular expressions
Use re.match or fnmatch expressions to filter the paths:
The syntax for crumb arguments with a regular expression is: "{<arg_name>:<arg_regex>}"
# only the session_0 folders
>>> s0_imgs = Crumb("/home/hansel/raw/{subject_id}/{session_id:*_0}/{modality}/{image}")
>>> s0_paths = s0_imgs.unfold()
>>> print(s0_paths)
[Crumb("/home/hansel/data/raw/0040000/session_0/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040000/session_0/rest_1/rest.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_0/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_0/rest_1/rest.nii.gz"),
...
By default the patterns are fnmatch expressions. If you prefer using re.match for filtering, set the regex argument to 're' or 're.ignorecase' in the constructor.
# only the ``session_0`` folders
>>> s0_imgs = Crumb("/home/hansel/raw/{subject_id}/{session_id:^.*_0$}/{modality}/{image}",
...                 regex='re')
>>> s0_paths = s0_imgs.unfold()
>>> print(s0_paths)
[Crumb("/home/hansel/data/raw/0040000/session_0/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040000/session_0/rest_1/rest.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_0/anat_1/mprage.nii.gz"),
Crumb("/home/hansel/data/raw/0040001/session_0/rest_1/rest.nii.gz"),
...
The regular expressions can be checked with the patterns property.
>>> print(s0_imgs.patterns)
{'session_id': '^.*_0$', 'modality': '', 'image': '', 'subject_id': ''}
They can also be modified with the set_pattern function.
>>> s0_imgs.set_pattern('modality', 'a.*')
>>> print(s0_imgs.patterns)
{'session_id': '^.*_0$', 'modality': 'a.*', 'image': '', 'subject_id': ''}
>>> print(s0_imgs.path)
/home/hansel/raw/{subject_id}/{session_id:^.*_0$}/{modality:a.*}/{image}
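The "{<arg_name>:<arg_regex>}" syntax and the two matching modes can be illustrated with the standard library. This is a sketch under assumptions, not hansel's code: parse_patterns and filter_names are hypothetical helpers, and the regex values simply mirror the options described above.

```python
import fnmatch
import re
from string import Formatter


def parse_patterns(template):
    """Map each argument name to the pattern written after its colon ('' if none)."""
    return {name: spec or ""
            for _literal, name, spec, _conv in Formatter().parse(template)
            if name is not None}


def filter_names(names, pattern, regex="fnmatch"):
    """Keep the names matching `pattern`, using fnmatch or re.match."""
    if not pattern:
        return list(names)
    if regex == "fnmatch":
        return fnmatch.filter(names, pattern)
    flags = re.IGNORECASE if regex == "re.ignorecase" else 0
    return [name for name in names if re.match(pattern, name, flags)]


patterns = parse_patterns(
    "/home/hansel/raw/{subject_id}/{session_id:^.*_0$}/{modality}/{image}"
)
sessions = ["session_0", "session_1", "Session_0"]
print(patterns["session_id"])
print(filter_names(sessions, "*_0"))
print(filter_names(sessions, "^s.*_0$", regex="re"))
print(filter_names(sessions, "^s.*_0$", regex="re.ignorecase"))
```

Arguments written without a colon get an empty pattern, which matches everything.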
More functionalities, ideas and comments are welcome.
Dependencies
Please see the requirements.txt file. Before installing this package, install its dependencies with:
pip install -r requirements.txt
Install
It works on Python 3.4, 3.5 and 2.7. For Python 2.7 install pathlib2 as well.
This package uses setuptools. You can install it by running:
python setup.py install
If you already have the dependencies listed in requirements.txt installed, to install in your home directory, use:
python setup.py install --user
To install for all users on Unix/Linux:
python setup.py build
sudo python setup.py install
You can also install it in development mode with:
python setup.py develop
Development
Code
Github
You can check the latest sources with the command:
git clone https://www.github.com/alexsavio/hansel.git
or if you have write privileges:
git clone git@github.com:alexsavio/hansel.git
If you are going to create patches for this project, create a branch for it from the master branch.
We tag stable releases in the repository with the version number.
Testing
We are using py.test to help us with the testing.
You can run the tests by executing:
python setup.py test
or
py.test
or
make test
Changelog
Version 0.8.0 -
Set to True the default value for check_exists in Crumb.ls function. I don’t think anybody is interested in non-existing paths.
Now it is possible to set a non-open item in a Crumb, i.e., I can replace the value for an already set crumb argument.
Update README.rst
Code clean-up.
Replace dict to OrderedDict output in valuesmap_to_dict function.
Version 0.7.0 - 0.7.5
Refactoring of how Crumb works, now using string.Formatter. This will help with new features due to simpler logic. Now it is not possible to change the syntax of the Crumbs, although I guess nobody is interested in that.
Fixed a few bugs from previous versions.
Now copy function is not a classmethod anymore, so you can do crumb.copy() as well as Crumb.copy(crumb).
patterns is not a dictionary anymore, the regexes are embedded in the _path string. The property patterns returns the dictionary as before. The function set_pattern must be used instead to set a different pattern to a given argument.
Update README.rst
Fix README.rst because of bad syntax for PyPI.
Fix bug for Python 2.7
Fix the bug in .rst for PyPI.
Code cleanup
Version 0.6.0 - 0.6.2
Added intersection function in utils.py.
Change of behaviour in __getitem__, now it returns a list of values even if there is only the one replacement string from _argval.
General renaming of the private functions inside Crumbs, more in accordance to the open_args/all_args idea.
Fixed a few bugs and now the generated crumbs from unfold and ls will have the same parameters as the original Crumb.
Change the behaviour of intersection with len(arg_names) == 1 for compatibility with crumb.build_path function.
Improve README, update with new examples using intersection.
Add pandas helper functions.
Add utils to convert from values_maps to dicts.
Improve docstrings.
Version 0.5.0 - 0.5.5
Add Python 2.7 compatibility. Friends don’t let friends use Python 2.7!
Add ‘re.ignorecase’ option for the regex argument in the constructor.
Add utils.check_path function.
Fix Crumb.split function to return the not defined part of the crumb.
Add Crumbs.keys() function.
Rename utils.remove_duplicates() to utils.rm_dups().
Deprecating Crumbs.keys() function.
Renamed Crumbs.keys() to Crumbs.open_args() and added Crumbs.all_args().
Substitute the internal logic of Crumbs to work with Crumbs.open_args(), made it a bit faster.
Added CHANGES.rst to MANIFEST.in
Version 0.4.0 - 0.4.2
Fill CHANGES.rst.
All outputs from Crumb.ls function will be sorted.
Add regular expressions or fnmatch option for crumb arguments.
Change exists behaviour. Now the empty crumb arguments will return False when exist().
Code clean up.
Fix bugs.
Fix CHANGES.rst to correct restview in PyPI.
Thanks to restview: https://pypi-hypernode.com/pypi/restview. Use: restview --long-description
Improve documentation in README.
Rename member _argreg to patterns, so the user can use it to manage the argument patterns.
Version 0.3.0 - 0.3.1
Add _argval member, a dict which stores crumb arguments replacements.
Add tests.
Remove rm_dups option in Crumb.ls function.
Remove conversion to Paths when Crumb has no crumb arguments in Crumb.ls.
Fix README.
Code clean up.
Version 0.2.0
Add ignore_list parameter in Crumb constructor.
Version 0.1.0 - 0.1.1
Simplify code.
Increase test coverage.
Add exist_check to Crumb.ls function.
Fix bugs.
Add Crumb.unfold function.
Move mktree out of Crumb class.