Add a short description here!
Project description
- Scientists and engineer create a multitude of digital artefacts during their daily work:
experimental results,
simulation results,
literate programming notebooks analysing experiments and simulations
statistical models,
machine learning models,
figures,
tables, etc
In order to trace and track these multiple interconnected research artefacts, hierarchical naming schemes are a powerful tool to document the connection between research artefacts, find previous research outputs, and enable reproducible research [1].
The following naming scheme has evolved over several years to track research artefacts of all kinds.
The general scheme is: DSyymd[hMM]e[_x]__title
DS: Project specific initials. Here, DS stands for Data Science.
yy: [0-9][0-9] are the last two digits of the years in the 21st century. I won’t live beyond that. So, I do not care for following centuries.
m: [o-z] these letters map to the respective months.
d: [1-9,A-V] represent the 31 days of a month. Digits and upper-case characters have approximately the same height, such that this element gives a visual structure to the name, which divides the date from the daily counter.
h: [a-x] these optional letters refer to the hours of the day
MM: [0-5][0-9] are the minutes with 00 encoding the full hour
e: [a-z]+ daily counter as lower-case letter enumerating the respective database or dataset. The 28th dataset would start with be enumerated as aa.
x: Optional attribute being the last significant characters of the dataset, from which DSyymde is derived.
title: Readable name of the respective data set with whitespaces being replaced by underscores.
m |
month |
d |
day |
d |
day |
d |
day |
hour |
h |
hour |
h |
---|---|---|---|---|---|---|---|---|---|---|---|
o |
January |
1 |
1 |
B |
11 |
L |
21 |
0 |
a |
12 |
m |
p |
February |
2 |
2 |
C |
12 |
M |
22 |
1 |
b |
13 |
n |
q |
March |
3 |
3 |
D |
13 |
N |
23 |
2 |
c |
14 |
o |
r |
April |
4 |
4 |
E |
14 |
O |
24 |
3 |
d |
15 |
p |
s |
May |
5 |
5 |
F |
15 |
P |
25 |
4 |
e |
16 |
q |
t |
June |
6 |
6 |
G |
16 |
Q |
26 |
5 |
f |
17 |
r |
u |
July |
7 |
7 |
H |
17 |
R |
27 |
6 |
g |
18 |
s |
v |
August |
8 |
8 |
I |
18 |
S |
28 |
7 |
h |
19 |
t |
w |
September |
9 |
9 |
J |
19 |
T |
29 |
8 |
i |
20 |
u |
x |
October |
A |
10 |
K |
20 |
U |
30 |
9 |
j |
21 |
v |
y |
November |
V |
31 |
10 |
k |
22 |
w |
||||
z |
December |
11 |
l |
23 |
x |
The first dataset created on Friday 01.01.2021 would be named DS21o1a.
The second dataset created on the same day would be named DS21o1b.
An analysis (e.g. Jupyter notebook) of the first data set started after the second data set had been created would be named DS21o1c_a. Exported figures of this analysis should be named DS21o1c_a__[plottype].[filetype].
An analysis of data set DS21o1b started on 2nd January should be named DS21o2a_1b.
An meta analysis of DS21o1c_a and DS21o2a_1b started on 11th February should be named DS21pBa_o1c_2a.
Installation
The module contexere can be installed from PyPi:
pip install contexere
Usage
The project provides the command line tool name:
usage: name [-h] [--version] [-n] [-p PROJECT] [-s] [-t] [-v] [-vv] [path] Suggest name for research artefact positional arguments: path Path to folder with research artefacts (default: current working dir) options: -h, --help show this help message and exit --version show program's version number and exit -n, --next Suggest next artefact name -p PROJECT, --project PROJECT Specify project abbreviation -s, --summary Sumarize files following the naming convention -t, --time add time abbreviation -v, --verbose set loglevel to INFO -vv, --very-verbose set loglevel to DEBUG
Calling the tool without any arguments returns the date abbreviation of today:
name 24xV
Adding the option --time also abbreviates the actual time:
name --time 24xVj36
In an empty folder, a name file is suggested by specifying a project abbreviation and the --next option:
mkdir test_folder cd test_folder name --project DS --next DS24xVa
After the creation of the first file, only the --next option needs to be provided to get another file name suggestion:
touch DS24xVa__test.dat name --next DS24xVb
References
[1] Martin Kühne and Andreas W. Liehr. Improving the traditional information management in natural sciences. Data Science Journal, 8(1):18–26, 2009. doi: 10.2481/dsj.8.18.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file contexere-0.4.tar.gz
.
File metadata
- Download URL: contexere-0.4.tar.gz
- Upload date:
- Size: 26.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0177d6d2f615027669c5f1647d217b807f3453a123e91e17026cffa4ca0b5e84 |
|
MD5 | 0566cac87ec0d06be3a46b816e6deabf |
|
BLAKE2b-256 | fbd7eebcd522f1e129c206b63081d65926362f9210021e897e5f61f94b1c9dab |
File details
Details for the file contexere-0.4-py3-none-any.whl
.
File metadata
- Download URL: contexere-0.4-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2d564d47b3dfba51046cb40215a0d9dab209a7cbbf5624b2df123ceb5eaf87f0 |
|
MD5 | 627dd44fb7d73210ed6762bd19ef82a2 |
|
BLAKE2b-256 | 46ed616a9e8723941aad4f8a55664bddc13a3b5493d937cf52d0cfc93232bf79 |