Git-annex special remote implementation for (remote) indexed archives

Git-annex special remote for accessing (Remote) Indexed file Archives (RIA)

This git-annex special remote implementation is very similar to the directory special remote type built into git-annex. There are a few key differences that outline the use cases where one might consider using this one instead:

  • (Optional) read-access to (compressed) 7z archives

    (Parts of) the keys stored in the remote can live in a 7z archive. These archives are indexed and support relatively fast random read access. This feature can be instrumental on HPC storage systems where strong quotas on filesystem inodes might be imposed on users. The entire key store of the remote can be put into an archive, re-using the exact same directory structure, and remains fully accessible while only using a handful of inodes, regardless of file number and size.

  • (SSH-based remote) access to a configurable directory

    An SSH host name can be provided and all interaction with the remote will be performed via SSH. Moving from local to remote operations, or switching target paths can be done via a change to the configuration (even without having to touch a repository at all). This makes it easier to accommodate infrastructural changes, especially when dealing with large numbers of repositories.

  • Multi-repository directory structure

    While each repository has its own associated key store directory tree, the key store directories of multiple repositories can be organized into a homogeneous archive directory structure. For DataLad datasets, their ID is used to define the location of a key store in an archive. For any other repository the annex remote UUID is taken. This feature further aids the handling of large numbers of repositories in a backup or data store use case, because locations are derived from repository properties rather than having to re-configure them explicitly.

Installation

Before you install this package, please make sure that you have a recent version of git-annex installed. This special remote requires git-annex version 6.20160511 or later. Afterwards, install the latest version of ria-remote from PyPI:

# install from PyPI
pip install ria-remote

Use

A ria special remote is set up like any other "external"-type remote via the git-annex initremote command. In addition to the standard settings, there is a single required setting: base-path, which determines the base directory where the special remote places its keys:

git annex initremote myremote \
    type=external encryption=none \
    externaltype=ria base-path=/tmp/basepath/here

Alternatively, the base-path can also be provided via a Git configuration variable by setting annex.ria-remote.<remote>.base-path (in this example annex.ria-remote.myremote.base-path).
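As a minimal sketch of the configuration-variable route, the following sets and reads back the variable for the example remote name myremote. It uses a throwaway repository created with mktemp purely for illustration (an assumption of this sketch, so that no real repository is modified); only the variable name and the example path come from the text above.

```shell
# Scratch repository so the example does not touch a real one
# (illustration only; in practice this would be your annex repository).
repo=$(mktemp -d)
git init -q "$repo"

# Provide the base path for the remote named "myremote" via Git configuration.
git -C "$repo" config annex.ria-remote.myremote.base-path /tmp/basepath/here

# Read the value back to confirm that it was stored.
git -C "$repo" config --get annex.ria-remote.myremote.base-path
```

In an existing repository, only the plain git config call is needed, run from inside that repository.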

The remote is now ready for use. Any directories will be created on demand. The key store for a repository will be located underneath the given base path, in a structure like this:

/tmp/basepath/here
└── 2e5
    └── 24934-a09e-11e9-8503-f0d5bf7b5561
        └── annex
            └── objects
                └── ff4
                    └── c57
                        └── MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f
                            └── MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f

where the first two levels represent a tree structure that can host key stores for any number of repositories, and the remaining levels are identical to the organization of a bare Git repository, with the annex object tree following the layout of a directory-type git-annex special remote. The directory names for the two top-most levels are built from the git-annex UUID for the special remote, or from a DataLad dataset UUID, if available.
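To illustrate how the two top-most levels relate to the UUID, here is a small sketch that reproduces the directory from the example tree. The helper name ria_keystore_dir is hypothetical and not part of ria-remote, and the split after the first three characters is read off the example above rather than taken from the implementation.

```shell
# Hypothetical helper (not part of ria-remote): print the key store directory
# for a base path and a UUID, assuming the split seen in the example tree.
ria_keystore_dir() {
    base=$1
    uuid=$2
    prefix=$(printf '%.3s' "$uuid")   # first three characters of the UUID
    rest=${uuid#"$prefix"}            # remaining characters of the UUID
    printf '%s/%s/%s\n' "$base" "$prefix" "$rest"
}

# UUID reassembled from the two directory names in the example tree.
ria_keystore_dir /tmp/basepath/here 2e524934-a09e-11e9-8503-f0d5bf7b5561
```

A helper like this can be handy for locating the key store of a given repository on the storage host, for example when checking disk usage.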

The special remote also supports SSH-based operation. To enable it, an additional host name argument has to be given:

git annex initremote myremote \
    type=external encryption=none \
    externaltype=ria base-path=/tmp/basepath/here \
    ssh-host=ria.example.com

This configuration will make the special remote use /tmp/basepath/here on ria.example.com. Any SSH-access customizations (user name, ports, etc.) have to be implemented via the standard SSH configuration mechanism, for example, by placing a snippet like this in $HOME/.ssh/config:

Host ria.example.com
  User mike
  Port 2222
  PreferredAuthentications publickey

Support

All bugs, concerns and enhancement requests for this software can be submitted here: https://github.com/datalad/git-annex-ria-remote/issues
