A Git LFS Server implementation in Python with support for pluggable backends
Giftless - a Pluggable Git LFS Server
Giftless is a Python implementation of a Git LFS Server. It is designed with flexibility in mind, to allow pluggable storage backends, transfer methods and authentication methods.
Giftless supports the basic Git LFS transfer mode with the following storage backends:
- Local storage
- Google Cloud Storage
- Azure Blob Storage with direct-to-cloud or streamed transfers
Additional transfer modes and storage backends could easily be added and configured.
Installation & Quick Start
Running using Docker
Giftless is available as a Docker image. To pull and run it on a system that supports Docker, use:
$ docker run --rm -p 5000:5000 datopian/giftless
This will run the server in WSGI mode, which will require an HTTP server such as nginx to proxy HTTP requests to it.
Alternatively, you can specify the following command line arguments to have uWSGI run in HTTP mode, if no complex HTTP setup is required:
$ docker run --rm -p 8080:8080 datopian/giftless \
-M -T --threads 2 -p 2 --manage-script-name --callable app \
--http 0.0.0.0:8080
If you need to, you can also build the Docker image locally as described below.
Installing & Running from PyPI
You can install Giftless into your Python environment of choice (3.7+) using pip:
(venv) $ pip install giftless
To run it, you will most likely need a WSGI server such as uWSGI or Gunicorn installed. Here is an example of how to run Giftless locally with uWSGI:
# Install uWSGI or any other WSGI server
$ (.venv) pip install uwsgi
# Run uWSGI (see uWSGI's manual for help on all arguments)
$ (.venv) uwsgi -M -T --threads 2 -p 2 --manage-script-name \
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080
Installing & Running from Source
You can install and run giftless from source:
$ git clone https://github.com/datopian/giftless.git
# Initialize a virtual environment
$ cd giftless
$ python3 -m venv .venv
$ . .venv/bin/activate
$ (.venv) pip install -r requirements.txt
You can then proceed to run Giftless with a WSGI server as described above.
Note that for non-production use you can skip the WSGI server and rely on Flask's built-in development server. This should never be done in a production environment:
$ (.venv) ./flask-develop.sh
The default generated endpoint is http://127.0.0.1:5000/. Note: If you access this endpoint, you should receive an error message (invalid route).
Running a local example
- Create a new project on GitHub or any other platform. Here, we create a project named example-proj-datahub-io.
- Add any data file to it. The goal is to track this possibly large file with git-lfs and use Giftless as the local server. In our example, we create a CSV named research_data_factors.csv.
- Create a file named giftless.yaml in your project root directory with the following content in order to have a local server:
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.local_storage:LocalStorage
AUTH_PROVIDERS:
  - giftless.auth.allow_anon:read_write
- Export it:
$ export GIFTLESS_CONFIG_FILE=giftless.yaml
- Start the Giftless server (using Docker or Python).
- Initialize your git repo and connect it with the remote project:
git init
git remote add origin YOUR_REMOTE_REPO
- Track files with git-lfs:
git lfs track 'research_data_factors.csv'
git lfs track
git add .gitattributes #you should have a .gitattributes file at this point
git add "research_data_factors.csv"
git commit -m "Tracking data files"
- You can see a list of tracked files with
git lfs ls-files
- Configure lfs.url to point to your local Giftless server instance:
git config -f .lfsconfig lfs.url http://127.0.0.1:5000/<user_or_org>/<repo>/
# in our case, we used http://127.0.0.1:5000/datopian/example-proj-datahub-io/;
# make sure to end your lfs.url with /
- The previous command writes its changes to the .lfsconfig file. Add it to git:
git add .lfsconfig
git commit -m "New git-lfs server endpoint"
# if you don't see any changes, run git rm --cached *.csv and then re-add your files, then commit it
git lfs push origin master
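For reference, the push above makes the Git LFS client talk to the batch API under the configured lfs.url. The following Python sketch builds (but does not send) such a request for a local file. It assumes the standard Git LFS batch API (a POST to objects/batch using the application/vnd.git-lfs+json media type); the names lfs_object and build_batch_request are illustrative and are not part of Giftless.

```python
import hashlib
import json
import urllib.request

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def lfs_object(path):
    """Return the Git LFS object descriptor (SHA-256 OID and size) of a file."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"oid": digest.hexdigest(), "size": size}


def build_batch_request(lfs_url, objects, operation="upload"):
    """Build (without sending) a batch API request for the given objects."""
    body = {"operation": operation, "transfers": ["basic"], "objects": objects}
    return urllib.request.Request(
        lfs_url.rstrip("/") + "/objects/batch",
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen while the server is running should return a JSON document describing the upload actions, which can help when debugging a misconfigured lfs.url.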
Configuration
You can also configure Giftless through its YAML file to use external storage.
Azure Support
Modify your giftless.yaml file according to the following config:
$ cat giftless.yaml
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_external:factory
    options:
      storage_class: ..storage.azure:AzureBlobsStorage
      storage_options:
        connection_string: GetYourAzureConnectionStringAndPutItHere==
        container_name: lfs-storage
        path_prefix: large-files
Google Cloud Platform Support
To use Google Cloud Storage as a backend, you'll first need:
- A Google Cloud Storage bucket to store objects in
- an account key JSON file (see here).
The key must be associated with either a user or a service account, and should have read / write permissions on objects in the bucket.
If you plan to access objects from a browser, your bucket needs to have CORS enabled.
You can deploy the account key JSON file and provide the path to it as the account_key_file storage option:
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.google_cloud:GoogleCloudStorage
      storage_options:
        project_name: my-gcp-project
        bucket_name: git-lfs
        account_key_file: /path/to/credentials.json
Alternatively, you can base64-encode the contents of the JSON file and provide it inline as account_key_base64:
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.google_cloud:GoogleCloudStorage
      storage_options:
        project_name: my-gcp-project
        bucket_name: git-lfs
        account_key_base64: S0m3B4se64RandomStuff.....ThatI5Redac7edHeReF0rRead4b1lity==
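One possible way to produce the account_key_base64 value is sketched below in Python. It assumes the option expects the standard base64 encoding of the raw file contents, without line breaks; the helper name encode_account_key is hypothetical and not part of Giftless.

```python
import base64
import json


def encode_account_key(path):
    """Return the base64-encoded contents of a JSON key file as a string."""
    with open(path, "rb") as f:
        raw = f.read()
    json.loads(raw)  # fail early if the file is not valid JSON
    return base64.b64encode(raw).decode("ascii")
```

The returned string can be pasted as the value of account_key_base64. Treat it like the key file itself: base64 is an encoding, not encryption.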
After configuring your giftless.yaml file, export it:
$ export GIFTLESS_CONFIG_FILE=giftless.yaml
You will need uWSGI running. Install it with your preferred package manager. Here is an example of how to run it:
# Run uWSGI in HTTP mode on port 8080
$ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080
See giftless/config.py for some default configuration options.
Configuration using .env files
[WIP] It is possible to use an .env file instead of a YAML file in case you need to deploy the project on a platform which does not support deploying configuration in files, such as Heroku.
At this time, we only support a raw format where we dump the content of giftless.yaml into an env var named GIFTLESS_CONFIG_STR:
GIFTLESS_CONFIG_STR="TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: ..storage.google_cloud:GoogleCloudStorage
      storage_options:
        bucket_name: datahub-bbb
        api_key: API_KEY
AUTH_PROVIDERS:
  - giftless.auth.allow_anon:read_write
"
Note #1: As YAML is a superset of JSON, you can also provide a more compact JSON string instead.
Note #2: If you provide both a YAML file (as GIFTLESS_CONFIG_FILE) and a literal YAML string (as GIFTLESS_CONFIG_STR), the two will be merged, with values from the YAML string taking precedence over values from the YAML file.
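The precedence rule in Note #2 can be illustrated with a small Python sketch operating on already-parsed configuration mappings. This is not Giftless' actual merge code: the function name merge_config is hypothetical, and the sketch assumes nested mappings are merged key by key rather than replaced wholesale, which is not stated above.

```python
def merge_config(base, override):
    """Recursively merge two config mappings; values in override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists and new keys: the override value replaces the base
            merged[key] = value
    return merged


# Parsed content of GIFTLESS_CONFIG_FILE (illustrative values)
from_file = {"TRANSFER_ADAPTERS": {"basic": {"options": {
    "storage_options": {"bucket_name": "git-lfs", "project_name": "my-gcp-project"}}}}}
# Parsed content of GIFTLESS_CONFIG_STR (illustrative values)
from_str = {"TRANSFER_ADAPTERS": {"basic": {"options": {
    "storage_options": {"bucket_name": "datahub-bbb"}}}}}

effective = merge_config(from_file, from_str)
```

Under these assumptions, effective keeps project_name from the file while bucket_name comes from the string.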
Transfer Adapters
Git LFS servers and clients can implement and negotiate different [transfer adapters](https://github.com/git-lfs/git-lfs/blob/master/docs/api/basic-transfers.md). Typically, Git LFS will only define a basic transfer mode and support that. The basic mode is simple and efficient for direct-to-storage uploads for backends that support uploading using a single PUT request.
To support more complex uploads, and especially multipart uploads (uploads done using more than one HTTP request, each with a different part of a large file) directly to backends that support them, Giftless adds support for a non-standard multipart-basic transfer mode. Note that this can only work with specific backends that support this type of functionality.
Enabling Multipart Transfer Mode
You can enable multipart transfers by adding the following lines to your Giftless config file:
TRANSFER_ADAPTERS:
  # Add the following lines:
  multipart-basic:
    factory: giftless.transfer.multipart:factory
    options:
      storage_class: giftless.storage.azure:AzureBlobsStorage
      storage_options:
        connection_string: "somesecretconnectionstringhere"
        container_name: my-multipart-storage
You must specify a storage_class that supports multipart transfers (implements the MultipartStorage interface). Currently, these are:
- giftless.storage.azure:AzureBlobsStorage - Azure Blob Storage
The following additional options are available for the multipart-basic transfer adapter:
- action_lifetime - The maximal lifetime in seconds for signed multipart actions. Because multipart uploads tend to be of very large files and can easily take hours to complete, we recommend setting this to a few hours. The default is 6 hours.
- max_part_size - Maximal length in bytes of a single part upload. The default is 10MB.
See the specific storage adapter for additional backend-specific configuration options to be added under storage_options.
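To give a feel for what max_part_size controls, the Python sketch below splits a file of a given size into byte ranges no longer than the limit. It is only an illustration: the function name plan_parts is hypothetical, the default assumes 10MB means 10 * 1024 * 1024 bytes, and the way Giftless actually plans parts may differ.

```python
def plan_parts(file_size, max_part_size=10 * 1024 * 1024):
    """Split file_size bytes into (offset, length) parts of at most max_part_size."""
    if file_size < 0 or max_part_size <= 0:
        raise ValueError("file_size must be >= 0 and max_part_size must be > 0")
    return [
        (offset, min(max_part_size, file_size - offset))
        for offset in range(0, file_size, max_part_size)
    ]


# A 25 MiB file with the assumed default limit yields two full parts and one 5 MiB part
parts = plan_parts(25 * 1024 * 1024)
```

Each part would then be uploaded with its own HTTP request, so a smaller max_part_size means more requests per file.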
Authenticators
TBD
Pre-Authorized Action Authenticators
TBD
Using Arbitrary WSGI Middleware
TBD
Fixing Generated URLs when Running Behind a Proxy
You can use the ProxyFix Werkzeug middleware to fix issues caused when Giftless runs behind a reverse proxy, causing generated URLs to not match the URLs expected by clients:
MIDDLEWARE:
  - class: werkzeug.middleware.proxy_fix:ProxyFix
    kwargs:
      x_host: 1
      x_port: 1
      x_prefix: 1
In order for this to work, you must ensure your reverse proxy (e.g. nginx) sets the right X-Forwarded-* headers when passing requests.
For example, if you have deployed giftless in an endpoint that is available to clients at https://example.com/lfs, the following nginx configuration is expected, in addition to the Giftless configuration set in the MIDDLEWARE section:
location /lfs/ {
    proxy_pass http://giftless.internal.host:5000/;
    proxy_set_header X-Forwarded-Prefix /lfs;
}
This example assumes Giftless is available to the reverse proxy at giftless.internal.host port 5000. In addition, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Proto are automatically set by nginx by default.
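The effect of the X-Forwarded-Prefix header can be sketched with a few lines of plain WSGI code. This is a simplified illustration of the idea and not Werkzeug's implementation: the names ForwardedPrefixMiddleware and external_url are hypothetical, and real deployments should use ProxyFix as configured above.

```python
class ForwardedPrefixMiddleware:
    """Simplified stand-in for the prefix handling of a proxy-fix middleware."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the X-Forwarded-Prefix header as HTTP_X_FORWARDED_PREFIX
        prefix = environ.get("HTTP_X_FORWARDED_PREFIX", "").rstrip("/")
        if prefix:
            # SCRIPT_NAME is the part of the URL path the application is mounted under
            environ["SCRIPT_NAME"] = prefix
        return self.app(environ, start_response)


def external_url(environ):
    """Rebuild the externally visible URL from a WSGI environ."""
    return "{}://{}{}{}".format(
        environ["wsgi.url_scheme"],
        environ["HTTP_HOST"],
        environ.get("SCRIPT_NAME", ""),
        environ.get("PATH_INFO", ""),
    )
```

With the nginx configuration above, a client request to https://example.com/lfs/datopian/repo/objects/batch reaches the application with the /lfs/ prefix stripped from the path. Once SCRIPT_NAME is restored from the header, URLs generated by the application include /lfs again.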
Adding CORS Support
There are a number of CORS WSGI middleware implementations available on PyPI, and you can use any of them to add CORS headers control support to Giftless.
For example, you can enable CORS support using wsgi-cors-middleware:
(.venv) $ pip install wsgi_cors_middleware
And then add the following to your config file:
MIDDLEWARE:
  - class: wsgi_cors_middleware:CorsMiddleware
    kwargs:
      origin: https://www.example.com
      headers: ['Content-type', 'Accept', 'Authorization']
      methods: ['GET', 'POST', 'PUT']
Overview of the Git LFS workflow
Development
giftless is based on Flask, with the following additional libraries:
- Flask Classful for simplifying API endpoint implementation with Flask
- Marshmallow for input / output serialization and validation
- figcan for configuration handling
You must have Python 3.7 or newer set up to run or develop giftless.
Code Style
We use the following tools and standards to write giftless code:
- flake8 to check your Python code for PEP8 compliance
- import statements are checked by isort and should be organized accordingly
- Type checking is done using mypy
Maximum line length is set to 120 characters.
Setting up a Virtual Environment
You should develop giftless in a virtual environment. We use pip-tools to manage both development and runtime dependencies.
The following snippet is an example of how to set up your virtual environment for development:
$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip install -r dev-requirements.txt
(.venv) $ pip-sync dev-requirements.txt requirements.txt
Running tests
Once in a virtual environment, you can simply run make test to run all tests and code style checks:
$ make test
We use pytest for Python unit testing.
In addition, simple functions can specify some doctest style tests in the function docstring. These tests will be run automatically when unit tests are executed.
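As an illustration of the doctest style mentioned above, here is a small self-contained function with examples in its docstring. The function human_size is hypothetical and not part of the giftless code base.

```python
def human_size(num_bytes):
    """Format a byte count using binary units.

    >>> human_size(512)
    '512 B'
    >>> human_size(10 * 1024 * 1024)
    '10.0 MiB'
    """
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that fits, or at the largest unit we know
        if size < 1024 or unit == "GiB":
            break
        size /= 1024
    return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
```

Docstring examples like these can be checked on their own with python -m doctest, or collected by pytest with its --doctest-modules option.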
Building a Docker image
Simply run make docker to build a uWSGI wrapped Docker image for giftless.
The image will be named datopian/giftless:latest by default. You can change it, for example:
$ make docker DOCKER_REPO=mycompany DOCKER_IMAGE_TAG=1.2.3
This will build a Docker image tagged mycompany/giftless:1.2.3.
License
Copyright (C) 2020, Viderum, Inc.
Giftless is free / open source software and is distributed under the terms of the MIT license. See LICENSE for details.