A Git LFS Server implementation in Python with support for pluggable backends

Giftless - a Pluggable Git LFS Server

Giftless is a Python implementation of a Git LFS Server. It is designed with flexibility in mind, to allow pluggable storage backends, transfer methods and authentication methods.

Giftless supports the basic Git LFS transfer mode with the following storage backends:

  • Local storage
  • Google Cloud Storage
  • Azure Blob Storage

Additional transfer modes and storage backends could easily be added and configured.

Installation & Quick Start

Running using Docker

Giftless is available as a Docker image. To pull and run it on a system that supports Docker, use:

$ docker run --rm -p 5000:5000 datopian/giftless

This will run the server in WSGI mode, which will require an HTTP server such as nginx to proxy HTTP requests to it.

Alternatively, you can specify the following command line arguments to have uWSGI run in HTTP mode, if no complex HTTP setup is required:

$ docker run --rm -p 8080:8080 datopian/giftless \
    -M -T --threads 2 -p 2 --manage-script-name --callable app \
    --http 0.0.0.0:8080

If you need to, you can also build the Docker image locally as described below.

Installing & Running from PyPI

You can install Giftless into your Python environment of choice (3.7+) using pip:

(.venv) $ pip install giftless

To run it, you will most likely need a WSGI server such as uWSGI or Gunicorn installed. Here is an example of how to run Giftless locally with uWSGI:

# Install uWSGI or any other WSGI server
(.venv) $ pip install uwsgi

# Run uWSGI (see uWSGI's manual for help on all arguments)
(.venv) $ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
    --module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080

Installing & Running from Source

You can install and run giftless from source:

$ git clone https://github.com/datopian/giftless.git

# Initialize a virtual environment
$ cd giftless
$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip install -r requirements.txt

You can then proceed to run Giftless with a WSGI server as described above.

Note that for non-production use you can skip the WSGI server and rely on Flask's built-in development server. This should never be done in a production environment:

(.venv) $ ./flask-develop.sh

The default generated endpoint is http://127.0.0.1:5000/. Note: If you access this endpoint, you should receive an error message (invalid route).

Running a local example

  1. Create a new project on GitHub or any other platform. Here, we create a project named example-proj-datahub-io.

  2. Add any data file to it. The goal is to track this potentially large file with git-lfs and use Giftless as the local server. In our example, we create a CSV file named research_data_factors.csv.

  3. Create a file named giftless.yaml in your project root directory with the following content in order to have a local server:

TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.local_storage:LocalStorage
AUTH_PROVIDERS:
  - giftless.auth.allow_anon:read_write

  4. Export its path:

$ export GIFTLESS_CONFIG_FILE=giftless.yaml

  5. Start the Giftless server (using Docker or Python).

  6. Initialize your git repo and connect it with the remote project:

git init
git remote add origin YOUR_REMOTE_REPO

  7. Track files with git-lfs:

git lfs track 'research_data_factors.csv'
git lfs track
git add .gitattributes  # you should have a .gitattributes file at this point
git add "research_data_factors.csv"
git commit -m "Tracking data files"

  • You can see a list of tracked files with git lfs ls-files

  8. Configure lfs.url to point to your local Giftless server instance:

git config -f .lfsconfig lfs.url http://127.0.0.1:5000/<user_or_org>/<repo>/
# in our case, we used http://127.0.0.1:5000/datopian/example-proj-datahub-io/
# make sure your lfs.url ends with /

  9. The previous command writes changes to the .lfsconfig file. Add it to git, commit, and push:

git add .lfsconfig
git commit -m "New git-lfs server endpoint"
# if you don't see any changes, run git rm --cached *.csv, re-add your files, and commit again
git lfs push origin master
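
For reference, a push like the one above starts with the client sending a Git LFS batch request to the configured lfs.url. The following is a minimal sketch of how such a request could be assembled in Python. The helper name build_batch_request is hypothetical and not part of Giftless, and the code only builds the request without sending it:

```python
import hashlib
import json


def build_batch_request(lfs_url: str, content: bytes, operation: str = "upload"):
    """Build the URL, headers and JSON body of a Git LFS batch request.

    Hypothetical helper for illustration only; it is not part of Giftless.
    """
    # Git LFS identifies objects by the SHA-256 digest of their content
    oid = hashlib.sha256(content).hexdigest()
    body = {
        "operation": operation,
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": len(content)}],
    }
    headers = {
        "Accept": "application/vnd.git-lfs+json",
        "Content-Type": "application/vnd.git-lfs+json",
    }
    # lfs.url is expected to end with "/", so the endpoint is appended directly
    return lfs_url + "objects/batch", headers, json.dumps(body)


url, headers, body = build_batch_request(
    "http://127.0.0.1:5000/datopian/example-proj-datahub-io/", b"a,b\n1,2\n"
)
print(url)
print(body)
```

This also shows why the trailing slash on lfs.url matters in this sketch: without it, the batch endpoint path would be joined incorrectly.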

Configuration

It is also possible to configure Giftless' YAML file to use external storage.

Azure Support

Modify your giftless.yaml file according to the following config:

    $ cat giftless.yaml

    TRANSFER_ADAPTERS:
      basic:
        factory: giftless.transfer.basic_external:factory
        options:
          storage_class: giftless.storage.azure:AzureBlobsStorage
          storage_options:
            connection_string: GetYourAzureConnectionStringAndPutItHere==
            container_name: lfs-storage
            path_prefix: large-files

Google Cloud Platform Support

To use Google Cloud Storage as a backend, you'll first need:

  • A Google Cloud Storage bucket to store objects in
  • An account key JSON file

The key must be associated with either a user or a service account, and should have read / write permissions on objects in the bucket.

If you plan to access objects from a browser, your bucket needs to have CORS enabled.

You can deploy the account key JSON file and provide the path to it as the account_key_file storage option:

TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.google_cloud:GoogleCloudStorage
      storage_options:
        project_name: my-gcp-project
        bucket_name: git-lfs
        account_key_file: /path/to/credentials.json

Alternatively, you can base64-encode the contents of the JSON file and provide it inline as account_key_base64:

TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.google_cloud:GoogleCloudStorage
      storage_options:
        project_name: my-gcp-project
        bucket_name: git-lfs
        account_key_base64: S0m3B4se64RandomStuff.....ThatI5Redac7edHeReF0rRead4b1lity==
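
One way to produce the value for account_key_base64 is sketched below. The helper names are hypothetical, and the payload in the example is a placeholder rather than a real key:

```python
import base64
from pathlib import Path


def encode_account_key(raw_json: bytes) -> str:
    """Return the base64 text form of a key file's raw bytes."""
    return base64.b64encode(raw_json).decode("ascii")


def encode_account_key_file(path: str) -> str:
    """Read a key file from disk and base64-encode its contents."""
    return encode_account_key(Path(path).read_bytes())


# Example with a placeholder payload instead of a real key file
print(encode_account_key(b'{"type": "service_account"}'))
```

The result is a single line of ASCII text, which makes it easy to paste into a YAML value.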

After configuring your giftless.yaml file, export it:

$ export GIFTLESS_CONFIG_FILE=giftless.yaml

You will need uWSGI running. Install it with your preferred package manager. Here is an example of how to run it:

    # Run uWSGI in HTTP mode on port 8080
    $ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
        --module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080

See giftless/config.py for some default configuration options.

Configuration using .env files

[WIP] It is possible to use a .env file instead of a YAML file in case you need to deploy the project on a platform that does not support deploying configuration in files, such as Heroku.

At this time, we only support a raw format in which the content of giftless.yaml is dumped into an environment variable named GIFTLESS_CONFIG_STR:

GIFTLESS_CONFIG_STR="TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.google_cloud:GoogleCloudStorage
      storage_options:
        bucket_name: datahub-bbb
        api_key: API_KEY
AUTH_PROVIDERS:
  - giftless.auth.allow_anon:read_write
"

Note #1: As YAML is a superset of JSON, you can also provide a more compact JSON string instead.
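
As an illustration of the compact JSON form, the sketch below serializes a configuration into a single-line string using only the Python standard library. The configuration values are taken from the local storage example earlier in this document; how the value must be quoted depends on the tool that reads your .env file:

```python
import json

config = {
    "TRANSFER_ADAPTERS": {
        "basic": {
            "factory": "giftless.transfer.basic_streaming:factory",
            "options": {
                "storage_class": "giftless.storage.local_storage:LocalStorage",
            },
        }
    },
    "AUTH_PROVIDERS": ["giftless.auth.allow_anon:read_write"],
}

# separators=(",", ":") drops optional whitespace, yielding a single-line value
config_str = json.dumps(config, separators=(",", ":"))
print(f"GIFTLESS_CONFIG_STR={config_str}")
```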

Note #2: If you provide both a YAML file (as GIFTLESS_CONFIG_FILE) and a literal YAML string (as GIFTLESS_CONFIG_STR), the two will be merged, with values from the YAML string taking precedence over values from the YAML file.
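
The precedence rule can be pictured with the following sketch. It assumes that nested mappings are merged recursively, which is an assumption made for illustration; the actual merge is performed by Giftless' configuration handling, and merge_config is a hypothetical helper:

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for the parsed contents of GIFTLESS_CONFIG_FILE and GIFTLESS_CONFIG_STR
from_file = {"storage_options": {"project_name": "my-gcp-project", "bucket_name": "git-lfs"}}
from_string = {"storage_options": {"bucket_name": "datahub-bbb"}}

merged = merge_config(from_file, from_string)
# bucket_name comes from the string, project_name is kept from the file
print(merged["storage_options"])
```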

Transfer Adapters

Git LFS servers and clients can implement and negotiate different transfer adapters (https://github.com/git-lfs/git-lfs/blob/master/docs/api/basic-transfers.md). Typically, Git LFS only defines and supports a basic transfer mode. The basic mode is simple and efficient for direct-to-storage uploads to backends that support uploading with a single PUT request.

To support more complex, and especially multi-part uploads (uploads done using more than one HTTP request, each with a different part of a large file) directly to backends that support that, Giftless adds support for a non-standard multipart-basic transfer mode. Note that this can only work with specific backends that support this type of functionality.

Enabling Multipart Transfer Mode

You can enable multipart transfers by adding the following lines to your Giftless config file:

TRANSFER_ADAPTERS:
  # Add the following lines:
  multipart-basic:
    factory: giftless.transfer.multipart:factory
    options:
      storage_class: giftless.storage.azure:AzureBlobsStorage
      storage_options:
        connection_string: "somesecretconnectionstringhere"
        container_name: my-multipart-storage

You must specify a storage_class that supports multipart transfers (implements the MultipartStorage interface). Currently, these are:

  • giftless.storage.azure:AzureBlobsStorage - Azure Blob Storage

The following additional options are available for multipart-basic transfer adapter:

  • action_lifetime - The maximum lifetime in seconds for signed multipart actions. Because multipart uploads tend to involve very large files and can easily take hours to complete, we recommend setting this to a few hours. The default is 6 hours.
  • max_part_size - Maximal length in bytes of a single part upload. The default is 10MB.

See the specific storage adapter for additional backend-specific configuration options to be added under storage_options.
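
To get a feel for how max_part_size affects an upload, the sketch below computes the byte ranges a file of a given size would be split into. This is only an illustration of the arithmetic, not Giftless' implementation; split_into_parts is a hypothetical helper, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption:

```python
def split_into_parts(total_size: int, max_part_size: int) -> list:
    """Return (offset, length) pairs that together cover total_size bytes."""
    if max_part_size <= 0:
        raise ValueError("max_part_size must be positive")
    return [
        (offset, min(max_part_size, total_size - offset))
        for offset in range(0, total_size, max_part_size)
    ]


# Assumption: "10MB" is read here as 10 * 1024 * 1024 bytes
MAX_PART_SIZE = 10 * 1024 * 1024
parts = split_into_parts(25 * 1024 * 1024, MAX_PART_SIZE)
print(len(parts))  # 3 parts: two full-size parts and a final 5 MiB part
```

Each part corresponds to one HTTP request, so a smaller max_part_size means more requests per file, all of which need to finish within action_lifetime.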

Authenticators

TBD

Pre-Authorized Action Authenticators

TBD

Using Arbitrary WSGI Middleware

TBD

Fixing Generated URLs when Running Behind a Proxy

You can use the ProxyFix Werkzeug middleware to fix issues caused when Giftless runs behind a reverse proxy, causing generated URLs to not match the URLs expected by clients:

MIDDLEWARE:
  - class: werkzeug.middleware.proxy_fix:ProxyFix
    kwargs:
      x_host: 1
      x_port: 1
      x_prefix: 1

In order for this to work, you must ensure your reverse proxy (e.g. nginx) sets the right X-Forwarded-* headers when passing requests.

For example, if you have deployed giftless in an endpoint that is available to clients at https://example.com/lfs, the following nginx configuration is expected, in addition to the Giftless configuration set in the MIDDLEWARE section:

    location /lfs/ {
        proxy_pass http://giftless.internal.host:5000/;
        proxy_set_header X-Forwarded-Prefix /lfs;
    }

This example assumes Giftless is available to the reverse proxy at giftless.internal.host port 5000. Note that nginx does not add X-Forwarded-Host, X-Forwarded-Port or X-Forwarded-Proto on its own; if your setup relies on them, set them explicitly with additional proxy_set_header directives.
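
To illustrate why these headers matter, the following sketch shows how a WSGI application's view of its own URL changes once forwarded headers are taken into account. It is a simplified stand-in written with the standard library only, not Werkzeug's ProxyFix code; apply_forwarded_headers is a hypothetical helper that ignores X-Forwarded-Port and chains of multiple proxies:

```python
from wsgiref.util import request_uri


def apply_forwarded_headers(environ: dict) -> dict:
    """Return a copy of a WSGI environ adjusted using X-Forwarded-* headers.

    Simplified illustration only; this is not Werkzeug's ProxyFix code.
    """
    fixed = dict(environ)
    if "HTTP_X_FORWARDED_PROTO" in environ:
        fixed["wsgi.url_scheme"] = environ["HTTP_X_FORWARDED_PROTO"]
    if "HTTP_X_FORWARDED_HOST" in environ:
        fixed["HTTP_HOST"] = environ["HTTP_X_FORWARDED_HOST"]
    if "HTTP_X_FORWARDED_PREFIX" in environ:
        # The prefix becomes the application's mount point
        fixed["SCRIPT_NAME"] = environ["HTTP_X_FORWARDED_PREFIX"].rstrip("/")
    return fixed


# What the application sees for a request proxied from https://example.com/lfs/...
environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "giftless.internal.host:5000",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/org/repo/objects/batch",
    "QUERY_STRING": "",
    "HTTP_X_FORWARDED_PROTO": "https",
    "HTTP_X_FORWARDED_HOST": "example.com",
    "HTTP_X_FORWARDED_PREFIX": "/lfs",
}
print(request_uri(environ))
print(request_uri(apply_forwarded_headers(environ)))
```

The first URL points at the internal host, which clients cannot reach; the second matches the public address clients actually use.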

Adding CORS Support

There are a number of CORS WSGI middleware implementations available on PyPI, and you can use any of them to add control over CORS headers to Giftless.

For example, you can enable CORS support using wsgi-cors-middleware:

(.venv) $ pip install wsgi_cors_middleware

And then add the following to your config file:

MIDDLEWARE:
  - class: wsgi_cors_middleware:CorsMiddleware
    kwargs:
      origin: https://www.example.com
      headers: ['Content-type', 'Accept', 'Authorization']
      methods: ['GET', 'POST', 'PUT']

Overview of the Git LFS workflow

[Figure: diagram of the Git LFS workflow]

Development

giftless is based on Flask, with the following additional libraries:

  • Flask Classful for simplifying API endpoint implementation with Flask
  • Marshmallow for input / output serialization and validation
  • figcan for configuration handling

You must have Python 3.7 or newer set up to run or develop giftless.

Code Style

We use the following tools and standards to write giftless code:

  • flake8 to check your Python code for PEP8 compliance
  • import statements are checked by isort and should be organized accordingly
  • Type checking is done using mypy

Maximum line length is set to 120 characters.

Setting up a Virtual Environment

You should develop giftless in a virtual environment. We use pip-tools to manage both development and runtime dependencies.

The following snippet is an example of how to set up your virtual environment for development:

$ python3 -m venv .venv
$ . .venv/bin/activate

(.venv) $ pip install -r dev-requirements.txt
(.venv) $ pip-sync dev-requirements.txt requirements.txt

Running tests

Once in a virtual environment, you can simply run make test to run all tests and code style checks:

$ make test

We use pytest for Python unit testing.

In addition, simple functions can specify doctest-style tests in the function docstring. These tests are run automatically when unit tests are executed.
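
As an illustration of the doctest style, here is a small function with examples embedded in its docstring. The function ensure_trailing_slash is hypothetical and not part of giftless. With pytest, docstring examples are commonly collected using the --doctest-modules option; whether the project's test setup uses exactly that mechanism is an assumption:

```python
import doctest


def ensure_trailing_slash(url: str) -> str:
    """Return url, appending a trailing slash if it is missing.

    >>> ensure_trailing_slash("http://127.0.0.1:5000/datopian/example-proj-datahub-io")
    'http://127.0.0.1:5000/datopian/example-proj-datahub-io/'
    >>> ensure_trailing_slash("http://127.0.0.1:5000/datopian/example-proj-datahub-io/")
    'http://127.0.0.1:5000/datopian/example-proj-datahub-io/'
    """
    return url if url.endswith("/") else url + "/"


if __name__ == "__main__":
    # Run the docstring examples directly, without a test runner
    doctest.testmod()
```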

Building a Docker image

Simply run make docker to build a uWSGI wrapped Docker image for giftless. The image will be named datopian/giftless:latest by default. You can change it, for example:

$ make docker DOCKER_REPO=mycompany DOCKER_IMAGE_TAG=1.2.3

This will build a Docker image tagged mycompany/giftless:1.2.3.

License

Copyright (C) 2020, Viderum, Inc.

Giftless is free / open source software and is distributed under the terms of the MIT license. See LICENSE for details.
