Backend.AI Manager

Package Structure

  • ai.backend

    • manager: Abstraction of agents and computation kernels

    • gateway: RESTful API gateway based on aiohttp

Installation

Backend.AI Manager requires Python 3.6 or higher. We highly recommend using pyenv for an isolated setup of custom Python versions, which may differ from the default installations managed by your OS or Linux distribution.

pip install backend.ai-manager

To enable the optional monitoring support (Datadog and Sentry), add the monitor extras tag to the pip command:

pip install 'backend.ai-manager[monitor]'

For development

We recommend using Python virtual environments. You may share a virtual environment with other Backend.AI projects.

git clone https://github.com/lablup/backend.ai-manager.git
python -m venv /home/user/venv
source /home/user/venv/bin/activate
pip install -U pip setuptools   # ensure latest versions
pip install -U -r requirements-dev.txt

The above example shows a standalone installation of the manager, but normally you would also want to install the other dependencies, such as agents and databases, for integration tests.

Running and Deployment

Prepare databases

  • An RDBMS (PostgreSQL)

  • An etcd (v3) server

  • A Redis server

    • The manager uses the following database IDs

      • 0 (default): to track realtime performance metrics and statistics of computing sessions

      • 1: to track realtime request rate-limits of each API access key
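
The database IDs above are what a Redis client selects when it connects. As a minimal sketch of what they mean from a client's point of view, the snippet below builds redis:// URLs for the two databases. The constant names and the redis_url helper are hypothetical and only for illustration; this is not how the manager itself is configured.

```python
# Hypothetical names for the two Redis database IDs listed above.
REDIS_DB_STATS = 0       # realtime performance metrics and statistics of sessions
REDIS_DB_RATE_LIMIT = 1  # realtime request rate-limits of each API access key


def redis_url(addr: str, db: int) -> str:
    """Build a redis:// URL that selects the given database ID."""
    if db < 0:
        raise ValueError("database ID must be non-negative")
    return f"redis://{addr}/{db}"


if __name__ == "__main__":
    # The address matches the --redis-addr value used in the examples.
    print(redis_url("localhost:6379", REDIS_DB_STATS))
    print(redis_url("localhost:6379", REDIS_DB_RATE_LIMIT))
```

Such URLs can be handy when inspecting each database separately with a Redis client during debugging.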

Check out the README of the meta-repo for a docker-compose example that runs the above databases with a single command.

Configuration

You need to specify configuration parameters using either CLI arguments or environment variables. The default values are for development settings so you should set most of them explicitly in production. For details about arguments and their equivalent environment variable names, run the server module with --help.
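
To illustrate the general pattern of CLI arguments backed by environment variables, here is a minimal sketch using argparse. The --service-ip and --service-port flags come from the examples in this document, but the EXAMPLE_* environment variable names, the fallback values, and the precedence (CLI over environment over built-in default) are assumptions for illustration; run the server module with --help for the real names and defaults.

```python
import argparse
import os


def build_parser(environ=None):
    """Create a parser whose defaults are taken from environment variables."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="configuration sketch")
    # EXAMPLE_SERVICE_IP / EXAMPLE_SERVICE_PORT are hypothetical variable names.
    parser.add_argument(
        "--service-ip",
        default=env.get("EXAMPLE_SERVICE_IP", "127.0.0.1"),
    )
    parser.add_argument(
        "--service-port",
        type=int,
        default=int(env.get("EXAMPLE_SERVICE_PORT", "8080")),
    )
    return parser


def load_config(argv, environ=None):
    """Parse argv; an explicit CLI argument overrides the environment value."""
    return build_parser(environ).parse_args(argv)


if __name__ == "__main__":
    cfg = load_config([])
    print(cfg.service_ip, cfg.service_port)
```

With this arrangement a deployment script can export variables once, while an operator can still override a single value on the command line.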

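To initialize the database schema, copy the sample Alembic configuration, edit it to point at your database, and run the schema command:

$ cp alembic.ini.sample alembic.ini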
$ edit alembic.ini
$ python -m ai.backend.manager.cli schema oneshot head
Creating tables...
Stamping alembic version to ...

Optionally, you can populate pre-defined fixtures. You may add your own in the fixtures directory for deployment. The example_keypair fixture is required to run the test suite.

$ python -m ai.backend.manager.cli fixture populate example_keypair
populating fixture 'example_keypair'

Running the API gateway server

$ python -m ai.backend.gateway.server \
         --etcd-addr localhost:2379 \
         --namespace my-cluster \
         --redis-addr localhost:6379 \
         --db-addr localhost:5432 \
         --db-name my-cluster \
         --db-user dbuser \
         --db-password dbpass \
         --docker-registry docker.example.com:5000 \
         --service-ip 127.0.0.1 \
         --service-port 8080 \
         --events-port 5002

The gateway server can serve public traffic directly, via either plain HTTP or HTTPS (with the --ssl-cert and --ssl-key options), but we recommend using a dedicated reverse-proxy such as nginx for advanced HTTPS handling (e.g., SNI). Note that the gateway itself can fully utilize all the CPU cores in the system without being limited by the GIL (global interpreter lock).

Please check out --help to see more options and their defaults.

Example configs

/etc/supervisor/conf.d/manager.conf:

[program:backend.ai-manager]
user = user
stopsignal = TERM
stopasgroup = true
command = /home/user/run-manager.sh

/home/user/run-manager.sh:

#!/bin/bash
source /home/user/venv/bin/activate
# AWS API keypair for S3 file uploads (optional)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
# Datadog monitoring (optional)
export DATADOG_API_KEY="..."
export DATADOG_APP_KEY="..."
# Sentry monitoring (optional)
export RAVEN_URI="..."
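# the main command (add other options as needed; a comment line must not
# appear inside the backslash-continued command, or the options after it are dropped)
exec python -m ai.backend.gateway.server \
     --etcd-addr localhost:2379 \
     --namespace my-cluster \
     --redis-addr localhost:6379 \
     --service-ip 127.0.0.1 \
     --service-port 8080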

/etc/nginx/sites-enabled/gateway:

ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_ciphers EECDH+CHACHA20:EECDH+AES128:RSA+AES128:EECDH+AES256:RSA+AES256:EECDH+3DES:RSA+3DES:!MD5;

map $http_connection $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 443 ssl;
    server_name my-cluster.example.com;
    charset utf-8;
    client_max_body_size 32M;

    ssl_certificate /path/to/ssl.crt;
    ssl_certificate_key /path/to/ssl.key;
    add_header Strict-Transport-Security "max-age=31536000; includeSubdomains";

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_pass_request_headers on;
        proxy_set_header Host "my-cluster.example.com";
        proxy_redirect off;
        proxy_buffering off;
        proxy_read_timeout 600s;
    }

    location ~ ^/v\d+/stream/ {
        proxy_pass http://127.0.0.1:8080;
        proxy_pass_request_headers on;
        proxy_set_header Host "my-cluster.example.com";
        proxy_redirect off;
        proxy_buffering off;
        proxy_read_timeout 60s;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}

Networking

The manager and agents should run in the same local network, or in different networks that can reach each other via VPNs.

You need to check the firewall settings to allow the following access patterns (all ports are TCP):

  • The manager’s service port: open to the reverse-proxy or the public Internet

  • The manager’s events port: open to the agents

  • The etcd service port: open to the manager and agents

  • The Redis service port: open to the manager and agents

  • The (optional) private Docker registry's service port: open to the manager and agents

  • The database’s service port: open to the manager

  • All ports of the agents: open to the manager

Note that the etcd and Redis servers may run on different physical servers or cloud instances as long as the manager and agents can access them. The PostgreSQL database is accessed only by the manager.
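
One way to verify the firewall rules above is to attempt a TCP connection to each service from the host that needs access. The following is a minimal sketch using only the standard library. The endpoint table is hypothetical: it reuses the localhost addresses and ports from the examples in this document and should be replaced with your own hosts.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical endpoints; adjust to match your deployment.
ENDPOINTS = {
    "etcd": ("localhost", 2379),
    "redis": ("localhost", 6379),
    "postgresql": ("localhost", 5432),
    "manager events": ("localhost", 5002),
}

if __name__ == "__main__":
    for name, (host, port) in ENDPOINTS.items():
        status = "reachable" if is_port_open(host, port) else "unreachable"
        print(f"{name:15s} {host}:{port} {status}")
```

Run it on the manager host to check the database, etcd, and Redis ports, and on an agent host to check the manager's events port.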
