
Backend.AI Manager


Package Structure

  • ai.backend

    • manager: Abstraction of agents and computation kernels

    • gateway: RESTful API gateway based on aiohttp

Installation

Backend.AI Manager requires Python 3.6 or higher. We highly recommend using pyenv for an isolated setup of custom Python versions that may differ from the default installations managed by your OS or Linux distribution.

pip install backend.ai-manager

To enable the optional monitoring integrations (Datadog and Sentry), add the monitor extras tag to the pip command:

pip install 'backend.ai-manager[monitor]'

For development

We recommend using Python virtual environments. You may share a single virtual environment with other Backend.AI projects.

git clone https://github.com/lablup/backend.ai-manager.git
cd backend.ai-manager
python -m venv /home/user/venv
source /home/user/venv/bin/activate
pip install -U pip setuptools   # ensure latest versions
pip install -U -r requirements-dev.txt

The above example shows a standalone installation of the manager, but for integration tests you would normally also install the other dependencies, such as agents and databases.

Running and Deployment

Prepare databases

  • An RDBMS (PostgreSQL)

  • An etcd (v3) server

  • A Redis server

    • The manager uses the following database IDs

      • 0 (default): to track realtime performance metrics and statistics of computing sessions

      • 1: to track realtime request rate-limits of each API access key
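To make the split between the two Redis database IDs concrete, here is a minimal Python sketch that builds a connection URL for each logical database. It uses the conventional redis://host:port/db URL form. The helper redis_url and the constant names are hypothetical and are not part of Backend.AI.

```python
# Hypothetical helper; the names below are illustrative, not Backend.AI APIs.
REDIS_DB_STATS = 0       # realtime metrics and statistics of computing sessions
REDIS_DB_RATE_LIMIT = 1  # realtime request rate-limits of each API access key


def redis_url(host: str, port: int, db: int) -> str:
    """Build a conventional redis:// URL that selects a logical database."""
    if db < 0:
        raise ValueError("Redis database IDs are non-negative integers")
    return f"redis://{host}:{port}/{db}"


if __name__ == "__main__":
    print(redis_url("localhost", 6379, REDIS_DB_STATS))
    print(redis_url("localhost", 6379, REDIS_DB_RATE_LIMIT))
```

Keeping the IDs in named constants makes it harder to point the rate-limit tracker at the statistics database by accident.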

See the README of the meta-repo for a docker-compose example that runs the above databases with a single command.

Configuration

You need to specify configuration parameters using either CLI arguments or environment variables. The default values are intended for development, so you should set most of them explicitly in production. For details about the arguments and their equivalent environment variable names, run the server module with --help.
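The general pattern of "CLI argument, falling back to an environment variable, falling back to a development default" can be sketched with the standard argparse module. This is only an illustration of the precedence, not the manager's actual implementation: the BACKEND_* variable names and the default values are hypothetical, so check --help for the real ones.

```python
import argparse
import os


def build_parser(env=None):
    """Create a parser whose defaults fall back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="configuration sketch")
    # The BACKEND_* names and the literal defaults are hypothetical;
    # run the real server module with --help to see the actual names.
    parser.add_argument("--etcd-addr",
                        default=env.get("BACKEND_ETCD_ADDR", "localhost:2379"))
    parser.add_argument("--namespace",
                        default=env.get("BACKEND_NAMESPACE", "my-cluster"))
    parser.add_argument("--service-port", type=int,
                        default=int(env.get("BACKEND_SERVICE_PORT", "8080")))
    return parser


if __name__ == "__main__":
    # Precedence: explicit CLI argument > environment variable > default.
    print(build_parser().parse_args())
```

In this sketch an explicit --service-port on the command line always wins over the environment, which is the behavior most deployments expect.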

Before the first run, initialize the database schema:

$ cp alembic.ini.sample alembic.ini
$ edit alembic.ini
$ python -m ai.backend.manager.cli schema oneshot head
Creating tables...
Stamping alembic version to ...

Optionally, you can populate pre-defined fixtures. You may add your own to the fixtures directory for deployment. The example_keypair fixture is required to run the test suite.

$ python -m ai.backend.manager.cli fixture populate example_keypair
populating fixture 'example_keypair'

Running the API gateway server

$ python -m ai.backend.gateway.server \
         --etcd-addr localhost:2379 \
         --namespace my-cluster \
         --redis-addr localhost:6379 \
         --db-addr localhost:5432 \
         --db-name my-cluster \
         --db-user dbuser \
         --db-password dbpass \
         --docker-registry docker.example.com:5000 \
         --service-ip 127.0.0.1 \
         --service-port 8080 \
         --events-port 5002

The gateway server can serve public traffic directly, over either plain HTTP or HTTPS (with the --ssl-cert and --ssl-key options), but we recommend using a dedicated reverse proxy such as nginx for advanced HTTPS handling (e.g., SNI). Note that the gateway itself can utilize all CPU cores in the system without being limited by the GIL (global interpreter lock).

Please check out --help to see more options and their defaults.
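If you launch the gateway from a deployment script, it can be convenient to keep the options in one mapping and generate the command line from it. The following is a minimal sketch under that assumption. The helper gateway_command is hypothetical; the option names are the ones shown in the example command above.

```python
import shlex
import sys


def gateway_command(options, python=sys.executable):
    """Turn {'etcd-addr': 'localhost:2379', ...} into an argv list."""
    argv = [python, "-m", "ai.backend.gateway.server"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    options = {
        "etcd-addr": "localhost:2379",
        "namespace": "my-cluster",
        "redis-addr": "localhost:6379",
        "service-ip": "127.0.0.1",
        "service-port": 8080,
    }
    # Print a copy-pastable command line; pass the list to subprocess.run()
    # instead if you want to actually start the server.
    print(" ".join(shlex.quote(part) for part in gateway_command(options)))
```

Building an argv list rather than a single string avoids shell quoting problems when a value such as a password contains spaces or special characters.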

Example configs

/etc/supervisor/conf.d/manager.conf:

[program:backend.ai-manager]
user = user
stopsignal = TERM
stopasgroup = true
command = /home/user/run-manager.sh

/home/user/run-manager.sh:

#!/bin/bash
source /home/user/venv/bin/activate
# AWS API keypair for S3 file uploads (optional)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
# Datadog monitoring (optional)
export DATADOG_API_KEY="..."
export DATADOG_APP_KEY="..."
# Sentry monitoring (optional)
export RAVEN_URI="..."
# the main command (add other options here; a comment line cannot be
# placed between the backslash-continued lines below)
exec python -m ai.backend.gateway.server \
     --etcd-addr localhost:2379 \
     --namespace my-cluster \
     --redis-addr localhost:6379 \
     --service-ip 127.0.0.1 \
     --service-port 8080

/etc/nginx/sites-enabled/gateway:

ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers EECDH+CHACHA20:EECDH+AES128:RSA+AES128:EECDH+AES256:RSA+AES256:EECDH+3DES:RSA+3DES:!MD5;

map $http_connection $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 443 ssl;
    server_name my-cluster.example.com;
    charset utf-8;
    client_max_body_size 32M;

    ssl_certificate /path/to/ssl.crt;
    ssl_certificate_key /path/to/ssl.key;
    add_header Strict-Transport-Security "max-age=31536000; includeSubdomains";

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_pass_request_headers on;
        proxy_set_header Host "my-cluster.example.com";
        proxy_redirect off;
        proxy_buffering off;
        proxy_read_timeout 600s;
    }

    location ~ ^/v\d+/stream/ {
        proxy_pass http://127.0.0.1:8080;
        proxy_pass_request_headers on;
        proxy_set_header Host "my-cluster.example.com";
        proxy_redirect off;
        proxy_buffering off;
        proxy_read_timeout 60s;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}
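The two request-dependent decisions in the nginx example, which location block handles a path and what value the map block assigns to $connection_upgrade, can be mirrored in a few lines of Python for quick experimentation. This is only a sketch of the matching logic as written in the config above. The function names and the sample paths are hypothetical.

```python
import re

# Same pattern as the regex `location` block; re.ASCII keeps \d to 0-9.
STREAM_PATH = re.compile(r"^/v\d+/stream/", re.ASCII)


def connection_upgrade(http_connection: str) -> str:
    """Mirror the `map` block: empty Connection header -> close, else upgrade."""
    return "close" if http_connection == "" else "upgrade"


def pick_location(path: str) -> str:
    """Return which `location` block would handle the request path.

    nginx prefers a matching regex location over the plain `/` prefix.
    """
    return "stream" if STREAM_PATH.search(path) else "default"


if __name__ == "__main__":
    for path in ("/v2/stream/example", "/v2/example", "/stream/v2/"):
        print(path, "->", pick_location(path))
    for header in ("", "Upgrade", "keep-alive"):
        print(repr(header), "->", connection_upgrade(header))
```

Only paths under a versioned /stream/ prefix get the HTTP/1.1 upgrade headers and the shorter 60-second read timeout; everything else goes through the plain proxy block.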

Networking

The manager and agents should run in the same local network, or in different networks that can reach each other via VPN.

You need to check the firewall settings to allow the following access patterns (all ports are TCP):

  • The manager’s service port: open to the reverse-proxy or the public Internet

  • The manager’s events port: open to the agents

  • The etcd service port: open to the manager and agents

  • The Redis service port: open to the manager and agents

  • The (optional) private docker registry’s service port: open to the manager and agents

  • The database’s service port: open to the manager

  • All ports of the agents: open to the manager

Note that the etcd and Redis servers may run on different physical servers or cloud instances, as long as the manager and agents can access them. The PostgreSQL database is accessed only by the manager.
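A quick way to verify the firewall rules above is to attempt a TCP connection to each service from the host that needs access. The following is a minimal sketch using only the standard library. The helper can_connect is hypothetical, the ports are the ones used in the examples in this document, and the hosts are placeholders you should replace with your own.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


if __name__ == "__main__":
    # Ports come from the examples in this document; hosts are placeholders.
    targets = {
        "etcd": ("localhost", 2379),
        "redis": ("localhost", 6379),
        "postgresql": ("localhost", 5432),
        "manager service": ("127.0.0.1", 8080),
        "manager events": ("127.0.0.1", 5002),
    }
    for name, (host, port) in targets.items():
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{name:16} {host}:{port} {status}")
```

Run it on the manager host to check the etcd, Redis, and database ports, and on an agent host to check the manager's events port. A successful connection only shows that the port is open, not that the service behind it is healthy.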
