Backend.AI Agent

The Backend.AI Agent is a small daemon that:

- Reports the status and available resource slots of a worker to the manager
- Routes code execution requests to the designated kernel container
- Manages the lifecycle of kernel containers (creating, monitoring, and destroying them)

Package Structure

ai.backend
- agent: The agent package
  - docker: A Docker-based backend implementation of the kernel lifecycle interface
  - server: The agent daemon, which communicates with the manager and the Docker daemon
  - watcher: A side-by-side daemon that provides a separate HTTP endpoint for querying the agent daemon's status and controlling the agent's systemd service
- helpers: A utility package that is available as ai.backend.helpers inside Python-based containers
- kernel: Language-specific runtimes (mostly ipykernel client adaptor) which run inside containers
- runner: Auxiliary components (usually self-contained binaries) mounted inside containers

Installation

Please visit the installation guides.

Kernel/system configuration

Recommended kernel parameters in the bootloader (e.g., GRUB):

cgroup_enable=memory swapaccount=1
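To confirm that the running kernel was actually booted with these parameters, you can inspect /proc/cmdline after a reboot. The following is a minimal sketch, not part of the agent; the function names are hypothetical. It reports which of the two recommended parameters are missing:

```python
from typing import List, Sequence

# Boot parameters recommended above for agent hosts.
REQUIRED_BOOT_PARAMS = ("cgroup_enable=memory", "swapaccount=1")


def missing_boot_params(cmdline: str,
                        required: Sequence[str] = REQUIRED_BOOT_PARAMS) -> List[str]:
    """Return the required parameters absent from a kernel command line.

    `cmdline` is the content of /proc/cmdline (whitespace-separated tokens).
    """
    tokens = set(cmdline.split())
    return [param for param in required if param not in tokens]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the agent host, `missing_boot_params(read_cmdline())` returns an empty list when both parameters are in effect.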

Recommended resource limits:

/etc/security/limits.conf

root hard nofile 512000
root soft nofile 512000
root hard nproc 65536
root soft nproc 65536
user hard nofile 512000
user soft nofile 512000
user hard nproc 65536
user soft nproc 65536
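A quick way to see whether these limits are in effect is to query them from Python's standard `resource` module. This is a sketch with hypothetical names. Keep in mind that limits.conf is applied by pam_limits to login sessions; a service started by systemd takes its limits from the unit file (e.g. `LimitNOFILE=`), so run the check in the same context as the agent.

```python
import resource
from typing import Dict, Tuple

# Recommended minimums from /etc/security/limits.conf above.
RECOMMENDED_LIMITS = {
    "nofile": (resource.RLIMIT_NOFILE, 512000),
    "nproc": (resource.RLIMIT_NPROC, 65536),
}


def meets(value: int, minimum: int) -> bool:
    """True if a limit value is unlimited or at least `minimum`."""
    return value == resource.RLIM_INFINITY or value >= minimum


def check_limits() -> Dict[str, Tuple[int, int, bool]]:
    """Compare this process's soft/hard limits against the recommendations.

    Returns {name: (soft, hard, ok)}; ok requires both limits to pass.
    """
    report = {}
    for name, (res, minimum) in RECOMMENDED_LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (soft, hard, meets(soft, minimum) and meets(hard, minimum))
    return report
```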

sysctl

fs.file-max=2048000
fs.inotify.max_user_watches=524288
net.core.somaxconn=1024
net.ipv4.tcp_max_syn_backlog=1024
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_early_retrans=1
net.ipv4.ip_local_port_range=40000 65000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 12582912 16777216
net.ipv4.tcp_wmem=4096 12582912 16777216
net.netfilter.nf_conntrack_max=10485760
net.netfilter.nf_conntrack_tcp_timeout_established=432000
net.netfilter.nf_conntrack_tcp_timeout_close_wait=10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait=10
net.netfilter.nf_conntrack_tcp_timeout_time_wait=10
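Each dotted sysctl key corresponds to a file under /proc/sys (for example, net.core.somaxconn is /proc/sys/net/core/somaxconn), so the live values can be compared against the list above without shelling out. The sketch below uses hypothetical names and only a subset of the keys. It compares exactly, so a value larger than recommended (e.g. fs.file-max) is also reported and may be acceptable.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# A subset of the recommended values above; extend with the rest as needed.
RECOMMENDED_SYSCTL = {
    "fs.file-max": "2048000",
    "net.core.somaxconn": "1024",
    "net.ipv4.ip_local_port_range": "40000 65000",
    "net.ipv4.tcp_rmem": "4096 12582912 16777216",
}


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under `root`."""
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> Optional[str]:
    """Read a value with whitespace normalized (the kernel separates fields with tabs)."""
    try:
        return " ".join(sysctl_path(key, root).read_text().split())
    except FileNotFoundError:
        return None  # e.g. net.netfilter.* while nf_conntrack is not loaded


def diff_sysctl(expected: Dict[str, str],
                root: str = "/proc/sys") -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual)} for every value that differs or is missing."""
    mismatches = {}
    for key, want in expected.items():
        want = " ".join(want.split())
        have = read_sysctl(key, root)
        if have != want:
            mismatches[key] = (want, have)
    return mismatches
```

On the agent host, `diff_sysctl(RECOMMENDED_SYSCTL)` returns an empty dict when the values match.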

The ip_local_port_range should not overlap the container port pool (default: 30000 to 31000).
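The overlap rule can be checked mechanically. Below is a small sketch with hypothetical names, assuming both ranges are inclusive and the pool is left at its default of 30000 to 31000:

```python
from typing import Tuple

PortRange = Tuple[int, int]

# Default container port pool mentioned above (inclusive bounds).
CONTAINER_PORT_POOL: PortRange = (30000, 31000)


def parse_port_range(value: str) -> PortRange:
    """Parse a sysctl-style range such as '40000 65000' (space- or tab-separated)."""
    low, high = (int(part) for part in value.split())
    return low, high


def ranges_overlap(a: PortRange, b: PortRange) -> bool:
    """Two inclusive ranges overlap iff each starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]
```

With the recommended setting, `ranges_overlap(parse_port_range("40000 65000"), CONTAINER_PORT_POOL)` is `False`.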

To apply the netfilter settings at boot time, you may need to add nf_conntrack to /etc/modules so that the module is loaded before sysctl sets the net.netfilter.nf_conntrack_* values.
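If you manage hosts with scripts, a check like the following can tell whether the module is already listed. It is a sketch assuming the Debian-style /etc/modules format (one module per line, optionally followed by parameters, with `#` comments); the function name is hypothetical.

```python
def module_listed(modules_text: str, name: str = "nf_conntrack") -> bool:
    """Return True if `name` appears as a module entry in /etc/modules content."""
    for line in modules_text.splitlines():
        # Drop comments, then take the first word (the module name).
        entry = line.split("#", 1)[0].split()
        if entry and entry[0] == name:
            return True
    return False
```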

For development

Prerequisites

Python 3.6 or higher, with pyenv and pyenv-virtualenv (optional but recommended)
Docker 18.03 or later with docker-compose (18.09 or later is recommended)

First, you need a working manager installation. For detailed instructions on installing the manager, please refer to the manager's README and then come back here.

Preparing working copy

Install and activate git-lfs to work with pre-built binaries in src/ai/backend/runner.

$ git lfs install

Next, clone the agent source and install it as follows. pyenv is only a recommendation; you may use other virtualenv management tools.

$ git clone https://github.com/lablup/backend.ai-agent agent
$ cd agent
$ pyenv virtualenv venv-agent
$ pyenv local venv-agent
$ pip install -U pip setuptools
$ pip install -U -r requirements/dev.txt

Linting

We use flake8 and mypy to statically check code style and type consistency. Enable these linters in your favorite IDE or editor.

Halfstack (single-node development & testing)

The halfstack lets you run the agent easily on a single node. Note that a manager must already be running with the halfstack!

Recommended directory structure

backend.ai-dev
- manager (git clone from the manager repo)
- agent (git clone from here)
- common (git clone from the common repo)

Install backend.ai-common as an editable package in the agent (and the manager) virtualenvs to keep the codebase up-to-date.

$ cd agent
$ pip install -U -e ../common

Steps

$ mkdir -p "./scratches"
$ cp config/halfstack.toml ./agent.toml

If you are running the agent on Linux, make sure the appropriate iptables rules are set before starting it. You can do this by running scripts/update-metadata-iptables.sh before each agent start.

Then, run it (for debugging, append a --debug flag):

$ python -m ai.backend.agent.server

To run the agent-watcher:

$ python -m ai.backend.agent.watcher

The watcher shares the same configuration TOML file with the agent. Note that the watcher is only meaningful if the agent is installed as a systemd service named backendai-agent.service.

To run tests:

$ python -m flake8 src tests
$ python -m pytest -m 'not integration' tests

Deployment

Configuration

Put a TOML-formatted agent configuration (see the sample in config/sample.toml) in one of the following locations:

agent.toml (current working directory)
~/.config/backend.ai/agent.toml (user-config directory)
/etc/backend.ai/agent.toml (system-config directory)

The daemon uses only the first one found.
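The lookup order can be illustrated as a first-match search. This sketch only mirrors the documented order; it is not the agent's own loader, and the function names are hypothetical. Note that `~` refers to the home directory of the user running the daemon.

```python
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def candidate_paths(cwd: Optional[PathLike] = None,
                    home: Optional[PathLike] = None) -> List[Path]:
    """The documented search locations, in priority order."""
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)
    return [
        cwd / "agent.toml",                              # current working directory
        home / ".config" / "backend.ai" / "agent.toml",  # user-config directory
        Path("/etc/backend.ai/agent.toml"),              # system-config directory
    ]


def find_config(paths: List[Path]) -> Optional[Path]:
    """Return the first existing file; later candidates are ignored."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

A stale agent.toml in the working directory therefore shadows the user and system copies, which is worth checking when configuration changes seem to have no effect.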

The agent reads most other configurations from the etcd v3 server where the cluster administrator or the Backend.AI manager stores all the necessary settings.

The etcd address and namespace must match those of the manager for the agent to be paired and activated. By specifying distinct namespaces, you can share a single etcd cluster among multiple separate Backend.AI clusters.

By default, the agent uses the /var/cache/scratches directory to create temporary home directories for kernel containers (the /home/work volume mounted in containers). Note that the directory must exist beforehand and must be owned by the user running the agent. You can change the location with the scratch-root option in agent.toml.
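Both requirements (the directory exists, and the agent-running user owns it) can be verified before starting the daemon. This is a sketch with a hypothetical function name; run it as the same user as the agent (root in the systemd example below).

```python
import os
from pathlib import Path
from typing import Optional


def check_scratch_root(path: str = "/var/cache/scratches") -> Optional[str]:
    """Return a problem description, or None if `path` looks usable as scratch-root."""
    root = Path(path)
    if not root.is_dir():
        return f"{root} does not exist or is not a directory"
    owner = root.stat().st_uid
    if owner != os.geteuid():
        return f"{root} is owned by uid {owner}, not by the current user (uid {os.geteuid()})"
    return None
```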

Running from a command line

The minimal commands to run the agent and the watcher:

$ python -m ai.backend.agent.server
$ python -m ai.backend.agent.watcher

For more arguments and options, run the command with the --help option.

Example config for systemd

/etc/systemd/system/backendai-agent.service:

[Unit]
Description=Backend.AI Agent
Requires=docker.service
After=network.target remote-fs.target docker.service

[Service]
Type=simple
User=root
Group=root
Environment=HOME=/home/user
ExecStart=/home/user/backend.ai/agent/run-agent.sh
WorkingDirectory=/home/user/backend.ai/agent
KillMode=process
KillSignal=SIGTERM
PrivateTmp=false
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

/home/user/backend.ai/agent/run-agent.sh:

#! /bin/sh
if [ -z "$PYENV_ROOT" ]; then
  export PYENV_ROOT="$HOME/.pyenv"
  export PATH="$PYENV_ROOT/bin:$PATH"
fi
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

cd /home/user/backend.ai/agent
if [ "$#" -eq 0 ]; then
  sh /home/user/backend.ai/agent/scripts/update-metadata-iptables.sh
  exec python -m ai.backend.agent.server
else
  exec "$@"
fi

Networking

The manager and agents should run on the same local network, or on different networks reachable via VPNs, while the manager's API service must be exposed to the public network or to another private network that users can access.

The manager must be able to reach TCP ports 6001, 6009, and 30000 to 31000 on the agents in the default configuration. You can change these port numbers and ranges in the configuration.

Manager-to-Agent TCP Ports	Usage
6001	ZeroMQ-based RPC calls from managers to agents
6009	HTTP watcher API
30000-31000	Port pool for in-container services
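To verify the firewall path from the manager host, you can attempt TCP connections to an agent. This sketch uses hypothetical names and only the standard `socket` module. Its classification is rough: "refused" means the host answered but nothing is listening, which is expected on pool ports until a container publishes a service. A firewall configured to reject with a TCP reset also looks like "refused".

```python
import socket
from typing import List, Tuple


def agent_ports(rpc: int = 6001, watcher: int = 6009,
                pool: Tuple[int, int] = (30000, 31000)) -> List[int]:
    """Manager-to-agent ports from the table above (pool bounds inclusive)."""
    return [rpc, watcher] + list(range(pool[0], pool[1] + 1))


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connection and roughly classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError:
        # Timeout, no route, or packets silently dropped by a firewall.
        return "unreachable"
```

For example, `{p: probe("agent-host", p) for p in (6001, 6009)}` checks the RPC and watcher ports, where `agent-host` is a placeholder for the agent's address.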

The agent itself requires neither incoming nor outgoing access to the public Internet. However, if users' programs need the Internet, the Docker containers must be able to reach it (possibly through corporate firewalls).

Agent-to-X TCP Ports	Usage
manager:5002	ZeroMQ-based event push from agents to the manager
etcd:2379	etcd API access
redis:6379	Redis API access
docker-registry:{80,443}	Docker registry access (pulling kernel images)
(Other hosts)	Depending on user program requirements

LICENSES

GNU Lesser General Public License Dependencies

Release history

24.9.1rc2 pre-release

Oct 28, 2024

24.9.1rc1 pre-release

Oct 25, 2024

24.9.0 (this version)

Oct 21, 2024

24.9.0rc1 pre-release

Oct 21, 2024

24.3.11

Oct 21, 2024

24.3.10

Sep 27, 2024

24.3.10rc1 pre-release

Sep 27, 2024

24.3.10b3 pre-release

Sep 5, 2024

24.3.10b2 pre-release

Sep 4, 2024

24.3.10b1 pre-release

Sep 4, 2024

24.3.9

Aug 23, 2024

24.3.9rc1 pre-release

Aug 23, 2024

24.3.9b1 pre-release

Aug 21, 2024

24.3.8

Aug 13, 2024

24.3.8rc2 pre-release

Aug 7, 2024

24.3.8rc1 pre-release

Aug 5, 2024

24.3.7

Aug 4, 2024

24.3.7rc2 pre-release

Aug 1, 2024

24.3.7rc1 pre-release

Jul 31, 2024

24.3.7b4 pre-release

Jul 31, 2024

24.3.7b3 pre-release

Jul 17, 2024

24.3.7b2 pre-release

Jul 16, 2024

24.3.7b1 pre-release

Jul 15, 2024

24.3.7a2 pre-release

Jul 8, 2024

24.3.7a1 pre-release

Jul 5, 2024

24.3.6

Jun 21, 2024

24.3.5

Jun 19, 2024

24.3.5rc1 pre-release

Jun 19, 2024

24.3.5b1 pre-release

Jun 16, 2024

24.3.4

Jun 4, 2024

24.3.4rc1 pre-release

Jun 3, 2024

24.3.4b2 pre-release

Jun 2, 2024

24.3.4b1 pre-release

May 31, 2024

24.3.3

Apr 30, 2024

24.3.3rc3 pre-release

Apr 30, 2024

24.3.3rc2 pre-release

Apr 30, 2024

24.3.3rc1 pre-release

Apr 29, 2024

24.3.2

Apr 17, 2024

24.3.2rc2 pre-release

Apr 17, 2024

24.3.2rc1 pre-release

Apr 17, 2024

24.3.1rc1 pre-release

Apr 16, 2024

24.3.0

Apr 5, 2024

24.3.0rc4 pre-release

Apr 5, 2024

24.3.0rc3 pre-release

Apr 5, 2024

24.3.0rc2 pre-release

Mar 31, 2024

24.3.0rc1 pre-release

Mar 31, 2024

24.3.0b1 pre-release

Mar 14, 2024

24.3.0a2 pre-release

Feb 14, 2024

23.9.10

Mar 27, 2024

23.9.10rc6 pre-release

Mar 14, 2024

23.9.10rc5 pre-release

Mar 13, 2024

23.9.10rc4 pre-release

Mar 6, 2024

23.9.10rc3 pre-release

Mar 5, 2024

23.9.10rc2 pre-release yanked

Mar 5, 2024

Reason this release was yanked:

Fatal error in Backend.AI Manager

23.9.10rc1 pre-release

Mar 4, 2024

23.9.9rc1 pre-release

Feb 4, 2024

23.9.8

Jan 20, 2024

23.9.8rc4 pre-release

Jan 18, 2024

23.9.8rc3 pre-release

Jan 12, 2024

23.9.8rc2 pre-release

Dec 15, 2023

23.9.8rc1 pre-release

Dec 14, 2023

23.9.6

Dec 20, 2023

23.9.5

Nov 2, 2023

23.9.4 yanked

Nov 1, 2023

Reason this release was yanked:

Shipped with invalid static webui bundle

23.9.3

Oct 26, 2023

23.9.2

Oct 24, 2023

23.9.1

Oct 10, 2023

23.9.0

Sep 28, 2023

23.9.0b3 pre-release

Sep 22, 2023

23.9.0b2 pre-release

Sep 20, 2023

23.9.0b1 pre-release yanked

Sep 19, 2023

23.9.0a4 pre-release

Sep 8, 2023

23.9.0a3 pre-release

Sep 6, 2023

23.9.0a2 pre-release

Sep 6, 2023

23.9.0a1 pre-release

Sep 5, 2023

23.3.12

Sep 5, 2023

23.3.11

Aug 19, 2023

23.3.10

Jul 31, 2023

23.3.9

Jul 17, 2023

23.3.8

Jul 6, 2023

23.3.7

Jul 3, 2023

23.3.6

Jun 14, 2023

23.3.5

Jun 13, 2023

23.3.4

May 29, 2023

23.3.3

May 25, 2023

23.3.2

May 5, 2023

23.3.0

Mar 29, 2023

23.3.0a4 pre-release

Mar 16, 2023

23.3.0a3 pre-release

Mar 15, 2023

23.3.0a2 pre-release

Mar 15, 2023

23.3.0a1 pre-release

Mar 2, 2023

23.3.0.dev0 pre-release

Feb 6, 2023

22.9.23

Aug 8, 2023

22.9.22

May 29, 2023

22.9.21

Apr 3, 2023

22.9.20

Mar 26, 2023

22.9.19

Mar 22, 2023

22.9.18

Mar 19, 2023

22.9.17

Mar 9, 2023

22.9.16

Mar 5, 2023

22.9.15

Mar 3, 2023

22.9.12

Feb 20, 2023

22.9.11

Feb 12, 2023

22.9.10

Feb 4, 2023

22.9.9

Jan 25, 2023

22.9.8

Jan 10, 2023

22.9.7

Jan 9, 2023

22.9.6

Dec 9, 2022

22.9.5

Nov 28, 2022

22.9.4

Oct 26, 2022

22.9.3

Oct 25, 2022

22.9.2

Oct 17, 2022

22.9.1

Oct 7, 2022

22.9.0

Sep 28, 2022

22.9.0b6 pre-release

Sep 2, 2022

22.9.0b5 pre-release

Aug 30, 2022

22.9.0b4 pre-release

Aug 22, 2022

22.9.0b3 pre-release

Aug 18, 2022

22.9.0b2 pre-release

Aug 18, 2022

22.6.0b4 pre-release

Jul 28, 2022

22.6.0b3 pre-release

Jul 27, 2022

22.6.0b2 pre-release

Jul 18, 2022

22.6.0b1 pre-release

Jun 26, 2022

22.6.0.dev4 pre-release

Jun 9, 2022

22.6.0.dev3 pre-release

Jun 9, 2022

22.6.0.dev2 pre-release

Jun 3, 2022

22.6.0.dev1 pre-release

Jun 3, 2022

22.6.0.dev0 pre-release

May 28, 2022

22.3.14

Aug 30, 2022

22.3.13

Aug 18, 2022

22.3.12

Aug 18, 2022

22.3.11

Aug 18, 2022

22.3.10

Jul 18, 2022

22.3.9

Jul 10, 2022

22.3.8

Jun 26, 2022

22.3.7

Jun 16, 2022

22.3.6

Jun 10, 2022

22.3.5

Jun 8, 2022

22.3.4

Jun 8, 2022

22.3.4rc1 pre-release

Jun 8, 2022

22.3.3

May 24, 2022

22.3.2

May 17, 2022

22.3.1

May 3, 2022

22.3.0

Apr 25, 2022

22.3.0b2 pre-release

Apr 18, 2022

22.3.0b1 pre-release

Apr 12, 2022

22.3.0a2 pre-release

Mar 29, 2022

22.3.0a1 pre-release

Mar 14, 2022

21.9.9

Mar 29, 2022

21.9.8

Mar 7, 2022

21.9.7

Jan 26, 2022

21.9.6

Jan 14, 2022

21.9.5

Jan 13, 2022

21.9.4

Jan 10, 2022

21.9.3

Jan 10, 2022

21.9.2

Dec 15, 2021

21.9.1

Nov 11, 2021

21.9.0

Nov 8, 2021

21.9.0a2 pre-release

Sep 28, 2021

21.9.0a1 pre-release

Aug 25, 2021

21.9.0.dev2 pre-release

Aug 25, 2021

21.9.0.dev1 pre-release

Aug 25, 2021

21.9.0.dev0 pre-release

Aug 25, 2021

21.3.22

Mar 29, 2022

21.3.21

Mar 7, 2022

21.3.20

Jan 26, 2022

21.3.19

Jan 14, 2022

21.3.18

Jan 13, 2022

21.3.17

Jan 10, 2022

21.3.16

Jan 10, 2022

21.3.15

Dec 15, 2021

21.3.14

Nov 8, 2021

21.3.13

Sep 28, 2021

21.3.12

Sep 15, 2021

21.3.11

Sep 3, 2021

21.3.10

Sep 2, 2021

21.3.9

Aug 23, 2021

21.3.8

Jul 13, 2021

21.3.7

Jun 28, 2021

21.3.6

Jun 18, 2021

21.3.5

Jun 13, 2021

21.3.4

Jun 6, 2021

21.3.3

Jun 6, 2021

21.3.2

May 17, 2021

21.3.1

May 13, 2021

21.3.0

Mar 29, 2021

20.9.13

Sep 2, 2021

20.9.12

Aug 23, 2021

20.9.11

Jul 13, 2021

20.9.10

Jun 18, 2021

20.9.9

May 17, 2021

20.9.8

May 13, 2021

20.9.7

Mar 29, 2021

20.9.6

Feb 22, 2021

20.9.5

Feb 16, 2021

20.9.4

Feb 1, 2021

20.9.3

Jan 19, 2021

20.9.2

Jan 4, 2021

20.9.1

Dec 28, 2020

20.9.0

Dec 26, 2020

20.9.0rc2 pre-release

Dec 24, 2020

20.9.0rc1 pre-release

Dec 23, 2020

20.9.0b3 pre-release

Dec 21, 2020

20.9.0b2 pre-release

Dec 20, 2020

20.9.0b1 pre-release

Dec 18, 2020

20.9.0a6 pre-release

Dec 17, 2020

20.9.0a5 pre-release

Dec 2, 2020

20.9.0a4 pre-release

Nov 16, 2020

20.9.0a3 pre-release

Nov 2, 2020

20.9.0a2 pre-release

Oct 30, 2020

20.9.0a1 pre-release

Oct 5, 2020

20.3.14

Mar 4, 2021

20.3.13

Feb 1, 2021

20.3.12

Jan 19, 2021

20.3.11

Jan 15, 2021

20.3.10

Jan 6, 2021

20.3.9

Dec 17, 2020

20.3.8

Dec 2, 2020

20.3.7

Nov 23, 2020

20.3.6

Nov 23, 2020

20.3.5

Oct 23, 2020

20.3.4

Oct 6, 2020

20.3.3

Sep 9, 2020

20.3.2

Aug 10, 2020

20.3.1

Jul 28, 2020

20.3.0

Jul 27, 2020

20.3.0rc1 pre-release

Jul 22, 2020

20.3.0b2 pre-release

Jul 2, 2020

20.3.0b1 pre-release

May 12, 2020

19.9.27

Aug 5, 2020

19.9.26

May 28, 2020

19.9.25

May 20, 2020

19.9.24

May 15, 2020

19.9.23

Apr 30, 2020

19.9.22

Apr 30, 2020

19.9.21

Apr 29, 2020

19.9.20

Apr 27, 2020

19.9.19

Mar 23, 2020

19.9.18

Mar 21, 2020

19.9.17

Mar 8, 2020

19.9.16

Feb 27, 2020

19.9.15

Feb 20, 2020

19.9.14

Feb 16, 2020

19.9.13

Feb 10, 2020

19.9.12

Feb 10, 2020

19.9.11

Jan 20, 2020

19.9.10

Jan 9, 2020

19.9.9

Dec 18, 2019

19.9.8

Dec 15, 2019

19.9.7

Nov 11, 2019

19.9.6

Nov 3, 2019

19.9.5

Oct 16, 2019

19.9.4

Oct 15, 2019

19.9.3

Oct 14, 2019

19.9.2

Oct 10, 2019

19.9.1

Oct 10, 2019

19.9.0

Oct 7, 2019

19.9.0rc3 pre-release

Oct 4, 2019

19.9.0rc2 pre-release

Sep 24, 2019

19.9.0rc1 pre-release

Sep 22, 2019

19.9.0b12 pre-release

Sep 9, 2019

19.9.0b11 pre-release

Sep 2, 2019

19.9.0b10 pre-release

Aug 31, 2019

19.9.0b9 pre-release

Aug 31, 2019

19.9.0b8 pre-release

Aug 29, 2019

19.9.0b7 pre-release

Aug 26, 2019

19.9.0b6 pre-release

Aug 21, 2019

19.9.0b5 pre-release

Aug 20, 2019

19.9.0b4 pre-release

Aug 14, 2019

19.9.0b3 pre-release

Aug 6, 2019

19.6.0b2 pre-release

Jul 24, 2019

19.6.0b1 pre-release

Jul 14, 2019

19.6.0a1 pre-release

Jun 2, 2019

19.3.6

Sep 25, 2019

19.3.5

Aug 19, 2019

19.3.4

Aug 14, 2019

19.3.3

Jul 12, 2019

19.3.2

Jul 12, 2019

19.3.1

Apr 21, 2019

19.3.0

Apr 9, 2019

19.3.0rc2 pre-release

Mar 25, 2019

19.3.0rc1 pre-release

Feb 24, 2019

19.3.0b7 pre-release

Feb 14, 2019

19.3.0b6 pre-release

Feb 7, 2019

19.3.0b5 pre-release

Feb 1, 2019

19.3.0b4 pre-release

Jan 31, 2019

19.3.0b3 pre-release

Jan 30, 2019

19.3.0b2 pre-release

Jan 30, 2019

19.3.0b1 pre-release

Jan 29, 2019

19.3.0a3 pre-release

Jan 21, 2019

19.3.0a2 pre-release

Jan 21, 2019

19.3.0a1 pre-release

Jan 18, 2019

18.12.1

Jan 5, 2019

18.12.0

Jan 5, 2019

18.12.0a4 pre-release

Dec 25, 2018

18.12.0a3 pre-release

Dec 21, 2018

18.12.0a2 pre-release

Dec 21, 2018

18.12.0a1 pre-release

Dec 14, 2018

1.4.0

Sep 30, 2018

1.3.7

Apr 4, 2018

1.3.6

Apr 4, 2018

1.3.5

Mar 19, 2018

1.3.4

Mar 19, 2018

1.3.3

Mar 15, 2018

1.3.2

Mar 15, 2018

1.3.1

Mar 14, 2018

1.3.0

Mar 14, 2018

1.2.1

Jan 30, 2018

1.2.0

Jan 29, 2018

1.1.0

Jan 6, 2018

1.0.6

Nov 29, 2017

1.0.5

Nov 17, 2017

1.0.4

Nov 13, 2017

1.0.3

Nov 10, 2017

1.0.2

Nov 10, 2017

1.0.1

Nov 9, 2017

1.0.0

Oct 16, 2017

Download files

Source Distribution

backend_ai_agent-24.9.0.tar.gz (18.0 MB)

Uploaded Oct 21, 2024 Source

Built Distribution

backend.ai_agent-24.9.0-py3-none-any.whl (18.0 MB)

Uploaded Oct 21, 2024 Python 3

File details

Details for the file backend_ai_agent-24.9.0.tar.gz.

File metadata

Download URL: backend_ai_agent-24.9.0.tar.gz
Upload date: Oct 21, 2024
Size: 18.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for backend_ai_agent-24.9.0.tar.gz
Algorithm	Hash digest
SHA256	`b87f6b886d83273962adcaa1dbfb0a177ce5372b41b089cb19802def923db3c5`
MD5	`89600d21a8662f3798651bba01237394`
BLAKE2b-256	`0c4d4257a7b12ce4502c46226f6fa35b09b29c244991f0c6c6746c76f29876e8`

File details

Details for the file backend.ai_agent-24.9.0-py3-none-any.whl.

File metadata

Download URL: backend.ai_agent-24.9.0-py3-none-any.whl
Upload date: Oct 21, 2024
Size: 18.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for backend.ai_agent-24.9.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`abe5423734b4b04ccaead950d18170b0fa792d4db85a1f756c5ee83ce03f5ed2`
MD5	`8b8ca23b59c1c8b4c2943258fe08b2d5`
BLAKE2b-256	`55fd7c9d7f34cbeff14e10b56890d6edd59effb01cb47712f0b6e68b022109e2`