mlgym

MLgym, a Python framework for distributed and reproducible machine learning model training in research.

A feature-rich deep learning framework providing full reproducibility of experiments.

Reproducibility is a recurring issue in deep learning research: models are often implemented in Jupyter notebooks, or entire training and evaluation pipelines are implemented from scratch with every new project. The lack of standardization and the repetitive boilerplate code of experimental setups impede reproducibility.

MLgym aims to increase reproducibility by separating the experimental setup from the code and providing the entire infrastructure for, e.g., model training, model evaluation, experiment logging, checkpointing and experiment analysis.

Specifically, MLgym provides an extensible set of machine learning components (e.g., trainer, evaluator, loss functions, etc.). The framework instantiates these components dynamically as specified and parameterized within a configuration file (see here for an exemplary configuration) describing the entire experiment setup (i.e., training and evaluation pipeline). The separation of experimental setup and code maximizes the replicability and interpretability of ML experiments. The machine learning components cut down the implementation effort significantly and let you focus solely on your ideas.
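To make the idea of config-driven instantiation more concrete, here is a minimal sketch of a component registry that builds objects from a configuration dictionary. It is not MLgym's actual registry or configuration schema: the class ComponentRegistry, the SGD stand-in and the keys "component" and "params" are all hypothetical and only illustrate the pattern.

```python
from typing import Any, Callable, Dict


class ComponentRegistry:
    """Maps a component name to a factory (hypothetical, not MLgym's API)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        self._factories[name] = factory

    def build(self, spec: Dict[str, Any]) -> Any:
        # Look up the factory by name and pass the configured parameters.
        factory = self._factories[spec["component"]]
        return factory(**spec.get("params", {}))


class SGD:
    """Stand-in for a real optimizer component."""

    def __init__(self, lr: float, momentum: float = 0.0) -> None:
        self.lr = lr
        self.momentum = momentum


# In practice this dictionary would be loaded from a configuration file.
config = {
    "optimizer": {"component": "SGD", "params": {"lr": 0.01, "momentum": 0.9}},
}

registry = ComponentRegistry()
registry.register("SGD", SGD)
optimizer = registry.build(config["optimizer"])
```

Because the experiment is described entirely by the configuration, changing a hyperparameter or swapping a component in this sketch requires no code change, only a different config entry.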

Additionally, MLgym provides the following key features:

Component registry to register custom components and their dependencies
Warm starts that allow training to resume after a crash
Customizable checkpointing strategies
MLboard webservice for experiment tracking / analysis (live and offline) by subscribing to the websocket logging environment
Large-scale, multi-GPU training supporting grid search, cross-validation and nested cross-validation
Distributed logging via websockets and event sourcing, allowing location-independent logging and full replicability
Definition of the training and evaluation pipeline in a configuration file, achieving separation of experiment setup and code
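As an illustration of the grid search feature listed above, the following sketch expands a search space into the individual experiment configurations that would each be trained. The function expand_grid and the hyperparameter names are assumptions made for this example and do not reflect MLgym's internal implementation or config format.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(search_space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one configuration per combination of hyperparameter values."""
    keys = list(search_space)
    value_lists = (search_space[key] for key in keys)
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Hypothetical search space: 2 learning rates x 2 batch sizes x 1 optimizer.
search_space = {"lr": [0.1, 0.01], "batch_size": [32, 64], "optimizer": ["sgd"]}
experiments = expand_grid(search_space)

for experiment in experiments:
    print(experiment)
```

Each resulting dictionary describes one independent experiment, which is what makes it straightforward to distribute them across several processes or GPUs.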

Please note that, at the moment, this code should be treated as experimental and is not production-ready.

Install

There are two options to install MLgym. The easiest way is to install the framework from PyPI via pip:

pip install mlgym

For the latest version, you can install it directly from source by changing into the repository root folder and running:

pip install src/

Usage

We provide an easy-to-use example that lets you run an MLgym experiment setup.

Before running the experiments, we need to set up the MLboard logging environment, i.e., the websocket service and the RESTful webservice. MLgym logs the training/evaluation progress and evaluation results via the websocket API, allowing the MLboard frontend to receive live updates. The RESTful webservice provides endpoints to receive checkpoints and experiment setups. For a full specification of both APIs, see here.

We start the websocket service and the RESTful webservice on ports 5002 and 5001, respectively. Feel free to choose different ports if desired. We also specify the folder event_storage as the local event storage folder. Note that, to access the websocket service from a different origin (here, the frontend served on a different port), we need to specify the CORS allowed origins. In this example, we only use the websocket service locally from 127.0.0.1:8080 via the MLboard frontend.

ml_board_ws_endpoint --host 127.0.0.1 --port 5002 --event_storage_path event_storage --cors_allowed_origins http://127.0.0.1:8080

ml_board_rest_endpoint --port 5001 --event_storage_path event_storage

Next, we run the experiment setup. We cd into the example folder and run run.py with the respective config, whose path is passed via the parameter gs_config_path. The parameter process_count specifies the number of experiments that we run in parallel, and num_epochs sets the maximum number of epochs to train a model. If the model performance does not improve substantially over time, the checkpointing strategy defined in gs_config.yml will stop training early.

cd mlgym/example/grid_search_example

python run.py --process_count 3 \
              --text_logging_path general_logging/ \
              --num_epochs 10 \
              --websocket_logging_servers http://127.0.0.1:5002 \
              --gs_rest_api_endpoint http://127.0.0.1:5001 \
              train \
              --gs_config_path gs_config.yml
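The interplay between the num_epochs cap and early stopping can be sketched as follows. The actual stopping criterion is whatever gs_config.yml defines and is not shown here; this example assumes a common patience-based rule, and the function name as well as the parameters patience and min_delta are hypothetical.

```python
from typing import Iterable, List


def train_with_early_stopping(
    val_losses: Iterable[float],
    num_epochs: int,
    patience: int = 2,
    min_delta: float = 0.01,
) -> List[int]:
    """Return the indices of the epochs that were actually run.

    val_losses stands in for the per-epoch validation loss of a real model.
    """
    best = float("inf")
    epochs_without_improvement = 0
    epochs_run: List[int] = []
    for epoch, loss in enumerate(val_losses):
        if epoch >= num_epochs:
            break  # hard cap, analogous to --num_epochs
        epochs_run.append(epoch)
        if best - loss > min_delta:
            # Substantial improvement: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no substantial improvement for `patience` epochs
    return epochs_run


# The loss plateaus after epoch 1, so training stops before the last value.
stopped_early = train_with_early_stopping([1.0, 0.8, 0.795, 0.799, 0.5], num_epochs=10)
# The loss keeps improving, so only the epoch cap ends training.
capped = train_with_early_stopping([5.0, 4.0, 3.0, 2.0, 1.0], num_epochs=3)
```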

To visualize the live updates, we run the MLboard frontend. We specify the server host and port that deliver the frontend, as well as the endpoints of the REST webservice and the websocket service. The parameter run_id refers to the experiment run that we want to analyze and will differ in your case. Each experiment run is stored in a separate folder within the event_storage path. The folder names correspond to the respective experiment run ids.

ml_board --ml_board_host 127.0.0.1 --ml_board_port 8080 --rest_endpoint http://127.0.0.1:5001 --ws_endpoint http://127.0.0.1:5002 --run_id 2022-11-06--17-59-10

The script returns the parameterized URL pointing to the respective experiment run:

====> ACCESS MLBOARD VIA http://127.0.0.1:8080?rest_endpoint=http%3A//127.0.0.1%3A5001&ws_endpoint=http%3A//127.0.0.1%3A5002&run_id=2022-11-06--17-59-10
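If you want to construct such a URL yourself, for example when starting the frontend separately, the following sketch reproduces the format of the URL printed above using only the Python standard library. The helper build_mlboard_url is hypothetical and not part of MLgym; how ml_board builds the URL internally is not specified here.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_mlboard_url(host: str, port: int, rest_endpoint: str, ws_endpoint: str, run_id: str) -> str:
    """Build a frontend URL with the endpoints and run id as search params."""
    params = {
        "rest_endpoint": rest_endpoint,
        "ws_endpoint": ws_endpoint,
        "run_id": run_id,
    }
    # quote with safe="/" percent-encodes ":" but leaves "/" untouched,
    # which matches the encoding of the URL shown above.
    query = urlencode(params, safe="/", quote_via=quote)
    return f"http://{host}:{port}?{query}"


url = build_mlboard_url(
    "127.0.0.1",
    8080,
    rest_endpoint="http://127.0.0.1:5001",
    ws_endpoint="http://127.0.0.1:5002",
    run_id="2022-11-06--17-59-10",
)
print(url)

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
```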

Note that the Flask webservice delivers the compiled React files statically, which is why changes to the frontend code will not be reflected automatically. As a solution, you can start the MLboard React app directly via yarn and call the URL with the respective URL search params in the browser:

cd mlgym/src/ml_board/frontend/dashboard

yarn start

The MLboard frontend is still under development and not all features have been implemented yet. In the meantime, you can analyze the log files directly in the event storage. All messages are logged as specified within the websocket API.

To see the messages live, cd into the event storage directory and tail the event_storage.log file:

cd event_storage/2022-11-06--17-59-10/
tail -f event_storage.log
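If you prefer to consume the log from Python instead of the shell, a rough equivalent of the tail command above could look like the sketch below. It treats each line as an opaque string and makes no assumption about the message format, which is defined by the websocket API. The function follow and its parameters are hypothetical, and unlike tail -f it starts at the beginning of the file, so all previously logged messages are replayed first.

```python
import io
import time
from typing import Iterator, Optional, TextIO


def follow(
    log_file: TextIO,
    poll_interval: float = 0.5,
    max_idle_polls: Optional[int] = None,
) -> Iterator[str]:
    """Yield complete lines from log_file, waiting for new ones to arrive.

    With max_idle_polls=None the generator never stops on its own;
    otherwise it stops after that many consecutive polls without new data.
    """
    buffer = ""
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        chunk = log_file.readline()
        if chunk:
            idle_polls = 0
            buffer += chunk
            # Only emit a line once its trailing newline has been written.
            if buffer.endswith("\n"):
                yield buffer.rstrip("\n")
                buffer = ""
        else:
            idle_polls += 1
            time.sleep(poll_interval)


# Demo on an in-memory file; the two lines are placeholders, not real MLgym messages.
demo_log = io.StringIO("first event\nsecond event\n")
lines = list(follow(demo_log, poll_interval=0.0, max_idle_polls=1))
print(lines)
```

For a real run, you would open event_storage.log inside the folder of your run id and iterate over follow(...) with max_idle_polls left at None.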

MLboard

Since MLboard is still under heavy development, we would like to give you a sneak peek of what is going to come in the foreseeable future.

Copyright

For license see: https://github.com/mlgym/mlgym/blob/master/LICENSE
