
torch-optimizer

torch-optimizer – collection of optimizers for PyTorch.

Simple example

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
optimizer.step()
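In a full training loop the optimizer is driven like any built-in PyTorch optimizer; a minimal runnable sketch (the linear model and random batch below are placeholders, not part of the library):

import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 1)                   # placeholder model
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)

x, y = torch.randn(32, 10), torch.randn(32, 1)   # placeholder batch
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()  # clear gradients from the previous step
loss.backward()        # compute fresh gradients
optimizer.step()       # apply the DiffGrad update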

Installation

Installation is simple:

$ pip install torch_optimizer

Supported Optimizers

AccSGD

https://arxiv.org/abs/1803.05591

AdaBound

https://arxiv.org/abs/1902.09843

AdaMod

https://arxiv.org/abs/1910.12249

DiffGrad

https://arxiv.org/abs/1909.11015

Lamb

https://arxiv.org/abs/1904.00962

NovoGrad

https://arxiv.org/abs/1905.11286

RAdam

https://arxiv.org/abs/1908.03265

SGDW

https://arxiv.org/abs/1608.03983

Yogi

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization

Visualisations

Visualisations help us see how different algorithms deal with simple situations such as saddle points, local minima, and valleys, and they may provide interesting insights into the inner workings of an algorithm. The Rosenbrock and Rastrigin benchmark functions were selected because:

  • Rosenbrock (also known as the banana function) is a non-convex function with one global minimum at (1.0, 1.0). The global minimum lies inside a long, narrow, parabolic flat valley. Finding the valley is trivial; converging to the global minimum, however, is difficult. Optimization algorithms may pay a lot of attention to one coordinate and have trouble following the valley, which is relatively flat.

https://upload.wikimedia.org/wikipedia/commons/3/32/Rosenbrock_function.svg
  • The Rastrigin function is non-convex and has one global minimum at (0.0, 0.0). Finding the minimum of this function is a fairly difficult problem due to its large search space and its large number of local minima.

    https://upload.wikimedia.org/wikipedia/commons/8/8b/Rastrigin_function.png

Each optimizer performs 501 optimization steps. The learning rate is the best one found by a hyperparameter search algorithm; the rest of the tuning parameters are defaults. It is very easy to extend the script and tune other optimizer parameters.

python examples/viz_optimizers.py
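For reference, the two benchmark functions have their standard textbook definitions, and a single optimization trace can be reproduced in a few lines. The sketch below is illustrative only and is not the exact code of examples/viz_optimizers.py:

import math
import torch
import torch_optimizer as optim

def rosenbrock(xy):
    x, y = xy
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2   # global minimum at (1, 1)

def rastrigin(xy, a=10):
    x, y = xy
    return (2 * a
            + (x ** 2 - a * torch.cos(2 * math.pi * x))
            + (y ** 2 - a * torch.cos(2 * math.pi * y)))  # global minimum at (0, 0)

xy = torch.tensor([-2.0, 2.0], requires_grad=True)  # starting point
optimizer = optim.DiffGrad([xy], lr=1e-2)           # any supported optimizer works

for _ in range(501):  # same step budget as the visualisations
    optimizer.zero_grad()
    rastrigin(xy).backward()
    optimizer.step()
print(xy.detach())  # end point of the optimization trajectory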

AccSGD

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AccSGD.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AccSGD.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0
)
optimizer.step()

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2018) [https://arxiv.org/abs/1803.05591]

Reference Code: https://github.com/rahulkidambi/AccSGD

AdaBound

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdaBound.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaBound.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]

Reference Code: https://github.com/Luolc/AdaBound

AdaMod

The AdaMod method restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning-rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooths out unexpectedly large learning rates and stabilizes the training of deep neural networks.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdaMod.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaMod.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
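To make the bound concrete, here is a minimal sketch of a single AdaMod-style update on one tensor, written for illustration after the paper's algorithm; it is not the package source:

import torch

def adamod_step(param, grad, m, v, s, t, lr=1e-3, betas=(0.9, 0.999),
                beta3=0.999, eps=1e-8):
    # m, v, s are persistent state tensors; t is the 1-based step count
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])             # 1st moment
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])   # 2nd moment
    bias = (1 - betas[1] ** t) ** 0.5 / (1 - betas[0] ** t)     # bias correction
    eta = lr * bias / (v.sqrt() + eps)        # per-element Adam learning rate
    s.mul_(beta3).add_(eta, alpha=1 - beta3)  # EMA of rates: the momental bound
    eta = torch.min(eta, s)                   # clip rates by the moving bound
    param.sub_(eta * m)                       # apply the bounded step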

Paper: An Adaptive and Momental Bound Method for Stochastic Learning (2019) [https://arxiv.org/abs/1910.12249]

Reference Code: https://github.com/lancopku/AdaMod

DiffGrad

DiffGrad is an optimizer based on the difference between the present and the immediately preceding gradient: the step size is adjusted for each parameter so that parameters with rapidly changing gradients get larger step sizes and parameters with slowly changing gradients get smaller ones.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_DiffGrad.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_DiffGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
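A minimal sketch of the "friction" coefficient behind this behaviour (an illustration, not the package source); it multiplies the usual Adam step:

import torch

def diffgrad_friction(prev_grad, grad):
    # xi lies in [0.5, 1): close to 1 for rapidly changing gradients
    # (near-full step), close to 0.5 for slowly changing ones (damped step);
    # used roughly as: param -= lr * xi * m_hat / (v_hat.sqrt() + eps)
    return torch.sigmoid((prev_grad - grad).abs())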

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks (2019) [https://arxiv.org/abs/1909.11015]

Reference Code: https://github.com/shivram1987/diffGrad

Lamb

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Lamb.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Lamb.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]

Reference Code: https://github.com/cybertronai/pytorch-lamb

NovoGrad

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_NovoGrad.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_NovoGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]

Reference Code: https://github.com/NVIDIA/DeepLearningExamples/

RAdam

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_RAdam.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_RAdam.png
import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]

Reference Code: https://github.com/LiyuanLucasLiu/RAdam

SGDW

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SGDW.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SGDW.png
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]

Reference Code: https://arxiv.org/abs/1608.03983

Yogi

Yogi is an optimization algorithm based on Adam with more fine-grained effective learning-rate control, and it has similar theoretical guarantees on convergence as Adam.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Yogi.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Yogi.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
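Yogi's key departure from Adam is its additive, sign-based update of the second-moment estimate, which keeps the effective learning rate from growing or shrinking too aggressively; a minimal sketch (illustrative, not the package source):

import torch

def yogi_second_moment(v, grad, beta2=0.999):
    # Adam: v = beta2 * v + (1 - beta2) * grad**2      (multiplicative decay)
    # Yogi: v = v - (1 - beta2) * sign(v - grad**2) * grad**2
    g2 = grad * grad
    return v - (1 - beta2) * torch.sign(v - g2) * g2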

Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]

Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras

Changes

0.0.1 (YYYY-MM-DD)

  • Initial release.
