
pytorch-optimizer

Project description

torch-optimizer


torch-optimizer – collection of optimizers for PyTorch.

Simple example

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
optimizer.step()
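
In a full training loop, optimizer.step() follows a backward pass. A minimal sketch with a toy model and random data (both purely illustrative, not part of the library):

import torch
import torch_optimizer as optim

# Toy model and data for illustration only.
model = torch.nn.Linear(10, 1)
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.MSELoss()

for _ in range(100):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                        # compute gradients
    optimizer.step()                       # update parameters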

Installation

Installation is simple:

$ pip install torch_optimizer

Supported Optimizers

AccSGD

https://arxiv.org/abs/1803.05591

AdaBound

https://arxiv.org/abs/1902.09843

AdaMod

https://arxiv.org/abs/1910.12249

DiffGrad

https://arxiv.org/abs/1909.11015

Lamb

https://arxiv.org/abs/1904.00962

NovoGrad

https://arxiv.org/abs/1905.11286

RAdam

https://arxiv.org/abs/1908.03265

SGDW

https://arxiv.org/abs/1608.03983

Yogi

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization

AccSGD

docs/rastrigin_AccSGD.png docs/rosenbrock_AccSGD.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0
)
optimizer.step()

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]

Reference Code: https://github.com/rahulkidambi/AccSGD

AdaBound

docs/rastrigin_AdaBound.png docs/rosenbrock_AdaBound.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]

Reference Code: https://github.com/Luolc/AdaBound

AdaMod

The AdaMod method restricts adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks.

docs/rastrigin_AdaMod.png docs/rosenbrock_AdaMod.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: An Adaptive and Momental Bound Method for Stochastic Learning (2019) [https://arxiv.org/abs/1910.12249]

Reference Code: https://github.com/lancopku/AdaMod
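
For intuition, a rough sketch of the bounding mechanism described above (element-wise, bias correction omitted; variable names are illustrative and this is not the library's implementation):

import torch

def adamod_sketch(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                  beta3=0.999, eps=1e-8):
    # state holds per-parameter tensors m, v, s initialized to zeros.
    m, v, s = state['m'], state['v'], state['s']
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])            # first moment
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])  # second moment
    step_size = lr / (v.sqrt() + eps)                          # Adam-style per-element rate
    s.mul_(beta3).add_(step_size, alpha=1 - beta3)             # EMA of the rates themselves
    bounded = torch.min(step_size, s)                          # cap each rate by its own EMA
    param.data.add_(-bounded * m)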

DiffGrad

DiffGrad adjusts the step size of each parameter based on the difference between the current gradient and the immediately preceding one: parameters whose gradients change quickly get a larger step size, while parameters whose gradients change slowly get a smaller one.

docs/rastrigin_DiffGrad.png docs/rosenbrock_DiffGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks (2019) [https://arxiv.org/abs/1909.11015]

Reference Code: https://github.com/shivram1987/diffGrad
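
The friction coefficient behind this, as a minimal sketch (simplified from the paper; names are illustrative):

import torch

def diffgrad_friction(prev_grad, grad):
    # diffGrad friction coefficient (DFC): a sigmoid of the absolute gradient change.
    # Near 1 when the gradient changes quickly (keep most of the Adam step),
    # near 0.5 when it is almost constant (damp the step).
    return torch.sigmoid((prev_grad - grad).abs())

# The DFC scales the Adam-style update per parameter, roughly:
#   param -= lr * dfc * m_hat / (v_hat.sqrt() + eps)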

Lamb

docs/rastrigin_Lamb.png docs/rosenbrock_Lamb.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]

Reference Code: https://github.com/cybertronai/pytorch-lamb

NovoGrad

docs/rastrigin_NovoGrad.png docs/rosenbrock_NovoGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]

Reference Code: https://github.com/NVIDIA/DeepLearningExamples/

RAdam

docs/rastrigin_RAdam.png docs/rosenbrock_RAdam.png
import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]

Reference Code: https://github.com/LiyuanLucasLiu/RAdam

SGDW

docs/rastrigin_SGDW.png docs/rosenbrock_SGDW.png
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]

Reference Code: https://arxiv.org/abs/1608.03983

Yogi

Yogi is an optimization algorithm based on Adam with more fine-grained control of the effective learning rate, and it has theoretical convergence guarantees similar to Adam's.

docs/rastrigin_Yogi.png docs/rosenbrock_Yogi.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]

Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras
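
The difference from Adam is in the second-moment update; a minimal element-wise sketch (names are illustrative):

import torch

def second_moment_updates(v, grad, beta2=0.999):
    g2 = grad * grad
    v_adam = beta2 * v + (1 - beta2) * g2                # Adam: geometric move toward g^2
    v_yogi = v - (1 - beta2) * torch.sign(v - g2) * g2   # Yogi: additive, sign-controlled move
    return v_adam, v_yogi

The additive form keeps the effective learning rate from increasing too quickly, which is the failure mode of Adam that the paper targets.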

Changes

0.0.1 (YYYY-MM-DD)

  • Initial release.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torch-optimizer-0.0.1a6.tar.gz (21.7 kB)

Uploaded Source

Built Distribution

torch_optimizer-0.0.1a6-py3-none-any.whl (24.0 kB)

Uploaded Python 3

File details

Details for the file torch-optimizer-0.0.1a6.tar.gz.

File metadata

  • Download URL: torch-optimizer-0.0.1a6.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.1

File hashes

Hashes for torch-optimizer-0.0.1a6.tar.gz
Algorithm Hash digest
SHA256 2b3a6bc1947983977c84001714291de43a9203648f54124275d72b78071c8671
MD5 c430e6ff0faa2f95b6edcbef888ec686
BLAKE2b-256 1a31d630f2c051324c5088021333541ac9095f42d13a81a0fbfb0eb07762a44e


File details

Details for the file torch_optimizer-0.0.1a6-py3-none-any.whl.

File metadata

  • Download URL: torch_optimizer-0.0.1a6-py3-none-any.whl
  • Upload date:
  • Size: 24.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.1

File hashes

Hashes for torch_optimizer-0.0.1a6-py3-none-any.whl
Algorithm Hash digest
SHA256 9afdd966c7177699c2937013949fe5f08f3efb65bd08d59a3fdc20e7f041f891
MD5 669088b858e19a6e91e523848c05421b
BLAKE2b-256 5f82168a55e13ed8098c4dce4507a53bd20031a1ef8be261ac793758902e19bd

