pytorch-optimizer
torch-optimizer
torch-optimizer – a collection of optimizers for PyTorch.
Simple example
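import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 1)  # any torch.nn.Module works here
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
loss = model(torch.randn(4, 10)).pow(2).mean()  # toy loss for illustration
loss.backward()  # populate .grad before stepping
optimizer.step()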
Installation
Installation is simple:
$ pip install torch_optimizer
Supported Optimizers
AccSGD
import torch_optimizer as optim
# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0,
)
optimizer.step()
Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]
Reference Code: https://github.com/rahulkidambi/AccSGD
AdaBound
import torch_optimizer as optim
# model = ...
optimizer = optim.AdaBound(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()
Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]
Reference Code: https://github.com/Luolc/AdaBound
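For illustration, the following pure-Python sketch shows the idea behind AdaBound: Adam's per-coordinate learning rate is clipped into a range whose lower and upper bounds both converge to `final_lr` as training proceeds. It assumes a single parameter tensor flattened into a list of floats, a fixed `lr` (the reference code rescales `final_lr` when a scheduler changes `lr`), and no weight decay or AMSBound variant; `adabound_step` and its state keys are hypothetical names, not part of torch_optimizer.

```python
import math


def adabound_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
                  final_lr=0.1, gamma=1e-3, eps=1e-8):
    """Return updated params after one AdaBound-style step (illustrative sketch)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("exp_avg", [0.0] * len(params))
    v = state.setdefault("exp_avg_sq", [0.0] * len(params))
    # Bounds start wide and both converge to final_lr as t grows.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    # Adam-style bias-corrected base step size.
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Per-coordinate Adam learning rate, clipped into [lower, upper].
        eta = min(max(step_size / (math.sqrt(v[i]) + eps), lower), upper)
        new_params.append(p - eta * m[i])
    return new_params
```

With the default `gamma`, the clip range is very wide during the first steps, so the update behaves like Adam; as `t` grows the range narrows around `final_lr` and the update approaches momentum SGD with learning rate `final_lr`.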
AdaMod
The AdaMod method restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning-rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks.
import torch_optimizer as optim
# model = ...
optimizer = optim.AdaMod(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
Paper: An Adaptive and Momental Bound Method for Stochastic Learning (2019) [https://arxiv.org/abs/1910.12249]
Reference Code: https://github.com/lancopku/AdaMod
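The bounding rule described for AdaMod can be sketched in pure Python: compute Adam's per-coordinate learning rate, keep an exponential moving average of it (controlled by `beta3`), and use the smaller of the two. This is a minimal sketch assuming one parameter tensor flattened into a list of floats and no weight decay; `adamod_step` and the state keys are hypothetical names, not torch_optimizer API.

```python
import math


def adamod_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
                beta3=0.999, eps=1e-8):
    """Return updated params after one AdaMod-style step (illustrative sketch)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    n = len(params)
    m = state.setdefault("exp_avg", [0.0] * n)
    v = state.setdefault("exp_avg_sq", [0.0] * n)
    s = state.setdefault("exp_avg_lr", [0.0] * n)  # EMA of the learning rates
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        eta = step_size / (math.sqrt(v[i]) + eps)  # Adam's per-coordinate lr
        s[i] = beta3 * s[i] + (1 - beta3) * eta    # smooth it over time
        eta = min(eta, s[i])                       # momental upper bound
        new_params.append(p - eta * m[i])
    return new_params
```

Because the moving average starts at zero, the first steps are much smaller than Adam's, which is one way the bound suppresses unexpectedly large early learning rates.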
DiffGrad
An optimizer based on the difference between the present and the immediately preceding gradient: the step size is adjusted for each parameter so that parameters whose gradients change quickly get a larger step size, and parameters whose gradients change slowly get a smaller one.
import torch_optimizer as optim
# model = ...
optimizer = optim.DiffGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
Paper: diffGrad: An Optimization Method for Convolutional Neural Networks (2019) [https://arxiv.org/abs/1909.11015]
Reference Code: https://github.com/shivram1987/diffGrad
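The per-parameter adjustment in diffGrad can be sketched as an Adam step whose first-moment term is scaled by a "friction" coefficient: the sigmoid of the absolute gradient change |g_{t-1} - g_t|, which lies in [0.5, 1). This is a minimal pure-Python sketch assuming one flattened parameter tensor, a previous gradient initialized to zero, and no weight decay; `diffgrad_step` and the state keys are hypothetical names.

```python
import math


def diffgrad_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Return updated params after one diffGrad-style step (illustrative sketch)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    n = len(params)
    m = state.setdefault("exp_avg", [0.0] * n)
    v = state.setdefault("exp_avg_sq", [0.0] * n)
    prev = state.setdefault("previous_grad", [0.0] * n)
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Friction coefficient: sigmoid of the gradient change, in [0.5, 1).
        xi = 1.0 / (1.0 + math.exp(-abs(prev[i] - g)))
        prev[i] = g
        new_params.append(p - step_size * xi * m[i] / (math.sqrt(v[i]) + eps))
    return new_params
```

When the gradient does not change between steps, the coefficient is exactly 0.5, so the step is half of the corresponding Adam step.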
Lamb
import torch_optimizer as optim
# model = ...
optimizer = optim.Lamb(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]
Reference Code: https://github.com/cybertronai/pytorch-lamb
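As described in the LAMB paper, the optimizer rescales an Adam-style update for each layer by a trust ratio ||w|| / ||update||, so every layer moves by a step proportional to its own weight norm. The pure-Python sketch below follows that formulation for one layer flattened into a list of floats; the packaged `Lamb` may differ in details such as bias correction or weight-norm clamping, and `lamb_step` and its state keys are hypothetical names.

```python
import math


def lamb_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0):
    """Return updated params for one layer after a LAMB-style step (sketch)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    n = len(params)
    m = state.setdefault("exp_avg", [0.0] * n)
    v = state.setdefault("exp_avg_sq", [0.0] * n)
    update = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Adam direction plus decoupled weight decay.
        update.append(m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
    w_norm = math.sqrt(sum(p * p for p in params))
    u_norm = math.sqrt(sum(u * u for u in update))
    # Layer-wise trust ratio; fall back to 1.0 when either norm is zero.
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return [p - lr * trust_ratio * u for p, u in zip(params, update)]
```

For a layer with weights `[3.0, 4.0]` (norm 5), the whole update vector has norm `lr * 5`, regardless of the gradient's scale.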
NovoGrad
import torch_optimizer as optim
# model = ...
optimizer = optim.NovoGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()
Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]
Reference Code: https://github.com/NVIDIA/DeepLearningExamples/
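NovoGrad keeps a single second-moment value per layer (a moving average of the squared gradient norm) instead of one per coordinate, normalizes the gradient by it, adds weight decay, and only then accumulates momentum. The pure-Python sketch below illustrates this for one layer flattened into a list of floats; it omits the `amsgrad` variant, and `novograd_step` and its state keys are hypothetical names.

```python
import math


def novograd_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                  weight_decay=0.0, grad_averaging=False):
    """Return updated params for one layer after a NovoGrad-style step (sketch)."""
    beta1, beta2 = betas
    # Layer-wise second moment: one scalar for the whole layer.
    g_sq_norm = sum(g * g for g in grads)
    if "exp_avg_sq" not in state:
        state["exp_avg_sq"] = g_sq_norm  # first step: no averaging
    else:
        state["exp_avg_sq"] = beta2 * state["exp_avg_sq"] + (1 - beta2) * g_sq_norm
    denom = math.sqrt(state["exp_avg_sq"]) + eps
    m = state.setdefault("exp_avg", [0.0] * len(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        d = g / denom + weight_decay * p  # normalized gradient plus weight decay
        if grad_averaging:
            d *= 1 - beta1
        m[i] = beta1 * m[i] + d
        new_params.append(p - lr * m[i])
    return new_params
```

Because the gradient is divided by the layer's gradient norm, multiplying all gradients of a layer by a constant leaves the first update essentially unchanged.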
RAdam
import torch_optimizer as optim
# model = ...
optimizer = optim.RAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]
Reference Code: https://github.com/LiyuanLucasLiu/RAdam
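RAdam rectifies the variance of Adam's adaptive learning rate: while the approximated simple-moving-average length rho_t is small (at most 4 in the paper; the reference code uses a slightly different threshold), it takes an un-adapted momentum step, and afterwards it multiplies Adam's step by a rectification term r_t. The pure-Python sketch below assumes one flattened parameter tensor and no weight decay; `radam_step` and its state keys are hypothetical names.

```python
import math


def radam_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Return updated params after one RAdam-style step (illustrative sketch)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    n = len(params)
    m = state.setdefault("exp_avg", [0.0] * n)
    v = state.setdefault("exp_avg_sq", [0.0] * n)
    # Maximum and current length of the approximated SMA.
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        if rho_t > 4:
            # Variance is tractable: rectified adaptive step.
            r_t = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                            / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            v_hat = math.sqrt(v[i] / (1 - beta2 ** t))
            new_params.append(p - lr * r_t * m_hat / (v_hat + eps))
        else:
            # Early steps: momentum SGD step without the adaptive denominator.
            new_params.append(p - lr * m_hat)
    return new_params
```

With `beta2 = 0.999`, rho_t is roughly equal to t for small t, so the first few steps use the plain momentum update.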
SGDW
import torch_optimizer as optim
# model = ...
optimizer = optim.SGDW(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()
Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]
Reference Code: https://arxiv.org/abs/1608.03983
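SGDW applies weight decay directly to the weights instead of adding it to the gradient, so the decay does not pass through the momentum buffer. The pure-Python sketch below illustrates this decoupling for one flattened parameter tensor; scaling the decay term by `lr` is an assumption of this sketch, and `sgdw_step` and its state key are hypothetical names rather than torch_optimizer internals.

```python
def sgdw_step(params, grads, state, lr=1e-3, momentum=0.0, dampening=0.0,
              weight_decay=1e-2, nesterov=False):
    """Return updated params after one SGD step with decoupled weight decay (sketch)."""
    bufs = state.get("momentum_buffer")
    new_params, new_bufs = [], []
    for i, (p, g) in enumerate(zip(params, grads)):
        d_p = g  # weight decay is NOT folded into the gradient
        if momentum != 0:
            buf = g if bufs is None else momentum * bufs[i] + (1 - dampening) * g
            new_bufs.append(buf)
            d_p = g + momentum * buf if nesterov else buf
        # Gradient/momentum step, then decay applied directly to the weights.
        new_params.append(p - lr * d_p - lr * weight_decay * p)
    if momentum != 0:
        state["momentum_buffer"] = new_bufs
    return new_params
```

With a zero gradient, only the decay term acts: `sgdw_step([2.0], [0.0], {}, lr=0.5, weight_decay=0.1)` shrinks the weight from 2.0 to 1.9.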
Yogi
Yogi is an optimization algorithm based on Adam with more fine-grained control of the effective learning rate, and it has convergence guarantees similar to Adam's.
import torch_optimizer as optim
# model = ...
optimizer = optim.Yogi(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]
Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras
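The finer learning-rate control in Yogi comes from its second-moment update: instead of Adam's multiplicative average v = beta2 * v + (1 - beta2) * g^2, v moves in the direction of g^2 by an additive step of (1 - beta2) * g^2. The pure-Python sketch below shows this for one flattened parameter tensor; it omits bias correction and weight decay, and the name `yogi_step`, its state keys, and the `initial_accumulator` starting value of 1e-6 are assumptions for illustration.

```python
import math


def yogi_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              initial_accumulator=1e-6):
    """Return updated params after one Yogi-style step (illustrative sketch)."""
    beta1, beta2 = betas
    n = len(params)
    m = state.setdefault("exp_avg", [0.0] * n)
    # Second-moment estimate starts from a small positive constant (assumed).
    v = state.setdefault("exp_avg_sq", [initial_accumulator] * n)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g2 = g * g
        m[i] = beta1 * m[i] + (1 - beta1) * g
        # Additive update: v moves toward g^2 by a step of (1 - beta2) * g^2,
        # and stays put when v == g^2 (sign is 0).
        sign = (v[i] > g2) - (v[i] < g2)
        v[i] = v[i] - (1 - beta2) * sign * g2
        new_params.append(p - lr * m[i] / (math.sqrt(v[i]) + eps))
    return new_params
```

One visible consequence: a zero gradient leaves v unchanged, whereas Adam would decay it by a factor of `beta2`.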
Changes
0.0.1 (YYYY-MM-DD)
Initial release.
File details
Details for the file torch-optimizer-0.0.1a6.tar.gz.
File metadata
- Download URL: torch-optimizer-0.0.1a6.tar.gz
- Upload date:
- Size: 21.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2b3a6bc1947983977c84001714291de43a9203648f54124275d72b78071c8671
MD5 | c430e6ff0faa2f95b6edcbef888ec686
BLAKE2b-256 | 1a31d630f2c051324c5088021333541ac9095f42d13a81a0fbfb0eb07762a44e
File details
Details for the file torch_optimizer-0.0.1a6-py3-none-any.whl.
File metadata
- Download URL: torch_optimizer-0.0.1a6-py3-none-any.whl
- Upload date:
- Size: 24.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9afdd966c7177699c2937013949fe5f08f3efb65bd08d59a3fdc20e7f041f891
MD5 | 669088b858e19a6e91e523848c05421b
BLAKE2b-256 | 5f82168a55e13ed8098c4dce4507a53bd20031a1ef8be261ac793758902e19bd