class AdamW(Optimizer):
r"""Implements AdamW algorithm.
- The original Adam algorithm was proposed in `Adam: A Method for Stochastic Optimization`_.
- The AdamW variant was proposed in `Decoupled Weight Decay Regularization`_.
+ .. math::
+ \begin{aligned}
+ &\rule{110mm}{0.4pt} \\
+ &\textbf{input} : \gamma \text{(lr)}, \: \beta_1, \beta_2
+ \text{(betas)}, \: \theta_0 \text{(params)}, \: f(\theta) \text{(objective)},
+ \: \epsilon \text{ (epsilon)} \\
+ &\hspace{13mm} \lambda \text{(weight decay)}, \: amsgrad \\
+ &\textbf{initialize} : m_0 \leftarrow 0 \text{ (first moment)}, v_0 \leftarrow 0
+ \text{ ( second moment)}, \: \widehat{v_0}^{max}\leftarrow 0 \\[-1.ex]
+ &\rule{110mm}{0.4pt} \\
+ &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do} \\
+ &\hspace{5mm}g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1}) \\
+ &\hspace{5mm} \theta_t \leftarrow \theta_{t-1} - \gamma \lambda \theta_{t-1} \\
+ &\hspace{5mm}m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
+ &\hspace{5mm}v_t \leftarrow \beta_2 v_{t-1} + (1-\beta_2) g^2_t \\
+ &\hspace{5mm}\widehat{m_t} \leftarrow m_t/\big(1-\beta_1^t \big) \\
+ &\hspace{5mm}\widehat{v_t} \leftarrow v_t/\big(1-\beta_2^t \big) \\
+ &\hspace{5mm}\textbf{if} \: amsgrad \\
+ &\hspace{10mm}\widehat{v_t}^{max} \leftarrow \mathrm{max}(\widehat{v_t}^{max},
+ \widehat{v_t}) \\
+ &\hspace{10mm}\theta_t \leftarrow \theta_{t-1} - \gamma \widehat{m_t}/
+ \big(\sqrt{\widehat{v_t}^{max}} + \epsilon \big) \\
+ &\hspace{5mm}\textbf{else} \\
+ &\hspace{10mm}\theta_t \leftarrow \theta_{t-1} - \gamma \widehat{m_t}/
+ \big(\sqrt{\widehat{v_t}} + \epsilon \big) \\
+ &\rule{110mm}{0.4pt} \\[-1.ex]
+ &\bf{return} \: \theta_t \\[-1.ex]
+ &\rule{110mm}{0.4pt} \\[-1.ex]
+ \end{aligned}
+
+ For further details regarding the algorithm we refer to `Decoupled Weight Decay Regularization`_.
Args:
params (iterable): iterable of parameters to optimize or dicts defining
algorithm from the paper `On the Convergence of Adam and Beyond`_
(default: False)
- .. _Adam\: A Method for Stochastic Optimization:
- https://arxiv.org/abs/1412.6980
.. _Decoupled Weight Decay Regularization:
https://arxiv.org/abs/1711.05101
.. _On the Convergence of Adam and Beyond: