An overview of gradient descent optimization algorithms

Refer to the lecture notes and slides.

AdaMax

  • Generalizes the l_2 norm of past gradients used in Adam's update rule to the l_infty norm.
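As a rough illustration of the point above, here is a minimal sketch of one AdaMax step in plain Python. It replaces Adam's running average of squared gradients with a running maximum of gradient magnitudes (the l_infty norm). The function name `adamax_step`, the helper `minimize_quadratic`, the small `eps` guard against division by zero, and the toy objective are all assumptions made for this example and do not come from the lecture notes.

```python
def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update on lists of floats. Returns (theta, m, u).

    t is the 1-based step count. eps is an assumption added here to
    avoid dividing by zero when a gradient has been exactly 0 so far.
    """
    new_theta, new_m, new_u = [], [], []
    for th, g, m_i, u_i in zip(theta, grad, m, u):
        # First moment: exponential moving average of gradients (same as Adam).
        m_i = beta1 * m_i + (1 - beta1) * g
        # l_infty norm: decayed running maximum of gradient magnitudes.
        u_i = max(beta2 * u_i, abs(g))
        # Only the first moment needs bias correction.
        step = (lr / (1 - beta1 ** t)) * m_i / (u_i + eps)
        new_theta.append(th - step)
        new_m.append(m_i)
        new_u.append(u_i)
    return new_theta, new_m, new_u


def minimize_quadratic(steps=1000, lr=0.1):
    """Hypothetical toy problem: minimize f(x, y) = x^2 + y^2."""
    theta = [5.0, -3.0]
    m = [0.0, 0.0]
    u = [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * x for x in theta]  # gradient of x^2 + y^2
        theta, m, u = adamax_step(theta, grad, m, u, t, lr=lr)
    return theta
```

Calling `minimize_quadratic()` should drive both coordinates close to zero. The larger learning rate of 0.1 is chosen only to make this toy run short.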

In practice, use Adam.