Learning to learn by gradient descent by gradient descent

less than 1 minute read

Published:

hand designed optimization to learned ones?

  • no free lunch: optimize in a specified field
  • rethinking generalization

optimizer and optimizee

  • L(phi) = E_f[f(theta(f, phi))]
    • a good phi should max. theta’s performance w.r.t. target function f
    • sample f to update phi
  • refer to code