Learning to learn by gradient descent by gradient descent
from hand-designed optimizers to learned ones?
- no free lunch: no optimizer is best on every problem, so specialize the optimizer to a specific class of problems
- rethinking generalization: here it means the learned optimizer transfers to new problems from the same class
optimizer and optimizee
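- L(phi) = E_f[f(theta(f, phi))], where phi are the optimizer's parameters and theta(f, phi) are the final optimizee parameters the optimizer produces on f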
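- a good phi should maximize the performance of theta on the target function f, i.e. minimize L(phi)
- sample f from the problem distribution and update phi by gradient descent on the sampled loss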
- refer to code
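
A minimal sketch of the optimizer/optimizee loop above, under strong simplifying assumptions that are not from the paper's code: the learned optimizer is reduced to a single scalar step size phi (the paper uses an LSTM), the optimizee is a 1-D quadratic f(theta) = 0.5 * a * (theta - b)^2, and dL/dphi is tracked by hand through the unrolled updates. All names (unroll, sample_task, meta_loss, meta_train) and constants are hypothetical.

```python
import random


def unroll(phi, a, b, theta0=0.0, steps=5):
    """Run `steps` optimizer updates on f(theta) = 0.5 * a * (theta - b)**2.

    Returns the final loss f(theta_T) and its derivative w.r.t. phi,
    tracked by forward-mode differentiation through the unrolled updates.
    """
    theta, dtheta = theta0, 0.0  # dtheta = d(theta)/d(phi)
    for _ in range(steps):
        grad = a * (theta - b)   # optimizee gradient df/dtheta
        dgrad = a * dtheta       # d(grad)/d(phi)
        # Optimizer update theta <- theta - phi * grad, and its derivative.
        theta, dtheta = theta - phi * grad, dtheta - grad - phi * dgrad
    loss = 0.5 * a * (theta - b) ** 2
    dloss = a * (theta - b) * dtheta
    return loss, dloss


def sample_task(rng):
    """Sample one target function f, described by (a, b). Ranges are assumed."""
    return rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)


def meta_loss(phi, tasks):
    """Monte Carlo estimate of L(phi) = E_f[f(theta(f, phi))]."""
    return sum(unroll(phi, a, b)[0] for a, b in tasks) / len(tasks)


def meta_train(phi=0.1, meta_lr=0.1, iterations=200, batch=16, seed=0):
    """Sample f, then update phi by gradient descent on the sampled loss."""
    rng = random.Random(seed)
    for _ in range(iterations):
        tasks = [sample_task(rng) for _ in range(batch)]
        grad_phi = sum(unroll(phi, a, b)[1] for a, b in tasks) / batch
        phi -= meta_lr * grad_phi
    return phi


if __name__ == "__main__":
    held_out_rng = random.Random(123)
    held_out = [sample_task(held_out_rng) for _ in range(100)]
    trained_phi = meta_train()
    print("initial L(phi):", meta_loss(0.1, held_out))
    print("trained L(phi):", meta_loss(trained_phi, held_out))
```

Evaluating on freshly sampled tasks, as in the `__main__` block, is the toy analogue of the generalization point above: phi is judged on functions it was not trained on, drawn from the same class.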