Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima

previous work

  • SGD with random init. can train a one-layer NN with ReLU in poly. time
  • what about two layers?
    • this work: w.h.p. GD converges to the global min.
      • or to a spurious local min. (toy GD sketch below)
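
A minimal numpy sketch of the setting I have in mind (my assumptions, not spelled out in these notes): a one-hidden-layer CNN with a single shared ReLU filter over non-overlapping Gaussian patches, a planted teacher, squared loss, and plain GD from a small random init. Hyperparameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy student/teacher setup (assumed for illustration):
# input Z is k non-overlapping Gaussian patches of dimension p, and the
# network predicts f(Z; w, a) = sum_j a_j * relu(w . Z_j)
# with one shared filter w and output weights a.
p, k, n = 8, 4, 2000
w_star = rng.normal(size=p)          # planted teacher filter
a_star = rng.normal(size=k)          # planted teacher output weights

relu = lambda t: np.maximum(t, 0.0)
Z = rng.normal(size=(n, k, p))                      # (samples, patches, patch dim)
y = (relu(Z @ w_star) * a_star).sum(axis=1)         # noiseless teacher labels

# Plain gradient descent on the squared loss from a small random init.
w = 0.1 * rng.normal(size=p)
a = 0.1 * rng.normal(size=k)
lr = 0.02
for step in range(5000):
    pre = Z @ w                       # (n, k) pre-activations
    h = relu(pre)
    err = (h * a).sum(axis=1) - y     # residuals, shape (n,)
    grad_a = np.einsum('i,ij->j', err, h) / n
    grad_w = np.einsum('i,ij,ijp->p', err, a * (pre > 0), Z) / n
    a -= lr * grad_a
    w -= lr * grad_w

loss = 0.5 * np.mean(((relu(Z @ w) * a).sum(axis=1) - y) ** 2)
cos = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
print(f"final loss {loss:.4f}, cos(w, w*) {cos:.3f}")
# loss near 0 -> reached the global minimum; a nonzero plateau that persists
# across steps is the behaviour one would attribute to a spurious local minimum.
```

Re-running with a few different seeds gives a feel for how often GD lands in each of the two outcomes.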

well… tons of proofs