Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima
Published: ICML 2018
previous work
- SGD with random initialization is able to train a one-layer ReLU NN in polynomial time
- what about two layers?
- this work: with constant probability over the random init., GD converges to the global min.
- otherwise it can converge to a spurious local min.; multiple random restarts boost the success probability toward 1 (minimal sketch of the setup below)
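The setting is a one-hidden-layer CNN f(Z, w, a) = sum_j a_j · ReLU(w^T Z_j) with non-overlapping patches Z_j, Gaussian inputs, and labels from a teacher (w*, a*); both the filter w and the output weights a are trained. Below is a minimal numpy sketch of plain gradient descent on this setup (the paper analyzes a weight-normalized variant); the patch size, learning rate, and step count are illustrative choices, not the paper's.

```python
# Minimal sketch (assumed setup, not the paper's exact experiment): plain gradient
# descent on a one-hidden-layer CNN f(Z, w, a) = sum_j a_j * relu(w^T Z_j) with
# k non-overlapping patches Z_j and Gaussian inputs; a teacher (w*, a*) provides labels.
import numpy as np

rng = np.random.default_rng(0)

k, p = 8, 6                    # k patches, each of dimension p (illustrative sizes)
w_star = rng.normal(size=p)    # teacher filter
a_star = rng.normal(size=k)    # teacher output weights

def forward(Z, w, a):
    # Z: (n, k, p); ReLU applied per patch, then a weighted sum over patches
    return np.maximum(Z @ w, 0.0) @ a          # (n,)

def grads(Z, y, w, a):
    pre = Z @ w                                # (n, k) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ a - y                            # (n,) residuals
    n = len(y)
    grad_a = h.T @ err / n
    mask = (pre > 0.0).astype(float)           # ReLU derivative
    grad_w = np.einsum('n,nk,nkp->p', err, mask * a, Z) / n
    return 0.5 * np.mean(err ** 2), grad_w, grad_a

w, a = rng.normal(size=p), rng.normal(size=k)  # random init (try different seeds)
lr = 0.01
for _ in range(5000):
    Z = rng.normal(size=(2048, k, p))          # fresh Gaussian batch ~ population gradient
    y = forward(Z, w_star, a_star)
    loss, gw, ga = grads(Z, y, w, a)
    w -= lr * gw
    a -= lr * ga

# Near-zero loss suggests the global minimum was reached; a run that stalls at a
# positive loss has landed at (or near) the spurious local minimum.
print(f"final loss: {loss:.4f}")
```

Drawing a fresh Gaussian batch each step is meant to approximate the population gradient the analysis works with; re-running with different seeds should illustrate the dichotomy between near-zero and stalled final losses.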
well…tons of proofs