Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima
Published: ICML 2018
previous work
- SGD with random initialization is able to train a one-layer ReLU NN in polynomial time
- what about two layers?
- this work: with constant probability over the random init., GD converges to the global min.
- otherwise it can converge to a spurious local min.; multiple random restarts boost the success probability toward 1 (minimal sketch of the setup below)
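The setting is a one-hidden-layer CNN f(Z, w, a) = sum_j a_j · ReLU(w^T Z_j) with non-overlapping patches Z_j, Gaussian inputs, and labels from a teacher (w*, a*); both the filter w and the output weights a are trained. Below is a minimal numpy sketch of plain gradient descent on this setup (the paper analyzes a weight-normalized variant); the patch size, learning rate, and step count are illustrative choices, not the paper's.

```python
# Minimal sketch (assumed setup, not the paper's exact experiment): plain gradient
# descent on a one-hidden-layer CNN f(Z, w, a) = sum_j a_j * relu(w^T Z_j) with
# k non-overlapping patches Z_j and Gaussian inputs; a teacher (w*, a*) provides labels.
import numpy as np

rng = np.random.default_rng(0)

k, p = 8, 6                    # k patches, each of dimension p (illustrative sizes)
w_star = rng.normal(size=p)    # teacher filter
a_star = rng.normal(size=k)    # teacher output weights

def forward(Z, w, a):
    # Z: (n, k, p); ReLU applied per patch, then a weighted sum over patches
    return np.maximum(Z @ w, 0.0) @ a          # (n,)

def grads(Z, y, w, a):
    pre = Z @ w                                # (n, k) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ a - y                            # (n,) residuals
    n = len(y)
    grad_a = h.T @ err / n
    mask = (pre > 0.0).astype(float)           # ReLU derivative
    grad_w = np.einsum('n,nk,nkp->p', err, mask * a, Z) / n
    return 0.5 * np.mean(err ** 2), grad_w, grad_a

w, a = rng.normal(size=p), rng.normal(size=k)  # random init (try different seeds)
lr = 0.01
for _ in range(5000):
    Z = rng.normal(size=(2048, k, p))          # fresh Gaussian batch ~ population gradient
    y = forward(Z, w_star, a_star)
    loss, gw, ga = grads(Z, y, w, a)
    w -= lr * gw
    a -= lr * ga

# Near-zero loss suggests the global minimum was reached; a run that stalls at a
# positive loss has landed at (or near) the spurious local minimum.
print(f"final loss: {loss:.4f}")
```

Drawing a fresh Gaussian batch each step is meant to approximate the population gradient the analysis works with; re-running with different seeds should illustrate the dichotomy between near-zero and stalled final losses.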
well…tons of proofs