GRADIENT DESCENT PROVABLY OPTIMIZES OVER-PARAMETERIZED NEURAL NETWORKS

Published: September 17, 2019

This work: a two-layer fully connected network with ReLU activation can reach a global optimum at a linear rate using gradient descent (GD).

Observation: zero training error on random labels.

Proof…
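The claim above can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's experiment: it assumes unit-norm random inputs, random ±1 labels, Gaussian first-layer initialization, a fixed ±1 second layer, and squared loss, with only the first layer trained. The function name `train_two_layer_relu` and all sizes and the step size (`n`, `d`, `m`, `eta`, `steps`) are illustrative choices, not values from the post.

```python
import numpy as np

def train_two_layer_relu(n=10, d=20, m=1000, eta=0.3, steps=500, seed=0):
    """Full-batch GD on f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).

    All hyperparameters here are assumptions chosen for a quick demo.
    Returns the list of squared-loss values, one per iteration.
    """
    rng = np.random.default_rng(seed)

    # Random unit-norm inputs and random +/-1 labels (nothing to "learn").
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.choice([-1.0, 1.0], size=n)

    # Over-parameterized hidden layer (m >> n); output weights stay fixed.
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)

    losses = []
    for _ in range(steps):
        pre = X @ W.T                               # (n, m) pre-activations
        u = np.maximum(pre, 0.0) @ a / np.sqrt(m)   # (n,) network outputs
        resid = u - y
        losses.append(0.5 * float(resid @ resid))

        # Gradient w.r.t. w_r: (1/sqrt(m)) * a_r * sum_i resid_i * 1[w_r.x_i > 0] * x_i
        active = (pre > 0).astype(float)            # (n, m) ReLU on/off pattern
        grad = ((active * resid[:, None]).T @ X) * a[:, None] / np.sqrt(m)
        W -= eta * grad

    return losses

losses = train_two_layer_relu()
print(f"initial loss: {losses[0]:.3e}, final loss: {losses[-1]:.3e}")
```

Plotting `np.log(losses)` against the iteration index should give a roughly straight line until the loss reaches floating-point precision, which is what a linear convergence rate looks like in practice.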