Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Why does end-to-end (E2E) training outperform locally supervised learning?

  • hypothesis: the input x has two parts: one relevant to the output y, and a nuisance part r that is irrelevant to the target
  • locally supervised learning loses all of r but also part of the information about y, and later layers cannot recover what was lost.
    • how to verify: by mutual information (approximated) and linear separability
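  • InfoPro loss: max I(h, x) while min I(h, r), so that h keeps as much of the input as possible and drops only the nuisance r; in practice this is relaxed to max I(h, x) and max I(h, y)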
    • both terms have to be approximated, with a negligible gap to the ground truth
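
To make the mutual-information check in the notes above concrete, here is a minimal toy sketch in plain Python. It is not the paper's estimator: the input is assumed to be a small discrete pair x = (y, r), so I(h, x) and I(h, y) can be computed exactly from counts instead of being approximated. The function names (`mutual_information`, `h_keep_all`, `h_drop_r`, `h_collapsed`) are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


# Toy input x = (y, r): y is a 2-bit label (0..3), r is a 1-bit nuisance.
# Every combination appears exactly once, so x is uniform and the
# count-based estimates below are exact for this toy distribution.
xs = list(product(range(4), range(2)))
ys = [y for y, _ in xs]


def h_keep_all(x):
    return x            # keeps both y and r


def h_drop_r(x):
    return x[0]         # discards the nuisance r only


def h_collapsed(x):
    return x[0] // 2    # discards r and one bit of y


results = {}
for name, f in [("keep_all", h_keep_all),
                ("drop_r", h_drop_r),
                ("collapsed", h_collapsed)]:
    hs = [f(x) for x in xs]
    i_hx = mutual_information(list(zip(hs, xs)))
    i_hy = mutual_information(list(zip(hs, ys)))
    results[name] = (i_hx, i_hy)
    print(f"{name:10s} I(h,x) = {i_hx:.1f} bits   I(h,y) = {i_hy:.1f} bits")
```

In this toy setting, dropping only r lowers I(h, x) but leaves I(h, y) untouched, while the collapsed representation lowers both. Once that bit of y is gone, no later function of h can bring it back, which is the failure mode the hypothesis attributes to locally supervised training.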