VQA: Visual Question Answering

What makes a task “AI-complete”?

  • requires multi-modal knowledge that goes beyond a single domain
  • has a well-defined quantitative evaluation metric

Provides a dataset

  • images from MSCOCO
  • 3 questions per image
    • open-ended, free-form
    • range from low-level visual questions to commonsense-level ones
    • open issue: is the image actually necessary, or can strong commonsense alone answer the question?
  • network
    • image -> hidden layer
    • question -> LSTM or bag-of-words (BoW) encoding
    • then concatenate the two representations and feed them to a fully connected (fc) layer
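
The network bullets above can be sketched roughly as follows. This is only a toy illustration in plain Python, not the original model: the dimensions, the six-word vocabulary, the random weights, the tanh activation, and the softmax over a fixed answer set are all assumptions made for the example, and the question side uses the bag-of-words option rather than an LSTM.

```python
import math
import random

random.seed(0)

# Hypothetical sizes and vocabulary, chosen only for illustration.
IMG_DIM, HIDDEN, N_ANSWERS = 8, 4, 3
VOCAB = {"what": 0, "color": 1, "is": 2, "the": 3, "cat": 4, "dog": 5}


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(vec, weights):
    """Multiply a length-n vector by an n x m weight matrix."""
    n_out = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(n_out)]


def bow(question):
    """Bag-of-words encoding: count each known word, ignore unknown ones."""
    counts = [0.0] * len(VOCAB)
    for tok in question.lower().split():
        if tok in VOCAB:
            counts[VOCAB[tok]] += 1.0
    return counts


def softmax(logits):
    """Numerically stable softmax over the answer scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


W_img = rand_matrix(IMG_DIM, HIDDEN)                # image -> hidden layer
W_fc = rand_matrix(HIDDEN + len(VOCAB), N_ANSWERS)  # fc over the concatenation


def forward(img_feat, question):
    h_img = [math.tanh(x) for x in linear(img_feat, W_img)]  # image branch
    joint = h_img + bow(question)                            # concat both branches
    return softmax(linear(joint, W_fc))                      # fc + answer distribution


# Stand-in for image features; in practice these would come from a vision model.
img_feat = [random.gauss(0.0, 1.0) for _ in range(IMG_DIM)]
probs = forward(img_feat, "what color is the cat")
print(probs)
```

Swapping `bow` for a recurrent encoder would give the LSTM variant of the question branch; the concatenation and fc steps would stay the same.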