VQA: Visual Question Answering

Published: August 22, 2019

What makes a task “AI-complete”?

- it requires multi-modal knowledge beyond a single domain
- it has a well-defined quantitative evaluation metric
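As a concrete instance of a well-defined metric, the VQA paper scores an open-ended answer by how many human annotators gave that same answer: accuracy = min(#humans that said the answer / 3, 1), so an answer given by at least 3 humans counts as fully correct. The sketch below implements only this simplified formula, assuming answers are compared after lowercasing and trimming whitespace; the official evaluation script applies more answer normalization and averaging, which this omits.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy: min(#humans who gave `predicted` / 3, 1)."""
    # Assumption: only lowercasing and whitespace trimming; the official
    # evaluation script normalizes answers more aggressively.
    pred = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == pred)
    return min(matches / 3.0, 1.0)


human = ["yes"] * 8 + ["no"] * 2  # 10 human answers per question, as in VQA
print(vqa_accuracy("yes", human))    # 1.0
print(vqa_accuracy("maybe", human))  # 0.0
```

With 8 of 10 humans answering "yes", predicting "yes" scores 1.0, while predicting "no" (given by only 2 humans) scores 2/3.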

The paper provides a dataset:

Images from MS COCO
3 questions per image
- open-ended, free-form
- require both low-level vision and commonsense-level reasoning
- check: is the image actually necessary, or can commonsense alone answer the question?
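The dataset layout (several free-form questions attached to each MS COCO image) can be pictured as a flat list of question records grouped by image. This is a minimal sketch: the `QuestionRecord` fields and the helper names are hypothetical, not the official VQA annotation schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionRecord:
    # Hypothetical field names for illustration only.
    image_id: int      # identifier of the MS COCO image
    question_id: int
    question: str      # open-ended, free-form text


def group_by_image(records):
    """Map each image_id to the list of question records about that image."""
    by_image = defaultdict(list)
    for r in records:
        by_image[r.image_id].append(r)
    return dict(by_image)


def images_with_wrong_count(records, per_image=3):
    """Return the sorted image_ids that do not have exactly `per_image` questions."""
    grouped = group_by_image(records)
    return sorted(img for img, qs in grouped.items() if len(qs) != per_image)


records = [
    QuestionRecord(1, 10, "What color is the bus?"),
    QuestionRecord(1, 11, "How many people are on the sidewalk?"),
    QuestionRecord(1, 12, "Is it raining?"),
    QuestionRecord(2, 20, "What is the man holding?"),
    QuestionRecord(2, 21, "Is this a kitchen?"),
]
print(images_with_wrong_count(records))  # [2]
```

Grouping by `image_id` makes the 3-questions-per-image design easy to sanity-check, for example by flagging images whose questions are missing.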
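
Network:
- image -> hidden-layer features from a CNN
- question -> LSTM or bag-of-words (BoW) encoding
- then concatenate the two feature vectors and pass them through fully connected layers that score a fixed set of candidate answers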
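The image/question pipeline can be sketched as a single forward pass in plain Python: an image feature vector goes through one hidden layer, the question becomes a bag-of-words count vector, the two are concatenated, and a fully connected layer with a softmax scores a fixed list of candidate answers. Everything specific here is an assumption for illustration: the tiny vocabulary, the answer list, the layer sizes, the tanh activation, and the random numbers standing in for CNN image features. The weights are untrained, so the predicted answer is arbitrary; the sketch shows the data flow, not the paper's exact configuration.

```python
import math
import random


def bag_of_words(question, vocab):
    """Count vector over a fixed vocabulary; out-of-vocabulary words are ignored."""
    vec = [0.0] * len(vocab)
    for word in question.lower().replace("?", "").split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def dense(x, weights, bias):
    """Fully connected layer y = Wx + b, with W stored as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Untrained (weights, bias) pair for a dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def vqa_forward(image_feat, question, vocab, params):
    # Image branch: CNN feature vector -> hidden layer with tanh.
    img_hidden = [math.tanh(v) for v in dense(image_feat, *params["img"])]
    # Question branch: bag-of-words (an LSTM encoder is the alternative).
    q_vec = bag_of_words(question, vocab)
    # Fuse by concatenation, then a fully connected layer + softmax over answers.
    fused = img_hidden + q_vec
    return softmax(dense(fused, *params["out"]))


vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "bus": 4}  # hypothetical
answers = ["red", "blue", "yes", "no"]                        # hypothetical
rng = random.Random(0)
image_feat = [rng.random() for _ in range(8)]  # stand-in for real CNN features
params = {
    "img": random_layer(8, 6, rng),
    "out": random_layer(6 + len(vocab), len(answers), rng),
}
probs = vqa_forward(image_feat, "What color is the bus?", vocab, params)
print(answers[probs.index(max(probs))])
```

In this sketch, swapping `bag_of_words` for an LSTM encoder changes only the question branch; the fusion and the answer classifier stay the same.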
