Multi-view 3D Reconstruction with Transformer

3D reconstruction as a seq2seq prediction

  • images -> pretrained CNN -> features -> transformer -> 3D QK, 3D V from learnable 3D volume -> 3D transformer -> 3D volumes, grouped into a single 3D output
  • about 30% of the parameters of CNN-based methods
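
The pipeline above can be illustrated with a toy sketch in plain Python. This is not the paper's implementation: it assumes a standard encoder-decoder layout in which the view features form the input sequence and learnable volume tokens act as decoder queries, which may differ from the exact Q/K/V wiring noted above. The names `attention`, `reconstruct`, `volume_tokens`, and `cells_per_token` are hypothetical, random vectors stand in for CNN features and learned parameters, and projection weights, multi-head attention, and feed-forward layers are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def reconstruct(view_features, volume_tokens, cells_per_token):
    """Toy seq2seq pass: a sequence of view tokens in, one occupancy volume out."""
    # Encoder: self-attention mixes information across the views.
    encoded = attention(view_features, view_features, view_features)
    # Decoder: each volume token attends to the encoded views (assumed wiring).
    decoded = attention(volume_tokens, encoded, encoded)
    # Hypothetical head: squash the first `cells_per_token` channels of each
    # decoded token into occupancy probabilities for one block of cells.
    blocks = [[1.0 / (1.0 + math.exp(-x)) for x in tok[:cells_per_token]]
              for tok in decoded]
    # Group the per-token blocks into a single flat volume.
    return [p for block in blocks for p in block]


rng = random.Random(0)
dim, n_views, n_tokens, cells = 8, 3, 8, 8  # 8 tokens x 8 cells = a 4x4x4 grid

# Stand-ins for pretrained-CNN features, one token per input view.
views = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_views)]
# Stand-ins for the learnable 3D volume tokens (trained in a real model).
volume_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

volume = reconstruct(views, volume_tokens, cells)
print(len(volume))
```

Each volume token decodes one block of cells, and concatenating the blocks gives the single 3D output. Here that is 64 occupancy values, which could be reshaped to a 4x4x4 grid.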