Multi-view 3D Reconstruction with Transformer
Published:
Multi-view 3D reconstruction framed as a seq2seq prediction problem
- image -> pretrained CNN -> features -> transformer -> 3D QK, 3D V from learnable 3D volume -> 3D transformer -> 3D volumes, grouped into a single 3D output
- ~30% of the parameters of CNN-based methods (i.e. about 70% fewer)
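
The pipeline in the notes above can be illustrated with a rough NumPy sketch of the data flow and tensor shapes. It is not the paper's implementation. Everything here is an assumption made for illustration: the function names (`attention`, `reconstruct`, `init_params`), the single-head attention, the random weights, the random vectors standing in for pretrained-CNN features, and the `grid`/`block` sizes. It also uses the conventional encoder-decoder wiring (queries from learnable volume tokens, keys and values from the encoded view features), which may differ from the exact Q/K/V assignment summarised in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_in, kv_in, w_q, w_k, w_v):
    # Single-head scaled dot-product attention (assumed simplification).
    q, k, v = q_in @ w_q, kv_in @ w_k, kv_in @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def init_params(d=16, grid=4, block=2):
    # Hypothetical random parameters; a real model would learn these.
    def mats():
        return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
    return {
        "enc": mats(),
        "dec": mats(),
        # One learnable token per coarse cell of the output volume.
        "volume_tokens": rng.normal(size=(grid ** 3, d)),
        # Maps each decoded token to a block of block^3 voxels.
        "w_out": rng.normal(scale=d ** -0.5, size=(d, block ** 3)),
    }

def reconstruct(view_features, params, grid=4, block=2):
    # Encoder: views attend to each other (residual self-attention).
    enc = view_features + attention(view_features, view_features, *params["enc"])
    # Decoder: volume tokens query the encoded views (residual cross-attention).
    tokens = params["volume_tokens"]
    dec = tokens + attention(tokens, enc, *params["dec"])
    # Each token predicts occupancy probabilities for one small voxel block.
    occ = 1.0 / (1.0 + np.exp(-(dec @ params["w_out"])))
    # Group the per-token blocks into a single 3D volume.
    vol = occ.reshape(grid, grid, grid, block, block, block)
    vol = vol.transpose(0, 3, 1, 4, 2, 5)
    return vol.reshape(grid * block, grid * block, grid * block)

params = init_params()
views = rng.normal(size=(3, 16))  # stand-in for CNN features of 3 views
volume = reconstruct(views, params)
print(volume.shape)  # (8, 8, 8)
```

Because the views enter as an unordered set of tokens, the same parameters accept any number of input views; only the first dimension of `view_features` changes.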