Advances in Neural Rendering
intro
realistic image synthesis
- CG: requires high-quality assets and long rendering times, but gives full control of scene parameters
- ML: needs training data and offers little control over parameters, but is automatic, with interactive rendering/inference
- hard cases: transparency, glossy materials, thin structures
- calls for the best of both worlds
- DNNs for image/video generation with explicit/implicit control of scene parameters
neural rendering
- regression: latent code -> 2D image; a complex model to learn
- realistic: mesh/point cloud + code -> single view -> encoder-decoder -> 2D image
- regress and render: code -> mesh/texture/point cloud/volume -> CG-rendered image
- sample and blend: sample points in 3D space -> color and opacity -> CG-rendered image (see the sketch below)
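The "sample and blend" step is essentially alpha compositing along each ray. A minimal sketch of that quadrature (NeRF-style; the function name and shapes are my own, one ray for simplicity):

```python
import torch

def composite(rgb, sigma, deltas):
    """rgb: (N, 3); sigma, deltas: (N,) for N samples along one ray."""
    alpha = 1.0 - torch.exp(-sigma * deltas)               # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)      # transmittance after each sample
    trans = torch.cat([torch.ones_like(trans[:1]), trans[:-1]])  # shift so T_1 = 1
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)             # blended ray color
```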
loss function
what makes a good loss L?
- realism
- correspondence: classification
- useful for other tasks/data
L2 regression: yields the average of all plausible results, hence blur (see the note below)
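A standard reminder of why this happens (not from the talk): minimizing the expected squared error drives the prediction to the conditional mean,

$$\hat{y}^{*} = \arg\min_{\hat{y}} \, \mathbb{E}_{y \mid x}\big[\lVert y - \hat{y}\rVert_2^2\big] = \mathbb{E}[y \mid x],$$

so when many outputs are plausible, the optimum is their blurry average.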
deep learning as a metric
- loss on latent features (perceptual loss); see the first sketch after this list
- how well does it work? measured by agreement with human perception of patch similarity
- used in style transfer, segmentation
- human annotation as GT? costly -> replace the human judge with a classifier
- the classifier itself is trained on data
- add a correspondence loss -> paired data required
- cycle-consistency
- assumes a bijection between the two domains
- retaining content: a crop in the source and the corresponding crop in the target should be similar, while other crops should be far away in the embedding space
- InfoNCE loss; see the second sketch after this list
- handcrafted data augmentations or synthesized images provide positive pairs
- applied at different scales
- no L1 or perceptual loss required
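A minimal sketch of a VGG-feature ("perceptual") loss, the kind of learned metric discussed above. It is a simplified stand-in: real LPIPS also normalizes features and learns per-channel weights. Assumes a recent torchvision; the layer indices pick relu1_2/relu2_2/relu3_3 of VGG16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PerceptualLoss(nn.Module):
    def __init__(self, layers=(3, 8, 15)):  # relu1_2, relu2_2, relu3_3 in VGG16
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        self.slices = nn.ModuleList()
        prev = 0
        for last in layers:
            self.slices.append(nn.Sequential(*[vgg[i] for i in range(prev, last + 1)]))
            prev = last + 1
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x, y):
        # accumulate feature-space distances at several depths
        loss = 0.0
        for s in self.slices:
            x, y = s(x), s(y)
            loss = loss + F.mse_loss(x, y)
        return loss
```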
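And a sketch of the InfoNCE objective from the contrastive part (CUT-style patch contrast): classify the positive pair against the negatives. The tensor layout and temperature are my own choices:

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, tau=0.07):
    """query, positive: (B, D); negatives: (B, N, D). Positive sits at index 0."""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (query * positive).sum(-1, keepdim=True)        # (B, 1) cosine similarity
    neg = torch.einsum('bd,bnd->bn', query, negatives)    # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)                # positive must win
```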
GAN with 3D control
- update the latent variable using scene parameters
- image -> latent variable + params -> new latent variable (StyleGAN as backbone); see the sketch after this list
- synthetic datasets: zooming, shifting, …
- w/o supervised pairs: latent -> image -> annotations
- optimization based methods
- sorry, the talk is too general to get details of the methods; refer to the papers later
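Since the talk stayed high level, this is only my guess at what the "latent + params -> new latent" mapping could look like; everything below (architecture, dimensions, the residual update) is a hypothetical sketch, not the papers' actual method:

```python
import torch
import torch.nn as nn

class LatentEditor(nn.Module):
    """Hypothetical: maps a StyleGAN-like latent plus target scene params
    (e.g., camera shift/zoom) to an edited latent."""
    def __init__(self, latent_dim=512, param_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + param_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, w, params):
        # residual update keeps the edited code close to the original latent
        return w + self.net(torch.cat([w, params], dim=-1))
```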
neural scene representations and rendering
- images -> neural scene representation -> neural rendering -> images (supervised by the input images)
- self-supervised learning
- ray marching
- SRN parameterizes the scene as an MLP
- manifold assumption: scene parameters lie on some manifold; a hypernetwork maps a latent code to the MLP weights
- NeRF: from ShapeNet to the real world; overfits to individual, simple scenes
- SIREN: overfits to individual signals (see the sketch at the end of this section)
- generalization? pi-GAN, mapping network
- faster integration via the Newton-Leibniz formula: train a network whose derivative matches the integrand, then a ray integral is just the difference of two network evaluations (AutoInt)
- neural lumigraph rendering
  - learn a shape (SDF) first, then learn a radiance field on the shape surface
  - L1 loss, SDF constraint, mask agreement, radiance-field smoothness (second derivative w.r.t. angular changes -> 0)
- ACORN
  - tree-based partition of the input domain; each point is associated with exactly one block
  - for each point: find its block; block id -> C-channel feature grid -> bi/trilinearly sample a feature vector -> decode to the final output
  - for each block, decide whether to merge, stay, or split into further blocks
  - solved as an integer program
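Since SIREN keeps coming up: a minimal sketch of its sine layer, with the paper's w0 = 30 frequency scaling and fan-in initialization (a full SIREN just stacks these):

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_f, out_f)
        # SIREN init: U(-1/n, 1/n) for the first layer, U(-sqrt(6/n)/w0, ...) after
        bound = 1.0 / in_f if is_first else math.sqrt(6.0 / in_f) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```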
novel view synthesis
- only noting things I did not already know
instant 3D capture
- GeLaTo: few-shot reconstruction w/ pretrained category-level models
  - models all objects in a certain category
  - neural textures are robust to coarse geometry
  - even for thin structures (glasses)
  - few-shot reconstruction
- NeRFies: geometry and appearance of deforming objects
  - casual capture w/o special hardware
  - deform rays into a template space, conditioned on the time stamp
  - assumes a rigid transformation
  - still under-constrained: an elastic regularizer keeps the transformation as close to a rotation as possible -> penalize the singular values (see the sketch below)
  - coarse-to-fine introduction of positional-encoding frequencies
  - HyperNeRF(?): add additional dimensions to handle topological changes
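A sketch of that elastic regularizer: penalize the log singular values of the deformation Jacobian, which is zero exactly when the Jacobian is orthogonal (i.e., close to a rotation). Simplified: the paper wraps this in a robust loss, which I omit here.

```python
import torch

def elastic_loss(jac):
    """jac: (B, 3, 3) Jacobians of the deformation field at sampled points."""
    sigma = torch.linalg.svdvals(jac)             # singular values, (B, 3)
    log_sigma = torch.log(sigma.clamp(min=1e-6))  # 0 where a singular value is 1
    return (log_sigma ** 2).sum(-1).mean()
```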
learning to relight
- CNN for relighting: a matrix to select the best samples; concatenate the novel lighting condition in the bottleneck of the U-Net
- tons of papers; I seem to have read some of them
- not that interested in relighting…
- relightable NeRF
  - brute force: not doable
  - approximate the visibility from each point to the light
  - direct illumination: available during training
  - indirect illumination: full ray tracing not doable
  - one bounce from other points on the object; sample random directions (see the sketch below)
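A sketch of the "sample random directions" idea: a Monte Carlo average of visibility-weighted incoming light over the hemisphere. light_fn and vis_fn are hypothetical stand-ins for the learned radiance and approximate-visibility networks, and the BRDF is left out:

```python
import torch

def one_bounce(light_fn, vis_fn, normal, n_samples=64):
    """light_fn, vis_fn: hypothetical (N, 3) directions -> (N,) radiance / visibility;
    normal: (3,) unit surface normal."""
    d = torch.randn(n_samples, 3)
    d = d / d.norm(dim=-1, keepdim=True)         # uniform directions on the sphere
    cos = (d @ normal).clamp(min=0.0)            # zeros out the lower hemisphere
    f = vis_fn(d) * light_fn(d) * cos            # visibility-weighted incoming light
    return f.mean() * 4.0 * torch.pi             # MC estimate of the hemisphere integral
```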
object centric neural scene rendering
- dynamic scenes, w/o retraining
- read this one before
- 7D object-centric scattering function
  - 3D position + 2D incoming light direction + 2D outgoing direction
- path tracing: primary rays, direct illumination via shadow rays, indirect illumination
NeRF for dynamic scenes
- novel views in space and time
- priors over deformation of hidden geometry
- can condition the ray on time -> both geometry and appearance conditioned -> ray bending, warping into a canonical space (see the sketch after this list)
  - better at small motions; large motions are hard to recover early in training
  - bad for topological changes and for material/lighting changes
- somewhere in between?
- modeling physics, editability: remove the foreground, exaggerate motion
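A sketch of time-conditioned ray bending: a small MLP predicts a per-point offset into the canonical space, where the static NeRF lives. The architecture and conditioning (raw time instead of a latent code, no positional encoding) are my simplifications:

```python
import torch
import torch.nn as nn

class RayBend(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        """x: (N, 3) sample points along bent rays; t: (N, 1) time stamps."""
        return x + self.net(torch.cat([x, t], dim=-1))  # offset into canonical space
```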
some papers read during internship
- MVP decoder
  - vector -> mesh slab -> deform to a surface; BVH for fast rendering
LookinGood
- render from a mesh, then re-render (upscale) with a NN, using an HD camera as GT
- produces a predicted image and a mask
- segmentation of the body parts for VR applications
- L1 reconstruction loss in VGG feature space, masked
- head loss: crop and resize
- temporal loss: loss on temporal differences
- stereo loss: w/o GT images -> render at a different viewpoint and warp into the original view
- reweighting: down-weight the boundary pixels, which tend to have high loss, and also down-weight the easy-to-reconstruct pixels -> define min and max thresholds (see the sketch below)
- ++ better input meshes
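One plausible reading of the min/max-threshold reweighting (my interpretation, not the paper's exact scheme): fade out easy pixels below t_min and damp high-loss boundary pixels above t_max, with the weights detached so they carry no gradient:

```python
import torch

def reweighted_l1(pred, target, t_min=0.01, t_max=0.5):
    err = (pred - target).abs()
    w = torch.ones_like(err)
    w = torch.where(err < t_min, err / t_min, w)   # down-weight easy pixels
    w = torch.where(err > t_max, t_max / err, w)   # down-weight boundary outliers
    return (w.detach() * err).mean()
```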