Advances in Neural Rendering


intro

realistic image synthesis

  • CG: requires high-quality assets and long render times, but gives full control of scene parameters
  • ML: needs training data and gives little control of parameters, but is automatic, with interactive rendering/inference
  • hard cases: transparency, glossy materials, thin structures
    • call for the best of both worlds
    • DNNs for image/video generation, with explicit/implicit control of scene parameters

neural rendering

  • regression: latent code -> 2D image; a complex model to learn
  • realistic: mesh/point cloud + code -> single view -> encoder-decoder -> 2D image
  • regress and render: code -> mesh/texture/point cloud/volume -> CG-rendered image
  • sample and blend: sample points in 3D space -> color and opacity -> CG-rendered image (see the sketch after this list)
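
A minimal sketch of the sample-and-blend step (NeRF-style quadrature), assuming a `field` callable that maps 3D points to per-point color and density; the uniform sampling, near/far bounds, and names are placeholders, not any specific paper's implementation.

```python
import torch

def render_ray(field, origin, direction, near=2.0, far=6.0, n_samples=64):
    # Sample depths along the ray (uniform here; real systems stratify).
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction              # (n_samples, 3)
    rgb, sigma = field(points)                            # per-point color, density
    delta = torch.cat([t[1:] - t[:-1], t.new_ones(1)])    # segment lengths
    alpha = 1.0 - torch.exp(-sigma * delta)               # per-sample opacity
    # Transmittance: fraction of light surviving up to each sample.
    trans = torch.cat([t.new_ones(1),
                       torch.cumprod(1.0 - alpha + 1e-10, dim=0)[:-1]])
    weights = alpha * trans                               # blend weights
    return (weights[:, None] * rgb).sum(dim=0)            # composited pixel color
```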

loss function

what is a good L

  • realism
  • correspondence: classification
  • useful for other tasks/data

L2 regression: predicts the average of all plausible results (hence blur)
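
For the record, this follows from the L2-optimal predictor being the conditional mean:

$$\arg\min_{\hat{y}} \; \mathbb{E}_{y \sim p(y \mid x)}\big[\lVert \hat{y} - y \rVert_2^2\big] = \mathbb{E}[\,y \mid x\,]$$

so when several outputs are plausible, the network predicts their average.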

deep learning as a metric

  • loss of latent features
  • how well? agreement with human perception of patch similarity
  • style transfer, segmentation
  • human annotation as GT? costly -> replace with a classifier
    • trained on data
    • add a correspondence loss -> paired data required
  • cycle-consistency
    • assumes a bijection
  • retaining content: a crop in the source and the same crop in the target should be similar, while other crops should be far away in the embedding space (see the sketch after this list)
    • InfoNCE loss
    • handcrafted data augmentation or synthesized images for positive pairs
    • applied at multiple scales
    • no L1 or perceptual loss required
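
A minimal sketch of the patchwise InfoNCE idea, assuming crops taken at matching locations in the source and generated images have already been embedded; all names and the temperature are placeholders.

```python
import torch
import torch.nn.functional as F

def patch_info_nce(src_feats, gen_feats, tau=0.07):
    """src_feats, gen_feats: (n_patches, dim) embeddings of crops at the
    SAME locations in the source and generated images. Row i of each is a
    positive pair; every other row serves as a negative."""
    src = F.normalize(src_feats, dim=1)
    gen = F.normalize(gen_feats, dim=1)
    logits = src @ gen.t() / tau                # (n, n) cosine similarities
    labels = torch.arange(src.size(0))          # positives on the diagonal
    return F.cross_entropy(logits, labels)
```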

GAN with 3D control

  • update the latent variable using scene parameters (a rough sketch of such an edit follows this list)
    • image -> latent variable + parameters -> new latent variable (StyleGAN as backbone)
    • synthetic datasets: zooming, shifting, …
  • w/o supervised pairs: latent -> image -> annotations
  • optimization-based methods
  • sorry, the talk is too general to get details of the methods; refer to the paper later
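
My rough picture of the latent update, as a sketch: assuming a pretrained generator `G` and a direction `d_zoom` in latent space tied to one scene parameter (both hypothetical here), editing is just a walk along that direction.

```python
import torch

def edit_latent(w, direction, magnitude):
    # Move the latent along a (learned) direction tied to a scene parameter,
    # e.g. camera zoom; the direction could come from synthetic supervised
    # pairs or from the latent -> image -> annotations loop above.
    return w + magnitude * direction / direction.norm()

# hypothetical usage with a StyleGAN-like backbone G:
# w_new = edit_latent(w, d_zoom, 0.5)
# image = G(w_new)
```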

neural scene representations and rendering

  • images -> neural scene representation -> neural rendering back into images
    • self-supervised learning
    • ray marching
  • SRN parameterizes the scene in an MLP
    • manifold assumption: scene parameters lie on some manifold; hypernetwork
  • NeRF: from ShapeNet to the real world, overfitting to simple scenes
    • SIREN: overfit to individual signals (a minimal layer sketch follows this list)
      • generalization? pi-GAN, mapping network
    • faster integration via the Newton-Leibniz formula
    • neural lumigraph rendering
      • learn a shape (SDF), then learn a radiance field on the shape surface
      • L1 loss, SDF constraint, mask agreement, radiance-field smoothness (second derivative w.r.t. angular changes -> 0)
    • ACORN
      • tree-based partition of the input domain; each point is associated with exactly one block
      • for each point: find its block, block id -> C-channel feature grid -> bi/trilinearly sample a feature vector -> decode to the final output
      • for each block, decide whether to merge, stay, or split into smaller blocks
        • integer programming
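
A minimal SIREN layer sketch, with the sine activation scaled by omega_0 and the uniform initialization from the paper; the layer sizes below are arbitrary.

```python
import math
import torch
from torch import nn

class SineLayer(nn.Module):
    """y = sin(omega_0 * (Wx + b)), with SIREN's initialization."""
    def __init__(self, in_dim, out_dim, omega_0=30.0, first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            # first layer: U(-1/n, 1/n); hidden: U(-sqrt(6/n)/w0, sqrt(6/n)/w0)
            bound = 1.0 / in_dim if first else math.sqrt(6.0 / in_dim) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# a tiny SIREN overfitting a single 2D signal, e.g. rgb = f(x, y):
siren = nn.Sequential(SineLayer(2, 256, first=True),
                      SineLayer(256, 256),
                      nn.Linear(256, 3))
```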

novel view synthesis

  • only noting things I do not know

instant 3D capture

  • GeLaTo: few-shot reconstruction w/ pretrained category models
    • model all objects in a certain category
    • neural textures robust to coarse geometry
      • even for thin structures (glasses)
    • few-shot reconstruction
  • Nerfies: geometry and appearance of deforming objects
    • casual capture w/o special hardware
    • deform rays into a template space, conditioned on a timestamp
      • assumes rigid transformations
      • still under-constrained: an elastic regularizer keeps the transformation as close to a rotation as possible -> penalize singular values (see the sketch after this list)
      • coarse-to-fine introduction of positional-encoding frequencies
      • HyperNeRF? add extra dimensions to handle topological changes
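
The elastic regularizer, as I understand it, in a sketch: push the singular values of the deformation Jacobian toward 1 so the local transformation stays near a rotation (Nerfies additionally wraps this in a robust loss).

```python
import torch

def elastic_loss(jacobian):
    # jacobian: (..., 3, 3) Jacobian of the ray-bending/deformation field.
    # A pure rotation has all singular values equal to 1, so penalize the
    # deviation of the log singular values from zero.
    sigma = torch.linalg.svdvals(jacobian)        # (..., 3)
    return (torch.log(sigma) ** 2).sum(-1).mean()
```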

learning to relight

  • CNN for relighting: a matrix to select the best samples; concatenate the novel lighting condition at the bottleneck of the U-Net
    • tons of papers; I seem to have read some of them
  • not that interested in relighting…
    • relightable NeRF
    • brute force: not doable
    • approximate visibility from the point to the light (a rough sketch follows this list)
      • direct illumination: available during training
      • indirect illumination: full ray tracing not doable
        • one bounce from other points on the object; sample random directions
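
A rough sketch of how a learned visibility term can enter the direct-lighting estimate; every function and shape here is a placeholder of my own, not the specific method from the talk.

```python
import torch

def direct_illumination(x, normal, light_dirs, light_rgb, visibility_mlp, brdf):
    # visibility_mlp(x, d) ~ probability that light from direction d reaches x,
    # replacing an expensive shadow-ray march through the volume.
    cos = (light_dirs * normal).sum(-1).clamp(min=0.0)     # (n_lights,)
    vis = visibility_mlp(x, light_dirs).squeeze(-1)        # (n_lights,)
    f = brdf(x, light_dirs)                                # (n_lights, 3)
    return (light_rgb * f * (vis * cos)[:, None]).mean(0)  # MC estimate over lights
```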

object centric neural scene rendering

  • dynamic scenes, w/o retraining
  • read before
  • 7D object-centric scattering function
    • 3D position, incoming light direction, outgoing direction
    • path tracing: direct illumination, shadow rays, indirect illumination, primary rays

NeRF for dynamic scenes

  • novel views in space and time
  • priors over the deformation of hidden geometry
    • condition the ray on time -> both geometry and appearance conditioned -> ray bending, warping into a canonical space (see the sketch after this list)
      • handles small motions well; large motions are hard to recover early in training
    • bad for topological changes, material and lighting changes
  • somewhere in between?
    • modeling physics, editability: remove the foreground, exaggerate motion
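
A sketch of the ray-bending idea, assuming a deformation MLP that predicts per-point offsets conditioned on the timestamp, plus a time-independent canonical NeRF; both networks are placeholders.

```python
import torch
from torch import nn

class BentRayNeRF(nn.Module):
    def __init__(self, deform_mlp, canonical_nerf):
        super().__init__()
        self.deform = deform_mlp      # (x, t) -> offset, placeholder MLP
        self.nerf = canonical_nerf    # canonical-space field, placeholder

    def forward(self, points, t):
        # Warp sampled ray points into the canonical space, then query the
        # shared geometry/appearance field there.
        t_embed = t.expand(points.shape[0], 1)
        offset = self.deform(torch.cat([points, t_embed], dim=-1))
        return self.nerf(points + offset)  # rgb, sigma in canonical space
```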

some papers read during internship

  • MVP (Mixture of Volumetric Primitives) decoder
    • vector -> mesh slab -> deform to a surface; BVH for fast rendering

LookinGood

  • render from a mesh, then re-render (upscale) using a NN, with an HD camera as GT
    • produces a predicted image and a mask
    • segmentation of body parts for VR applications
    • masked L1 reconstruction loss in VGG feature space
    • head loss: crop and resize
    • temporal loss: loss on temporal differences
    • stereo loss: w/o GT images -> render at a different viewpoint and warp into the original view
    • reweighting: down-weight boundary pixels, which tend to have a high loss, and also down-weight easy-to-reconstruct pixels -> define min and max thresholds (see the sketch below)
  • ++ better input meshes
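
How I picture the reweighting, as a sketch; the thresholds and normalization are illustrative, not the paper's values.

```python
import torch

def reweighted_loss(per_pixel_loss, lo=0.01, hi=0.5):
    # Weight each pixel by its own (detached) loss, clamped to [lo, hi]:
    # easy pixels get a small weight, while boundary pixels with huge loss
    # are capped at hi instead of dominating the objective.
    weights = per_pixel_loss.detach().clamp(lo, hi)
    weights = weights / weights.mean()        # keep the overall scale stable
    return (weights * per_pixel_loss).mean()
```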