Advances in Neural Rendering
intro
realistic image synthesis
- CG: requires high-quality assets and long rendering times, but gives full control of scene parameters
- ML: needs training data and offers little control over parameters, but is automatic, with interactive rendering/inference
- hard cases: transparency, glossy materials, thin structures
- calls for the best of both worlds
- DNNs for image/video generation with explicit/implicit control of scene parameters
neural rendering
- regression: latent code -> 2D image; a complex model to learn
- realistic: mesh/point cloud + code -> single view -> encoder-decoder -> 2D image
- regress and render: code -> mesh/texture/point cloud/volume -> CG-rendered image
- sample and blend: sample points in 3D space -> color and opacity -> CG-rendered image (see the sketch below)
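The "sample and blend" step is essentially alpha compositing along each ray. A minimal sketch of that quadrature (NeRF-style; the function name and shapes are my own, one ray for simplicity):

```python
import torch

def composite(rgb, sigma, deltas):
    """rgb: (N, 3); sigma, deltas: (N,) for N samples along one ray."""
    alpha = 1.0 - torch.exp(-sigma * deltas)               # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)      # transmittance after each sample
    trans = torch.cat([torch.ones_like(trans[:1]), trans[:-1]])  # shift so T_1 = 1
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)             # blended ray color
```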
loss function
what makes a good loss L?
- realism
- correspondence: classification
- useful for other tasks/data
L2 regression: yields the average of all plausible results, hence blur (see the note below)
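A standard reminder of why this happens (not from the talk): minimizing the expected squared error drives the prediction to the conditional mean,

$$\hat{y}^{*} = \arg\min_{\hat{y}} \, \mathbb{E}_{y \mid x}\big[\lVert y - \hat{y}\rVert_2^2\big] = \mathbb{E}[y \mid x],$$

so when many outputs are plausible, the optimum is their blurry average.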
deep learning as a metric
- loss on latent features (perceptual loss); see the first sketch after this list
- how well does it work? measured by agreement with human perception of patch similarity
- used in style transfer, segmentation
- human annotation as GT? costly -> replace the human judge with a classifier
- the classifier itself is trained on data
- add a correspondence loss -> paired data required
- cycle-consistency
- assumes a bijection between the two domains
- retaining content: a crop in the source and the corresponding crop in the target should be similar, while other crops should be far away in the embedding space
- InfoNCE loss; see the second sketch after this list
- handcrafted data augmentations or synthesized images provide positive pairs
- applied at different scales
- no L1 or perceptual loss required
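A minimal sketch of a VGG-feature ("perceptual") loss, the kind of learned metric discussed above. It is a simplified stand-in: real LPIPS also normalizes features and learns per-channel weights. Assumes a recent torchvision; the layer indices pick relu1_2/relu2_2/relu3_3 of VGG16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PerceptualLoss(nn.Module):
    def __init__(self, layers=(3, 8, 15)):  # relu1_2, relu2_2, relu3_3 in VGG16
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        self.slices = nn.ModuleList()
        prev = 0
        for last in layers:
            self.slices.append(nn.Sequential(*[vgg[i] for i in range(prev, last + 1)]))
            prev = last + 1
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x, y):
        # accumulate feature-space distances at several depths
        loss = 0.0
        for s in self.slices:
            x, y = s(x), s(y)
            loss = loss + F.mse_loss(x, y)
        return loss
```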
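And a sketch of the InfoNCE objective from the contrastive part (CUT-style patch contrast): classify the positive pair against the negatives. The tensor layout and temperature are my own choices:

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, tau=0.07):
    """query, positive: (B, D); negatives: (B, N, D). Positive sits at index 0."""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (query * positive).sum(-1, keepdim=True)        # (B, 1) cosine similarity
    neg = torch.einsum('bd,bnd->bn', query, negatives)    # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)                # positive must win
```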
GAN with 3D control
- update the latent variable using scene parameters
- image -> latent variable + params -> new latent variable (StyleGAN as backbone); see the sketch after this list
- synthetic datasets: zooming, shifting, …
- w/o supervised pairs: latent -> image -> annotations
- optimization based methods
- sorry, the talk is too general to get details of the methods; refer to the papers later
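Since the talk stayed high level, this is only my guess at what the "latent + params -> new latent" mapping could look like; everything below (architecture, dimensions, the residual update) is a hypothetical sketch, not the papers' actual method:

```python
import torch
import torch.nn as nn

class LatentEditor(nn.Module):
    """Hypothetical: maps a StyleGAN-like latent plus target scene params
    (e.g., camera shift/zoom) to an edited latent."""
    def __init__(self, latent_dim=512, param_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + param_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, w, params):
        # residual update keeps the edited code close to the original latent
        return w + self.net(torch.cat([w, params], dim=-1))
```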
neural scene representations and rendering
- images -> neural scene representation -> neural rendering -> images (supervised by the input images)
- self-supervised learning
- ray marching
- SRN parameterizes the scene as an MLP
- manifold assumption: scene parameters lie on some manifold; a hypernetwork maps a latent code to the MLP weights
- NeRF: from ShapeNet to the real world; overfits to individual, simple scenes
- SIREN: overfits to individual signals (see the sketch at the end of this section)
- generalization? pi-GAN, mapping network
- faster integration via the Newton-Leibniz formula: train a network whose derivative matches the integrand, then a ray integral is just the difference of two network evaluations (AutoInt)
- neural lumigraph rendering
  - learn a shape (SDF) first, then learn a radiance field on the shape surface
  - L1 loss, SDF constraint, mask agreement, radiance-field smoothness (second derivative w.r.t. angular changes -> 0)
- ACORN
  - tree-based partition of the input domain; each point is associated with exactly one block
  - for each point: find its block; block id -> C-channel feature grid -> bi/trilinearly sample a feature vector -> decode to the final output
  - for each block, decide whether to merge, stay, or split into further blocks
  - solved as an integer program
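Since SIREN keeps coming up: a minimal sketch of its sine layer, with the paper's w0 = 30 frequency scaling and fan-in initialization (a full SIREN just stacks these):

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_f, out_f)
        # SIREN init: U(-1/n, 1/n) for the first layer, U(-sqrt(6/n)/w0, ...) after
        bound = 1.0 / in_f if is_first else math.sqrt(6.0 / in_f) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```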
novel view synthesis
- only noting things I did not already know
instant 3D capture
- GeLaTo: few-shot reconstruction w/ pretrained category-level models
  - models all objects in a certain category
  - neural textures are robust to coarse geometry
  - even for thin structures (glasses)
  - few-shot reconstruction
- NeRFies: geometry and appearance of deforming objects
  - casual capture w/o special hardware
  - deform rays into a template space, conditioned on the time stamp
  - assumes a rigid transformation
  - still under-constrained: an elastic regularizer keeps the transformation as close to a rotation as possible -> penalize the singular values (see the sketch below)
  - coarse-to-fine introduction of positional-encoding frequencies
  - HyperNeRF(?): add additional dimensions to handle topological changes
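A sketch of that elastic regularizer: penalize the log singular values of the deformation Jacobian, which is zero exactly when the Jacobian is orthogonal (i.e., close to a rotation). Simplified: the paper wraps this in a robust loss, which I omit here.

```python
import torch

def elastic_loss(jac):
    """jac: (B, 3, 3) Jacobians of the deformation field at sampled points."""
    sigma = torch.linalg.svdvals(jac)             # singular values, (B, 3)
    log_sigma = torch.log(sigma.clamp(min=1e-6))  # 0 where a singular value is 1
    return (log_sigma ** 2).sum(-1).mean()
```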
learning to relight
- CNN for relighting: a matrix to select the best samples; concatenate the novel lighting condition in the bottleneck of the U-Net
- tons of papers; I seem to have read some of them
- not that interested in relighting…
- relightable NeRF
  - brute force: not doable
  - approximate the visibility from each point to the light
  - direct illumination: available during training
  - indirect illumination: full ray tracing not doable
  - one bounce from other points on the object; sample random directions (see the sketch below)
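A sketch of the "sample random directions" idea: a Monte Carlo average of visibility-weighted incoming light over the hemisphere. light_fn and vis_fn are hypothetical stand-ins for the learned radiance and approximate-visibility networks, and the BRDF is left out:

```python
import torch

def one_bounce(light_fn, vis_fn, normal, n_samples=64):
    """light_fn, vis_fn: hypothetical (N, 3) directions -> (N,) radiance / visibility;
    normal: (3,) unit surface normal."""
    d = torch.randn(n_samples, 3)
    d = d / d.norm(dim=-1, keepdim=True)         # uniform directions on the sphere
    cos = (d @ normal).clamp(min=0.0)            # zeros out the lower hemisphere
    f = vis_fn(d) * light_fn(d) * cos            # visibility-weighted incoming light
    return f.mean() * 4.0 * torch.pi             # MC estimate of the hemisphere integral
```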
object centric neural scene rendering
- dynamic scenes, w/o retraining
- read this one before
- 7D object-centric scattering function
  - 3D position + 2D incoming light direction + 2D outgoing direction
- path tracing: primary rays, direct illumination via shadow rays, indirect illumination
NeRF for dynamic scenes
- novel views in space and time
- priors over deformation of hidden geometry
- can condition the ray on time -> both geometry and appearance conditioned -> ray bending, warping into a canonical space (see the sketch after this list)
  - better at small motions; large motions are hard to recover early in training
  - bad for topological changes and for material/lighting changes
- somewhere in between?
- modeling physics, editability: remove the foreground, exaggerate motion
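A sketch of time-conditioned ray bending: a small MLP predicts a per-point offset into the canonical space, where the static NeRF lives. The architecture and conditioning (raw time instead of a latent code, no positional encoding) are my simplifications:

```python
import torch
import torch.nn as nn

class RayBend(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        """x: (N, 3) sample points along bent rays; t: (N, 1) time stamps."""
        return x + self.net(torch.cat([x, t], dim=-1))  # offset into canonical space
```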
some papers read during internship
- MVP decoder
  - vector -> mesh slab -> deform to a surface; BVH for fast rendering
LookinGood
- render from a mesh, then re-render (upscale) with a NN, using an HD camera as GT
- produces a predicted image and a mask
- segmentation of the body parts for VR applications
- L1 reconstruction loss in VGG feature space, masked
- head loss: crop and resize
- temporal loss: loss on temporal differences
- stereo loss: w/o GT images -> render at a different viewpoint and warp into the original view
- reweighting: down-weight the boundary pixels, which tend to have high loss, and also down-weight the easy-to-reconstruct pixels -> define min and max thresholds (see the sketch below)
- ++ better input meshes
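One plausible reading of the min/max-threshold reweighting (my interpretation, not the paper's exact scheme): fade out easy pixels below t_min and damp high-loss boundary pixels above t_max, with the weights detached so they carry no gradient:

```python
import torch

def reweighted_l1(pred, target, t_min=0.01, t_max=0.5):
    err = (pred - target).abs()
    w = torch.ones_like(err)
    w = torch.where(err < t_min, err / t_min, w)   # down-weight easy pixels
    w = torch.where(err > t_max, t_max / err, w)   # down-weight boundary outliers
    return (w.detach() * err).mean()
```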