Learning-Based Image Synthesis


some notes from the publicly available slides

image as a function

  • mind the RGB channel order
  • exposure influences which part of the HDR gets into the picture
  • basic point processing: enhancement, contrast stretching, histogram equalization
    • the notes below are from the CV book
    • aI + b: gain a and bias b, i.e. contrast and brightness
    • gamma correction
    • histogram equalization: map intensities through the (normalized) CDF so the output distribution is roughly uniform
  • Gaussian blur before downsampling gives better anti-aliasing
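
The point-processing bullets above can be sketched in NumPy (a minimal sketch; the function names are mine):

```python
import numpy as np

def adjust(img, gain=1.2, bias=10, gamma=1.0):
    """Basic point processing: out = gain * img + bias, then gamma correction."""
    out = gain * img.astype(np.float64) + bias     # gain a (contrast), bias b (brightness)
    out = np.clip(out, 0, 255) / 255.0
    out = out ** (1.0 / gamma)                     # gamma correction on [0, 1]
    return np.rint(out * 255).astype(np.uint8)

def equalize(gray):
    """Histogram equalization: map intensities through the normalized CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist) / gray.size              # CDF in [0, 1]
    return (cdf[gray] * 255).astype(np.uint8)
```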

image warping

  • global warping: the same transform for every point p, parameterised by a few numbers
    • affine transformations: linear part plus translation (covers rotation R, translation t, scale, shear)
    • projective transformations: full 3×3 matrix
      • origin does not necessarily map to the origin
      • lines map to lines
      • parallel lines are not necessarily parallel
    • forward warping: splatting colors
    • inverse warping: interpolating colors
    • morphing: averaging objects (the image of an average object)
      • find the average shape: local warping
      • find the average color: cross-dissolve the images
      • matting: extract the foreground to avoid artifacts in the BG
      • moving least squares: deform each point v in the image by a weighted least-squares fit to the control points
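
Inverse warping with bilinear interpolation can be sketched as follows (a minimal NumPy version for a grayscale image and a 3×3 matrix; variable names are mine):

```python
import numpy as np

def inverse_warp(src, M):
    """Inverse warping: for each output pixel p, sample src at M^{-1} p with
    bilinear interpolation (vs. forward warping, which splats colors)."""
    H, W = src.shape
    Minv = np.linalg.inv(M)
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous coords
    sx, sy, sw = Minv @ pts
    sx, sy = sx / sw, sy / sw                                 # projective divide
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    ax, ay = sx - x0, sy - y0                                 # bilinear weights
    out = np.zeros(H * W)
    valid = (x0 >= 0) & (x0 < W - 1) & (y0 >= 0) & (y0 < H - 1)
    x0v, y0v, axv, ayv = x0[valid], y0[valid], ax[valid], ay[valid]
    out[valid] = ((1 - axv) * (1 - ayv) * src[y0v, x0v]
                  + axv * (1 - ayv) * src[y0v, x0v + 1]
                  + (1 - axv) * ayv * src[y0v + 1, x0v]
                  + axv * ayv * src[y0v + 1, x0v + 1])
    return out.reshape(H, W)
```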

data-driven graphics: basically image alignment plus database querying

CNN for image synthesis

  • image -> low level feature -> high level feature -> classifier
  • generating images: hard; an L2 loss does not work (it averages over possible solutions)
    • what works for colorization: add a color-distribution loss (cross-entropy)
    • feature/perceptual loss
      • same as his talk at SIGGRAPH
    • GAN losses: JSD, LSGAN, WGAN
    • spectral normalization: s.t. the largest singular value of each layer's weight matrix is one
    • what drives the progression of GAN? better architecture, better training scheme
  • conditional image synthesis
    • curse of dimensionality: progressive synthesis
    • SPADE
    • exemplar-based synthesis
      • retrieve segments from other images, align them to the input layout, composite them on a canvas (in case of overlap)
  • content style
    • loss: adversarial loss for the style change, cycle-consistency loss to preserve content
    • latent space
      • UNIT: assume the two images can be mapped to the same point in some shared latent space
      • multimodal UNIT (MUNIT): a shared content space plus two style spaces
  • texture synthesis
    • Efros & Leung: sample each new pixel from all similar patches
      • choice of neighborhood window
      • slow
      • sample by block instead, with some overlap between neighboring blocks, joined by a minimum-error cut
    • style loss from Gram matrix
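
The Gram-matrix style loss can be sketched as follows (a minimal NumPy version; a real implementation computes this on deep features from a pretrained network):

```python
import numpy as np

def gram(feat):
    """Gram matrix of a CxHxW feature map: channel-wise correlations.
    It discards spatial layout, which is why it captures style/texture."""
    C = feat.shape[0]
    F = feat.reshape(C, -1)
    return F @ F.T / F.shape[1]

def style_loss(feat_a, feat_b):
    """Squared Frobenius distance between the two Gram matrices."""
    return np.sum((gram(feat_a) - gram(feat_b)) ** 2)
```

Note that shuffling the spatial positions of a feature map leaves its Gram matrix unchanged, which is exactly the "texture, not layout" property.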

image editing with optimization

  • stay close to the input, satisfy the user constraints, lie on the natural image manifold
    • map a latent code (from the natural image manifold) to an image
      • jointly finetuning the latent code and the generator params works better
    • refer to iGAN
      • learn the manifold using GAN
      • given a real photo, project to manifold
      • each edit is a constrained optimization: find a new latent code satisfying the constraints, w/ a regularizer: manifold smoothness
        • color constraints, sketching -> HOG, warping -> warping of constraints
      • motion and color flow objective: min change in color given some spatial transformation and color transformation
  • analyze each neuron: class, position
    • and then more realistic editing
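
The projection-onto-manifold step can be illustrated with a toy linear "generator" (this is not iGAN's actual model, just the same gradient-descent recipe on a model simple enough to verify):

```python
import numpy as np

def project(x, W, lam=0.01, lr=0.05, steps=500):
    """Toy projection onto a generator's range: find latent z minimizing
    ||G(z) - x||^2 + lam * ||z||^2, with a linear stand-in G(z) = W z.
    (A real GAN generator is nonlinear; the optimization recipe is the same.)"""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2 * W.T @ (W @ z - x) + 2 * lam * z   # gradient of the objective
        z -= lr * grad
    return z
```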

face modeling

  • appearance vector (as in the original image) + shape vector (positions of the landmarks)
  • morphable face model: shape + appearance (as in the mean-warped image)
    • s.t. linear operations on the vectors make sense
  • eigenfaces: normalize contrast, scale, and orientation, remove the BG, then PCA
  • image registration: alignment from 3D
    • face detection, fiducial point detection, pose estimation from a template 3D model, 3D alignment
  • Image2StyleGAN++
    • jointly finetune the latent w and the noise n, s.t. only high-frequency information ends up in n: update w first, then n
    • manipulations on activation
    • lots of applications
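
The eigenfaces bullet above can be sketched with plain PCA via SVD (a minimal sketch; assumes the faces are already aligned, normalized, and vectorized):

```python
import numpy as np

def eigenfaces(faces, k):
    """Eigenfaces: PCA on aligned, flattened face images.
    faces: (N, D) array, one face per row."""
    mean = faces.mean(axis=0)
    X = faces - mean                          # center the data
    # SVD of the centered data: rows of Vt are the principal components
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:k]                       # mean face + top-k eigenfaces

def reconstruct(face, mean, basis):
    """Project a face onto the eigenface basis and reconstruct it."""
    coeffs = basis @ (face - mean)
    return mean + basis.T @ coeffs
```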

novel view synthesis: light fields, volume rendering; refer to all the other paper notes
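
The volume-rendering quadrature mentioned here can be sketched for a single ray (NeRF-style alpha compositing; a minimal NumPy version):

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Volume rendering along one ray:
    alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i} (1 - alpha_j),
    C = sum_i T_i * alpha_i * c_i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights
```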

  • video synthesis
    • video as 3D data: gradient guidance, FG+mask+BG
    • sequential synthesis
      • read vid2vid, adaptive vid2vid