Learning-Based Image Synthesis
some notes from the publicly available slides
image as a function
- mind the RGB channel order (e.g., OpenCV loads images as BGR)
- exposure determines which part of the HDR range ends up in the picture
- basic point processing: enhancement, contrast stretching, histogram equalization (see the sketch after this list)
- the notes below are from the CV book
- g(x) = a f(x) + b: gain a and bias b, i.e. contrast and brightness
- gamma correction: g(x) = f(x)^(1/γ)
- histogram equalization: map intensities through the image's CDF so the output distribution is roughly uniform
- Gaussian blur before downsampling gives better anti-aliasing
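
A minimal sketch of these point operations in NumPy; the function names and the toy image are mine, and the input is assumed to be a float image in [0, 1]:

```python
import numpy as np

def gain_bias(img, a=1.2, b=0.05):
    """Contrast (gain a) and brightness (bias b): g(x) = a*f(x) + b."""
    return np.clip(a * img + b, 0.0, 1.0)

def gamma_correct(img, gamma=2.2):
    """Gamma correction: g(x) = f(x)^(1/gamma)."""
    return img ** (1.0 / gamma)

def histogram_equalize(gray, bins=256):
    """Map intensities through the normalized CDF so the output
    histogram is approximately uniform."""
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]                                   # normalize CDF to [0, 1]
    idx = np.clip((gray * (bins - 1)).astype(int), 0, bins - 1)
    return cdf[idx]                                  # pixel -> its CDF value

rng = np.random.default_rng(0)
gray = rng.beta(2.0, 5.0, size=(64, 64))             # skewed toy "image"
eq = histogram_equalize(gray)                        # roughly uniform histogram
```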
image warping
- global warping: the same transformation for every point p, parameterized by a few numbers
- affine transformations: a linear map plus translation (A, t); rigid (R, t) is a special case
- projective transformations: full 3×3 matrix (homography)
- origin does not necessarily map to the origin
- lines still map to lines
- parallel lines are not necessarily parallel
- forward warping: splat colors onto the destination
- inverse warping: interpolate colors from the source (see the sketch after this list)
- morphing: averaging objects (producing the image of an "average object")
- find the average shape: local warping
- find the average color: cross-dissolve the warped images
- matting: extract the foreground to avoid artifacts in the BG
- moving least squares: for each point v in the image, fit a transformation weighted by v's distance to the user's control points
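
A minimal sketch of inverse warping with a homography and bilinear interpolation, in NumPy; the matrix H and the image here are arbitrary examples of mine:

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Warp grayscale `img` by homography H (source -> destination):
    iterate over destination pixels and pull colors via H^-1."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    dst = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = np.linalg.inv(H) @ dst                      # back-map to source
    src = src[:2] / src[2]                            # dehomogenize
    x, y = src[0].reshape(out_shape), src[1].reshape(out_shape)

    # Bilinear interpolation at the (generally non-integer) source coords.
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x0 + 1]
    bot = (1 - wx) * img[y0 + 1, x0] + wx * img[y0 + 1, x0 + 1]
    out = (1 - wy) * top + wy * bot

    # Zero out pixels whose source location fell outside the image.
    valid = (x >= 0) & (x <= img.shape[1] - 1) & (y >= 0) & (y <= img.shape[0] - 1)
    return out * valid

H = np.array([[1.0,  0.2, 5.0],     # shear + translation; nonzero bottom-left
              [0.0,  1.0, 3.0],     # entry makes it properly projective
              [1e-4, 0.0, 1.0]])
img = np.random.default_rng(0).random((64, 64))
warped = inverse_warp(img, H, (64, 64))
```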
data-driven graphics: basically image alignment plus querying a large image database
CNN for image synthesis
- image -> low level feature -> high level feature -> classifier
- generating images is hard; an L2 loss does not work (it averages over all plausible solutions, giving blurry output)
- colorization works by adding a color-distribution loss (cross-entropy over quantized color bins)
- feature/perceptual loss (see the sketch at the end of this section)
- same material as his SIGGRAPH talk
- GAN losses: JSD (original minimax), LSGAN (least squares), WGAN (Wasserstein)
- spectral normalization: constrain the largest singular value of each layer's weight matrix to one
- what drives the progress of GANs? better architectures and better training schemes
- conditional image synthesis
- curse of dimensionality: progressive synthesis, coarse to fine
- SPADE (spatially-adaptive normalization)
- exemplar-based synthesis
- retrieve segments from other images, align them to the input layout, composite them on a canvas (to resolve overlaps)
- content vs. style
- losses: adversarial loss for the style change, cycle-consistency loss to preserve content
- latent space
- UNIT: assume images from the two domains can be mapped to the same point in a shared latent space
- multimodal UNIT (MUNIT): a shared content space plus two domain-specific style spaces
- texture synthesis
- Efros & Leung: sample each new pixel conditioned on its already-synthesized neighborhood, drawing from similar patches in the source texture
- the choice of neighborhood window size matters
- slow
- image quilting (Efros & Freeman): sample by block, overlap neighboring blocks, cut along the minimum-error seam
- style loss from the Gram matrix (see the sketch below)
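
A minimal sketch of the perceptual (feature) loss and the Gram-matrix style loss, in PyTorch; the VGG layer indices are an illustrative choice of mine, and `weights=None` (torchvision >= 0.13) keeps the sketch self-contained where real use would load pretrained weights:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(weights=None).features.eval()   # frozen feature extractor
for p in features.parameters():
    p.requires_grad_(False)

LAYERS = {3, 8, 15, 22}  # relu1_2, relu2_2, relu3_3, relu4_3 in torchvision's vgg16

def vgg_features(x):
    feats = []
    for i, layer in enumerate(features):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats

def gram(f):
    """Gram matrix of a feature map: channel co-activation statistics,
    which discard spatial layout and keep texture."""
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(x, y):
    """Match features directly -> preserves content/structure."""
    return sum(F.mse_loss(fx, fy) for fx, fy in zip(vgg_features(x), vgg_features(y)))

def style_loss(x, y):
    """Match Gram matrices -> preserves texture statistics only."""
    return sum(F.mse_loss(gram(fx), gram(fy))
               for fx, fy in zip(vgg_features(x), vgg_features(y)))

x, y = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
print(perceptual_loss(x, y).item(), style_loss(x, y).item())
```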
image editing with optimization
- stay close to the input, satisfy the user's constraints, and stay on the natural image manifold
- map a latent code (a point on the natural image manifold) to an image
- jointly fine-tuning the latent code and the generator parameters works better
- refer to iGAN
- learn the manifold using a GAN
- given a real photo, project it onto the manifold (see the sketch after this list)
- each edit is a constrained optimization: find a new latent code that satisfies the constraints, with manifold smoothness as a regularizer
- color constraints; sketching constraints via HOG features; warping edits warp the constraints themselves
- motion and color flow objective: minimize the change in color under some spatial transformation plus color transformation
- analyze each neuron: which object class it controls and at which positions
- this enables more realistic editing
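
A minimal sketch of projection and editing as latent-code optimization, in the spirit of iGAN; `G` is a stand-in for any pretrained generator, and the loss weights and step counts are illustrative:

```python
import torch

def project(G, photo, steps=500, lr=0.05, z_dim=128):
    """Freeze the generator, optimize z so that G(z) matches the photo."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), photo)  # + perceptual loss in practice
        loss.backward()
        opt.step()
    return z.detach()

def edit(G, z0, target, mask, steps=200, lr=0.05, reg=1.0):
    """Constrained optimization: satisfy user constraints (e.g. scribbled
    colors under `mask`) while staying near the projected latent z0."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = G(z)
        constraint = ((out - target) ** 2 * mask).mean()  # user constraint
        smooth = reg * ((z - z0) ** 2).mean()             # manifold smoothness regularizer
        (constraint + smooth).backward()
        opt.step()
    return z.detach()
```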
face modeling
- appearance vector (pixels of the original image) + shape vector (positions of landmarks)
- morphable face model: shape + appearance (appearance measured in the mean-warped image)
- s.t. linear operations on the vectors make sense
- eigenfaces: normalize contrast, scale, and orientation, remove the BG, then run PCA (see the sketch after this list)
- image registration: alignment from 3D
- face detection, fiducial point detection, pose estimation from a template 3D model, 3D alignment
- image2styleGAN++
- jointly fine-tune the latent w and the noise n, s.t. only high-frequency information ends up in n: update w first, then n
- manipulations on activations
- lots of applications
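
A minimal sketch of eigenfaces with plain NumPy PCA; the "dataset" here is random stand-in data for already aligned, background-removed faces:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((200, 64 * 64))        # 200 aligned faces, flattened

mean_face = faces.mean(axis=0)
X = faces - mean_face                     # center the data
# SVD of the centered data matrix gives the principal components (eigenfaces).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigenfaces = Vt[:50]                      # keep the top-50 components

# Any face ≈ mean + linear combination of eigenfaces, which is what makes
# linear operations on the coefficient vectors meaningful.
coeffs = (faces[0] - mean_face) @ eigenfaces.T
reconstruction = mean_face + coeffs @ eigenfaces
```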
novel view synthesis: light field, volume rendering; refer to all the other paper notes
- video synthesis
- video as 3D data: gradient guidance, FG + mask + BG (see the sketch after this list)
- sequential synthesis
- read vid2vid, adaptive vid2vid
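
A minimal sketch of the FG + mask + BG decomposition used in early video GANs: a moving foreground and per-pixel mask composited over a static background; the shapes here are illustrative:

```python
import torch

def composite(fg, mask, bg):
    """fg: (B, C, T, H, W) moving foreground; mask: (B, 1, T, H, W) in (0, 1);
    bg: (B, C, H, W) static background, broadcast over time."""
    bg = bg.unsqueeze(2)                            # (B, C, 1, H, W)
    return mask * fg + (1 - mask) * bg

B, C, T, H, W = 2, 3, 16, 64, 64
fg = torch.rand(B, C, T, H, W)
mask = torch.sigmoid(torch.randn(B, 1, T, H, W))    # soft mask in (0, 1)
bg = torch.rand(B, C, H, W)
video = composite(fg, mask, bg)                     # (B, C, T, H, W)
```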