Learning-Based Image Synthesis


some notes from the publicly available slides

image as a function

  • mind the RGB channel order
  • exposure influences which part of the HDR gets into the picture
  • basic point processing: enhancement, contrast stretching, histogram equalization
    • the notes below are from the CV book
    • aI + b: gain a and bias b, i.e. contrast and brightness
    • gamma correction
    • histogram equalization: map intensities through the (normalized) CDF so the output distribution is roughly uniform
  • Gaussian blur before downsampling gives better anti-aliasing
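
The point-processing bullets above can be sketched in NumPy (a minimal sketch; the function names are mine):

```python
import numpy as np

def adjust(img, gain=1.2, bias=10, gamma=1.0):
    """Basic point processing: out = gain * img + bias, then gamma correction."""
    out = gain * img.astype(np.float64) + bias     # gain a (contrast), bias b (brightness)
    out = np.clip(out, 0, 255) / 255.0
    out = out ** (1.0 / gamma)                     # gamma correction on [0, 1]
    return np.rint(out * 255).astype(np.uint8)

def equalize(gray):
    """Histogram equalization: map intensities through the normalized CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist) / gray.size              # CDF in [0, 1]
    return (cdf[gray] * 255).astype(np.uint8)
```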

image warping

  • global warping: the same transform for every point p, parameterised by a few numbers
    • affine transformations: linear part plus translation (covers rotation R, translation t, scale, shear)
    • projective transformations: full 3×3 matrix
      • origin does not necessarily map to the origin
      • lines map to lines
      • parallel lines are not necessarily parallel
    • forward warping: splatting colors
    • inverse warping: interpolating colors
    • morphing: averaging objects (the image of an average object)
      • find the average shape: local warping
      • find the average color: cross-dissolve the images
      • matting: extract the foreground to avoid artifacts in the BG
      • moving least squares: deform each point v in the image by a weighted least-squares fit to the control points
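
Inverse warping with bilinear interpolation can be sketched as follows (a minimal NumPy version for a grayscale image and a 3×3 matrix; variable names are mine):

```python
import numpy as np

def inverse_warp(src, M):
    """Inverse warping: for each output pixel p, sample src at M^{-1} p with
    bilinear interpolation (vs. forward warping, which splats colors)."""
    H, W = src.shape
    Minv = np.linalg.inv(M)
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous coords
    sx, sy, sw = Minv @ pts
    sx, sy = sx / sw, sy / sw                                 # projective divide
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    ax, ay = sx - x0, sy - y0                                 # bilinear weights
    out = np.zeros(H * W)
    valid = (x0 >= 0) & (x0 < W - 1) & (y0 >= 0) & (y0 < H - 1)
    x0v, y0v, axv, ayv = x0[valid], y0[valid], ax[valid], ay[valid]
    out[valid] = ((1 - axv) * (1 - ayv) * src[y0v, x0v]
                  + axv * (1 - ayv) * src[y0v, x0v + 1]
                  + (1 - axv) * ayv * src[y0v + 1, x0v]
                  + axv * ayv * src[y0v + 1, x0v + 1])
    return out.reshape(H, W)
```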

data-driven graphics: basically image alignment plus database querying

CNN for image synthesis

  • image -> low level feature -> high level feature -> classifier
  • generating images: hard; an L2 loss does not work (it averages over possible solutions)
    • what works for colorization: add a color-distribution loss (cross-entropy)
    • feature/perceptual loss
      • same as his talk at SIGGRAPH
    • GAN losses: JSD, LSGAN, WGAN
    • spectral normalization: s.t. the largest singular value of each layer's weight matrix is one
    • what drives the progression of GAN? better architecture, better training scheme
  • conditional image synthesis
    • curse of dimensionality: progressive synthesis
    • SPADE
    • exemplar-based synthesis
      • retrieve segments from other images, align them to the input layout, composite them on a canvas (in case of overlap)
  • content style
    • loss: adversarial loss for the style change, cycle-consistency loss to preserve content
    • latent space
      • UNIT: assume the two images can be mapped to the same point in some shared latent space
      • multimodal UNIT (MUNIT): a shared content space plus two style spaces
  • texture synthesis
    • Efros & Leung: sample each new pixel from all similar patches
      • choice of neighborhood window
      • slow
      • sample by block instead, with some overlap between neighboring blocks, joined by a minimum-error cut
    • style loss from Gram matrix
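
The Gram-matrix style loss can be sketched as follows (a minimal NumPy version; a real implementation computes this on deep features from a pretrained network):

```python
import numpy as np

def gram(feat):
    """Gram matrix of a CxHxW feature map: channel-wise correlations.
    It discards spatial layout, which is why it captures style/texture."""
    C = feat.shape[0]
    F = feat.reshape(C, -1)
    return F @ F.T / F.shape[1]

def style_loss(feat_a, feat_b):
    """Squared Frobenius distance between the two Gram matrices."""
    return np.sum((gram(feat_a) - gram(feat_b)) ** 2)
```

Note that shuffling the spatial positions of a feature map leaves its Gram matrix unchanged, which is exactly the "texture, not layout" property.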

image editing with optimization

  • stay close to the input, satisfy the user constraints, lie on the natural image manifold
    • map a latent code (from the natural image manifold) to an image
      • jointly finetuning the latent code and the generator params works better
    • refer to iGAN
      • learn the manifold using GAN
      • given a real photo, project to manifold
      • each edit is a constrained optimization: find a new latent code satisfying the constraints, w/ a regularizer: manifold smoothness
        • color constraints, sketching -> HOG, warping -> warping of constraints
      • motion and color flow objective: min change in color given some spatial transformation and color transformation
  • analyze each neuron: class, position
    • and then more realistic editing
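
The projection-onto-manifold step can be illustrated with a toy linear "generator" (this is not iGAN's actual model, just the same gradient-descent recipe on a model simple enough to verify):

```python
import numpy as np

def project(x, W, lam=0.01, lr=0.05, steps=500):
    """Toy projection onto a generator's range: find latent z minimizing
    ||G(z) - x||^2 + lam * ||z||^2, with a linear stand-in G(z) = W z.
    (A real GAN generator is nonlinear; the optimization recipe is the same.)"""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2 * W.T @ (W @ z - x) + 2 * lam * z   # gradient of the objective
        z -= lr * grad
    return z
```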

face modeling

  • appearance vector (as in the original image) + shape vector (positions of the landmarks)
  • morphable face model: shape + appearance (as in the mean-warped image)
    • s.t. linear operations on the vectors make sense
  • eigenfaces: normalize contrast, scale, and orientation, remove the BG, then PCA
  • image registration: alignment from 3D
    • face detection, fiducial point detection, pose estimation from a template 3D model, 3D alignment
  • Image2StyleGAN++
    • jointly finetune the latent w and the noise n, s.t. only high-frequency information ends up in n: update w first, then n
    • manipulations on activation
    • lots of applications
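
The eigenfaces bullet above can be sketched with plain PCA via SVD (a minimal sketch; assumes the faces are already aligned, normalized, and vectorized):

```python
import numpy as np

def eigenfaces(faces, k):
    """Eigenfaces: PCA on aligned, flattened face images.
    faces: (N, D) array, one face per row."""
    mean = faces.mean(axis=0)
    X = faces - mean                          # center the data
    # SVD of the centered data: rows of Vt are the principal components
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:k]                       # mean face + top-k eigenfaces

def reconstruct(face, mean, basis):
    """Project a face onto the eigenface basis and reconstruct it."""
    coeffs = basis @ (face - mean)
    return mean + basis.T @ coeffs
```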

novel view synthesis: light fields, volume rendering; refer to all the other paper notes
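
The volume-rendering quadrature mentioned here can be sketched for a single ray (NeRF-style alpha compositing; a minimal NumPy version):

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Volume rendering along one ray:
    alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i} (1 - alpha_j),
    C = sum_i T_i * alpha_i * c_i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights
```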

  • video synthesis
    • video as 3D data: gradient guidance, FG+mask+BG
    • sequential synthesis
      • read vid2vid, adaptive vid2vid