Human Object Interaction

A review of some papers on human-object interaction.

toolboxes

  • object bounding box detection
  • instance segmentation
  • human from single image
    • SMPL, SMPL-H, SMPL-X TODO
    • SMPL from image
      • Keep it SMPL, TODO
      • EFT:
      • frankmocap: what is the output in?
  • object from single image
    • neural mesh renderer
    • but all seem to need an object template
      • a SMPL-like object model to deform? risk of overfitting
    • scale constraints from total 3D or web search
  • depth order loss
    • for single images, or combined over all images (requires video input)
    • assume the SMPL output is the more accurate one
    • assume the foreground is in front of the object, and the background is behind it
  • collision
    • phosa: GPU implementation of some work
    • mover: SDF grid
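
A minimal sketch of the depth order loss idea listed above, assuming per-pixel rendered depths are already available for each instance. The function name, the `visible_ids` / `depth_maps` inputs, and the softplus penalty are illustrative assumptions, not the exact formulation of any of the papers in these notes.

```python
import math

def ordinal_depth_loss(visible_ids, depth_maps):
    """Penalize pixels where the rendered depth order contradicts the mask.

    visible_ids: per-pixel id of the instance the segmentation says is
                 visible, or None for background (hypothetical input).
    depth_maps:  dict id -> per-pixel rendered depth, math.inf where the
                 instance is not rendered (hypothetical input).
    """
    loss = 0.0
    for p, seen in enumerate(visible_ids):
        if seen is None:
            continue  # background pixel: nothing to compare
        d_seen = depth_maps[seen][p]
        # instance that the renderer places closest to the camera here
        front = min(depth_maps, key=lambda k: depth_maps[k][p])
        d_front = depth_maps[front][p]
        if front == seen or math.isinf(d_seen):
            continue  # order already agrees, or no depth to compare
        # softplus of how far the visible instance sits behind the front one
        loss += math.log1p(math.exp(d_seen - d_front))
    return loss
```

Only pixels where the mask and the rendered order disagree contribute, so a reconstruction that already respects the observed occlusions gets zero loss.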

definition of interaction

  • phosa: overlapping of bbox, predefined
  • mover: human contact vertices from POSA
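
A bounding-box overlap test like the one noted for phosa could look roughly like this. It is a sketch for axis-aligned boxes of any dimension; the notes do not say whether the boxes are 2D or 3D, and the `margin` parameter is a hypothetical stand-in for whatever the "predefined" part refers to.

```python
def boxes_overlap(box_a, box_b, margin=0.0):
    """Return True if two axis-aligned boxes overlap on every axis.

    Each box is (min_corner, max_corner), corners being tuples of equal
    length (2D or 3D). `margin` (hypothetical) loosens the test so that
    nearly touching boxes also count as interacting.
    """
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return all(
        lo_a - margin <= hi_b and lo_b - margin <= hi_a
        for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max)
    )
```

A human-object pair would then be treated as interacting when `boxes_overlap(human_box, object_box)` is true.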

phosa

  • actually quite manual-labeling heavy
    • prior on size of objects
    • template on objects -> SMPL-ish object
  • optimization per image? no
  • weak perspective camera
    • orthographic projection onto a plane, then projection to the camera
    • so to go back to 3D, assume a fixed focal length for all images
    • note this focal length is also applied to objects; as long as the ratio of scales is correct, everything is fine
    • weak perspective: (x, y, z) -> (x * sigma + t_x, y * sigma + t_y); perspective with focal = 1: (x, y, z) -> (x / z + t_x, y / z + t_y)
      • this is exactly what is done in the code
  • uses SMPL model
    • only 15 joints, right? then how are the hands modeled?
    • there are plenty of works on hand-object interaction, right?
    • plus an intrinsic scale for the human
      • this can be discarded, right? as we are using the human as a ruler
  • only optimizes w.r.t. the intrinsic scales, global rotation, and translation of objects
  • interaction loss: distance between centroids
  • ordinal depth loss
  • collision loss: a lot of references, sad
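
To make the weak perspective camera notes above concrete, here is a small sketch of one common way to go from a weak-perspective fit (sigma, t_x, t_y) back to 3D under an assumed fixed focal length: place the body at depth focal / sigma. The function names are hypothetical, and this is not claimed to match the phosa code line by line.

```python
def weak_perspective(point, sigma, t):
    """Weak perspective: drop z, scale by sigma, shift by the 2D offset t."""
    x, y, _ = point
    return (sigma * x + t[0], sigma * y + t[1])

def perspective(point, focal=1.0):
    """Pinhole projection with the principal point at the origin."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def lift_to_camera(point, sigma, t, focal=1.0):
    """Move a point into camera space so that `perspective` approximates
    `weak_perspective`. The depth focal / sigma follows from the assumed
    focal length; the 2D offset becomes a 3D shift of t / sigma."""
    x, y, z = point
    return (x + t[0] / sigma, y + t[1] / sigma, z + focal / sigma)
```

For points on the z = 0 plane the two projections agree exactly; for other points the error grows with z relative to focal / sigma, which is why the approximation works when the subject is shallow compared to its distance from the camera.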

mover

  • contact vertices predicted by POSA
  • contact loss: Chamfer distance (CD), one- or two-directional; segmentation of the object
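
For reference, a plain-Python sketch of a one- or two-directional Chamfer distance between contact vertices and object points. It averages unsquared Euclidean distances, which is an assumption (squared distances are also common), and a real pipeline would use a vectorized GPU version.

```python
import math

def one_way_chamfer(src, dst):
    """Mean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def chamfer(src, dst, bidirectional=True):
    """One-directional (src -> dst) or two-directional Chamfer distance."""
    total = one_way_chamfer(src, dst)
    if bidirectional:
        total += one_way_chamfer(dst, src)
    return total
```

The one-directional form only pulls the contact vertices toward the object, while the two-directional form also penalizes object points that are far from every contact vertex.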

chore and behave

  • why does chore not compare to the behave baseline?
    • ok, the input is an image, not a point cloud (PC)
    • and from an image, there is a lift-to-3D issue and a camera issue
  • 3D representation
  • fit to SDF: looks wrong, but SMPL and the object template offer a strong prior

we still have a meeting in July, right?