Daniel Bronstein

Assignment #2

3D Learning

1.1

Target

Result

1.2

Target

Result

1.3

Target

Result

2.1

2.2

2.3

2.4

Voxel F1 at 0.05 threshold: 34.3%

Point F1 at 0.05 threshold: 88.2%

Mesh F1 at 0.05 threshold: 81.5%

The point cloud reconstruction has the highest F1 score, slightly above the mesh reconstruction, and both score significantly higher than the voxel reconstruction. This hierarchy makes sense for models that output qualitatively similar results (granted, in different representations). The chamfer loss used to guide point cloud reconstruction almost directly optimizes the F1 score through its objective, since both are built on nearest-neighbor distances between predicted and ground-truth points. Thus, it is reasonable that the point cloud F1 score is the highest.
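To make the link between the chamfer loss and the F1 score concrete, here is a minimal NumPy sketch. It is not the code used in the assignment: the function names, the brute-force nearest-neighbor search, and the squared-distance form of the chamfer term are assumptions chosen for illustration.

```python
import numpy as np

def nearest_neighbor_dist(a, b):
    """For each point in a (N, 3), the distance to its nearest point in b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]          # (N, M, 3) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_distance(pred, gt):
    """Symmetric chamfer distance: mean squared nearest-neighbor distance, both ways."""
    d_pred = nearest_neighbor_dist(pred, gt)
    d_gt = nearest_neighbor_dist(gt, pred)
    return (d_pred ** 2).mean() + (d_gt ** 2).mean()

def f1_score(pred, gt, threshold=0.05):
    """F1 from the same nearest-neighbor distances, thresholded instead of averaged."""
    precision = (nearest_neighbor_dist(pred, gt) < threshold).mean()
    recall = (nearest_neighbor_dist(gt, pred) < threshold).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Synthetic stand-in data: a random "ground truth" cloud and two noisy predictions.
rng = np.random.default_rng(0)
gt = rng.uniform(-1.0, 1.0, size=(500, 3))
close = gt + rng.normal(scale=0.01, size=gt.shape)
far = gt + rng.normal(scale=0.2, size=gt.shape)

print(chamfer_distance(close, gt), f1_score(close, gt))
print(chamfer_distance(far, gt), f1_score(far, gt))
```

Both quantities are computed from the same two sets of nearest-neighbor distances, so driving the chamfer distance down pushes more points under the 0.05 threshold and raises F1.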

The mesh model similarly uses the chamfer loss, but is further guided by a smoothing loss on the surface representation. This additional constraint explains the slightly lower F1 score relative to point cloud reconstruction. Additionally, the meshes are produced by deforming a sphere. Thus the mesh cannot accurately reconstruct shapes other than genus 0, which contributes to error in the reconstruction.
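The genus limitation can be illustrated with the Euler characteristic V - E + F = 2 - 2g, which holds for a closed, connected, orientable triangle mesh. The sketch below uses hypothetical helpers (`genus`, `torus_faces`) and toy meshes rather than the assignment's data. Since `genus` only reads face connectivity, moving vertices, which is all a sphere deformation does, cannot change its result.

```python
def genus(faces, num_vertices):
    """Genus of a closed, connected, orientable triangle mesh via V - E + F = 2 - 2g."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))     # undirected edge, counted once
    euler = num_vertices - len(edges) + len(faces)
    return (2 - euler) // 2

# Octahedron: a coarse sphere-like mesh with 6 vertices and 8 faces.
octahedron_faces = [
    (0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
    (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5),
]

def torus_faces(n, m):
    """Connectivity of an n x m grid of quads with wraparound, split into triangles."""
    faces = []
    for i in range(n):
        for j in range(m):
            a = i * m + j
            b = ((i + 1) % n) * m + j
            c = ((i + 1) % n) * m + (j + 1) % m
            d = i * m + (j + 1) % m
            faces += [(a, b, c), (a, c, d)]
    return faces

print(genus(octahedron_faces, 6))        # 0: sphere topology
print(genus(torus_faces(4, 4), 16))      # 1: one hole, unreachable from a sphere
```

A chair with an open back or gaps between slats has genus above 0, so a deformed sphere can only approximate it by covering or pinching the holes.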

Finally, the voxel reconstruction has a significantly lower F1 score. This can be ascribed to the coarseness of the voxel representation relative to the reconstruction domain. Specifically, voxels are predicted on a 32x32x32 grid over a scene in the range [-1,1]^3. This means each voxel edge is 0.0625 long, and the point clouds sampled from the resulting voxel mesh are approximately as coarse (minus smoothing from mcube). Since the F1 score is calculated at a threshold distance of 0.05, it is reasonable that the voxel reconstruction performs poorly under this metric, since it can only express shape at a coarser resolution.
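The resolution argument comes down to a short calculation, sketched below with the numbers quoted above. The variable names are illustrative, and the half-diagonal is an extra worst-case figure added here for comparison, not something measured in the experiments.

```python
import math

grid_resolution = 32                 # voxels per axis
domain_min, domain_max = -1.0, 1.0   # scene extent along each axis
threshold = 0.05                     # F1 distance threshold

voxel_edge = (domain_max - domain_min) / grid_resolution
# Farthest a point inside a voxel can be from that voxel's center.
half_diagonal = voxel_edge * math.sqrt(3) / 2

print(voxel_edge)                    # 0.0625
print(voxel_edge > threshold)        # True
print(round(half_diagonal, 4))       # about 0.0541, also above the threshold
```

Both the edge length and the half-diagonal exceed 0.05, so a surface that is correct at voxel level can still miss the ground truth by more than the threshold.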

2.5

While training the mesh model, I tried several different magnitudes of w_smooth. For very low values relative to w_chamfer, the mesh looked severely distorted, self-intersecting and pointy. This is expected, since the chamfer loss only drives randomly sampled points on the surface of the predicted mesh to be close to the surface of the ground truth mesh, without any concern for the consistency of the underlying predicted mesh. Increasing w_smooth too much resulted in a practically uniform output resembling a bloated chair that was clearly oversmoothed and retained no detail. Again, this is consistent with the tradeoff expected between these two weights. The final value of w_smooth was chosen through trial and error, looking for a combination that retained the features of the input chair while producing an approximately smooth mesh.
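The tradeoff can be sketched as a weighted sum of the two terms. The code below is an illustration under assumptions, not the training code: it uses a simple uniform Laplacian as the smoothness term, the default weights are placeholders rather than the values I settled on, and it assumes every vertex belongs to at least one face.

```python
import numpy as np

def uniform_laplacian_loss(vertices, faces):
    """Mean distance between each vertex and the centroid of its mesh neighbors."""
    neighbors = [set() for _ in range(len(vertices))]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    offsets = np.stack([
        vertices[sorted(nb)].mean(axis=0) - vertices[i]
        for i, nb in enumerate(neighbors)
    ])
    return np.linalg.norm(offsets, axis=1).mean()

def total_loss(loss_chamfer, loss_smooth, w_chamfer=1.0, w_smooth=0.1):
    """Weighted sum of the two terms; the default weights are placeholders."""
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth

# Toy mesh: an octahedron, then the same mesh with one vertex pulled into a spike.
faces = [
    (0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
    (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5),
]
regular = np.array([
    [1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1],
], dtype=float)
spiky = regular.copy()
spiky[0] = [3.0, 0.0, 0.0]

print(uniform_laplacian_loss(regular, faces))
print(uniform_laplacian_loss(spiky, faces))   # larger: the spike is penalized
```

With w_smooth near zero the spike costs almost nothing, which matches the pointy meshes I saw. With w_smooth large, the optimizer is rewarded for pulling every vertex toward its neighbors, which matches the bloated, detail-free output.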

2.6

I visualized the voxel reconstruction of a single sample over the course of training, from random initialization up to 600 epochs at batch size 32.

Ground Truth vs Reconstruction over 19200 training samples