16889 Assignment 2: Single View to 3D

Presented by: Dijing Zhang

1. Exploring loss functions

1.1 Fitting a voxel grid

source voxel

q1.1_src.gif

target voxel

q1.1_tgt.gif
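The report does not state which loss was used to fit the voxel grid. A minimal sketch, assuming a mean binary cross-entropy between predicted occupancy probabilities and binary target occupancies (a common choice for this task), could look like the following. The function name `voxel_bce_loss` and the flattened-list input are hypothetical, chosen only for illustration.

```python
import math


def voxel_bce_loss(pred_probs, target_occupancy, eps=1e-7):
    """Mean binary cross-entropy over flattened voxel cells.

    pred_probs: predicted occupancy probabilities in [0, 1].
    target_occupancy: ground-truth occupancies (0 or 1), same length.
    This is an assumed loss, not necessarily the one used in the report.
    """
    total = 0.0
    for p, t in zip(pred_probs, target_occupancy):
        # Clamp to avoid log(0) for fully confident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred_probs)


if __name__ == "__main__":
    # A prediction close to the target gives a lower loss than a wrong one.
    print(voxel_bce_loss([0.9, 0.1], [1, 0]))
    print(voxel_bce_loss([0.1, 0.9], [1, 0]))
```

In practice the same quantity would be computed on tensors so that gradients can flow back to the source grid; the loop above only spells out the arithmetic.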

1.2 Fitting a point cloud

source point cloud

q1.2_src.gif

target point cloud

q1.2_tgt.gif
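The loss used to fit the point cloud is not given in the report. As a sketch under the assumption that a symmetric Chamfer distance is used (a common choice for comparing point clouds), the computation can be written as below. The helper names are hypothetical, and the brute-force nearest-neighbour search is only suitable for small clouds.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(src, tgt):
    """Symmetric Chamfer distance between two point clouds.

    For each point, take the squared distance to its nearest neighbour
    in the other cloud, average per direction, and sum both directions.
    """
    src_to_tgt = sum(
        min(squared_distance(p, q) for q in tgt) for p in src
    ) / len(src)
    tgt_to_src = sum(
        min(squared_distance(q, p) for p in src) for q in tgt
    ) / len(tgt)
    return src_to_tgt + tgt_to_src


if __name__ == "__main__":
    source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    target = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(source, target))
```

Because the distance is zero only when every point has an exact match in the other cloud, minimizing it pulls the source points toward the target shape from both directions.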

1.3 Fitting a mesh

Source Mesh

q1.3_src.gif

Target Mesh

q1.3_tgt.gif

2. Reconstructing 3D from single view

2.1 Image to voxel grid

Example1 - RGB

340_vox.png

Example1 - GT

340_vox_gt.gif

Example1 - Pred

340_vox_src.gif

Example2 - RGB

280_vox.png

Example2 - GT

280_vox_gt.gif

Example2 - Pred

280_vox_src.gif

Example3 - RGB

140_vox.png

Example3 - GT

140_vox_gt.gif

Example3 - Pred

140_vox_src.gif

2.2 Image to point cloud

Example1 - RGB

0_point.png

Example1 - GT

0_point_gt.gif

Example1 - Pred

0_point_src.gif

Example2 - RGB

100_point.png

Example2 - GT

100_point_gt.gif

Example2 - Pred

100_point_src.gif

Example3 - RGB

600_point.png

Example3 - GT

600_point_gt.gif

Example3 - Pred

600_point_src.gif

2.3 Image to mesh

Example1 - RGB

240_mesh.png

Example1 - GT

240_mesh_gt.gif

Example1 - Pred

240_mesh_src.gif

Example2 - RGB

300_mesh.png

Example2 - GT

300_mesh_gt.gif

Example2 - Pred

300_mesh_src.gif

Example3 - RGB

460_mesh.png

Example3 - GT

460_mesh_gt.gif

Example3 - Pred

460_mesh_src.gif

2.4 Quantitative comparisons

F1 @ 0.05

Voxel: 84.2379

Mesh: 71.2455

Point: 57.9918

As the numbers show, the voxel model achieves the highest F1 @ 0.05. This is because I designed a much more complex model for voxel prediction, while the point cloud and mesh models are much simpler, composed almost entirely of linear layers.

In addition, the score for the point cloud model is lower than the others. This is because a point cloud is a discrete set of unconnected points, while voxels and meshes have intrinsic connectivity.
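For reference, the F1 @ 0.05 metric can be sketched as follows, assuming the usual point-based definition: precision is the percentage of predicted points within the threshold of some ground-truth point, recall is the percentage of ground-truth points within the threshold of some predicted point, and F1 is their harmonic mean. The function names are hypothetical, and the assumption that voxel and mesh predictions are first sampled to points is not stated in the report.

```python
import math


def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(math.dist(point, other) for other in cloud)


def f1_score(pred_points, gt_points, threshold=0.05):
    """F1 score (0 to 100) between two point sets at a distance threshold."""
    # Precision: share of predicted points that lie close to the ground truth.
    precision = 100.0 * sum(
        nearest_distance(p, gt_points) < threshold for p in pred_points
    ) / len(pred_points)
    # Recall: share of ground-truth points covered by the prediction.
    recall = 100.0 * sum(
        nearest_distance(g, pred_points) < threshold for g in gt_points
    ) / len(gt_points)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.01), (5.0, 5.0, 5.0)]
    print(f1_score(pred, gt))
```

Under this definition a prediction is penalized both for stray points far from the object (low precision) and for missing parts of the object (low recall).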

2.5 Analyse effects of hyperparameter variations

  1. w_smooth: This hyperparameter sets the weight of the mesh smoothness term. Increasing w_smooth yields a smoother mesh, which helps improve the metric. If the weight is set too large, however, the predictions collapse to an abstract shape, so that every chair looks the same.

  2. n_point: This hyperparameter sets the number of points sampled for the point cloud. Typically, at least 1024 points are needed to show the shape. Increasing n_point gives a denser representation of the shape, but the prediction becomes less accurate because the model has to predict more points, so the F1 score decreases as n_point increases.

  3. batch_size: The batch size plays a large role in the convergence of the model. The default value is too small; a batch size of at least 32 helps the model converge without overfitting.

  4. arch: The default architecture is resnet18. It can be changed to resnet50, or an even larger network such as resnet101, to strengthen feature extraction, which greatly improves performance.
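To make the role of w_smooth concrete, the sketch below combines a data term with a weighted smoothness term. It assumes a uniform Laplacian smoothness measure (the mean distance of each vertex from the centroid of its neighbours); the report does not specify the exact smoothness loss, and the names `laplacian_smoothness`, `mesh_loss`, and `data_term` are hypothetical.

```python
import math


def laplacian_smoothness(vertices, edges):
    """Mean distance between each vertex and the centroid of its neighbours."""
    neighbors = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    total = 0.0
    for i, v in enumerate(vertices):
        if not neighbors[i]:
            continue  # isolated vertices contribute nothing
        centroid = [
            sum(vertices[j][k] for j in neighbors[i]) / len(neighbors[i])
            for k in range(3)
        ]
        total += math.dist(v, centroid)
    return total / len(vertices)


def mesh_loss(data_term, vertices, edges, w_smooth):
    """Total loss: data term (e.g. a point-based distance) plus weighted smoothness."""
    return data_term + w_smooth * laplacian_smoothness(vertices, edges)


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    edges = [(0, 1), (1, 2)]
    for w in (0.0, 0.1, 3.0):
        print(w, mesh_loss(0.5, verts, edges, w))
```

With a large w_smooth the smoothness term dominates the total, so the optimizer is rewarded more for a smooth surface than for matching the target, which is consistent with the over-smoothed, generic shapes described above.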

2.6 Interpret your model

1. Voxels colored by predicted probability

Cells with a higher predicted probability appear in a darker color. As the renderings show, the main parts of the chair tend to be darker, while the minor parts tend to be lighter.

300_vox_src.gif

0_vox_src.gif
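The coloring rule described above (higher probability, darker cell) can be sketched as a simple mapping from probability to a gray level. The function names, the grayscale palette, and the optional `min_prob` cutoff are assumptions for illustration; the report does not describe the exact color map used in the renderings.

```python
def probability_to_gray(prob):
    """Map an occupancy probability in [0, 1] to an RGB gray value.

    Higher probability gives a darker color: 1.0 is black, 0.0 is white.
    """
    prob = min(max(prob, 0.0), 1.0)  # clamp out-of-range inputs
    level = 1.0 - prob
    return (level, level, level)


def color_voxels(probs, min_prob=0.0):
    """Assign a color to every voxel whose probability is at least `min_prob`.

    probs: dict mapping (i, j, k) grid indices to predicted probabilities.
    """
    return {
        idx: probability_to_gray(p)
        for idx, p in probs.items()
        if p >= min_prob
    }


if __name__ == "__main__":
    predicted = {(0, 0, 0): 0.9, (1, 0, 0): 0.25, (2, 0, 0): 0.05}
    print(color_voxels(predicted, min_prob=0.1))
```

The resulting per-voxel colors could then be passed to whatever renderer draws the grid, so that confident cells stand out from uncertain ones.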

2. Transformation of meshes

Insight into how the model gradually predicts a chair!

1.gif