16889 Assignment 2: Single View to 3D

Presented by: Dijing Zhang

1. Exploring loss functions

1.1 Fitting a voxel grid

source voxel

q1.1_src.gif

target voxel

q1.1_tgt.gif

1.2 Fitting a point cloud

source point cloud

q1.2_src.gif

target point cloud

q1.2_tgt.gif

1.3 Fitting a mesh

Source Mesh

q1.3_src.gif

Target Mesh

q1.3_tgt.gif

2. Reconstructing 3D from single view

2.1 Image to voxel grid

Example1 - RGB

Example1 - GT

Example1 - Pred

Example2 - RGB

Example2 - GT

Example2 - Pred

Example3 - RGB

Example3 - GT

Example3 - Pred

2.2 Image to point cloud

Example1 - RGB

Example1 - GT

Example1 - Pred

Example2 - RGB

Example2 - GT

Example2 - Pred

Example3 - RGB

Example3 - GT

Example3 - Pred

2.3 Image to mesh

Example1 - RGB

Example1 - GT

Example1 - Pred

Example2 - RGB

Example2 - GT

Example2 - Pred

Example3 - RGB

Example3 - GT

Example3 - Pred

2.4 Quantitative comparisons

F1 @ 0.05

Voxel: 84.2379

Mesh: 71.2455

Point: 57.9918

The voxel model achieves the highest F1 @ 0.05 because I designed a much more complex model for voxel prediction, while the point cloud and mesh models are much simpler and consist almost entirely of linear layers.

The point cloud score is also lower than the other two. This is because a point cloud is a discrete set of unconnected points, whereas voxels and meshes have intrinsic connectivity.
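For reference, F1 @ 0.05 is commonly computed between points sampled from the predicted and ground-truth shapes. The sketch below is a minimal brute-force version in plain Python, not the evaluation code that produced the numbers above. The function names `nearest_distances` and `f1_at_threshold` and the toy point sets are hypothetical, and it assumes precision and recall are the fractions of points whose nearest neighbor in the other set lies within the 0.05 threshold.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, distance to its nearest neighbor in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def f1_at_threshold(pred, gt, threshold=0.05):
    """F1 score (in percent) between two point sets at a distance threshold."""
    # Precision: share of predicted points within `threshold` of some GT point.
    precision = sum(d < threshold for d in nearest_distances(pred, gt)) / len(pred)
    # Recall: share of GT points within `threshold` of some predicted point.
    recall = sum(d < threshold for d in nearest_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one predicted point is 0.3 away from its target.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
pred = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0), (0.0, 1.0, 0.3), (1.0, 1.0, 0.0)]
print(f1_at_threshold(pred, gt))
```

In this toy case three of the four points match in each direction, so precision and recall are both 0.75 and the score is 75.0. A single stray point lowers both terms at once, which is one way a set of independently predicted points can lose score quickly.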

2.5 Analyze effects of hyperparameter variations

w_smooth: This hyperparameter sets the weight of the mesh smoothness term. Increasing w_smooth produces a smoother mesh, which helps improve the metric. If the weight is set too large, however, the prediction degenerates into an abstract shape, and every chair ends up looking the same.
n_point: This hyperparameter sets the number of points sampled for the point cloud. Typically, at least 1024 points are needed to show the shape. Increasing n_point gives a denser representation of the shape, but each prediction becomes less accurate because the model has to predict more points, so the F1 score decreases as n_point increases.
batch_size: The batch size plays a large role in model convergence. The default value is too small; a batch size of at least 32 helps the model converge without overfitting.
arch: The default is resnet18. Switching to resnet50, or an even larger network such as resnet101, strengthens feature extraction and greatly improves performance.
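To make the role of w_smooth concrete, here is a minimal sketch of a weighted mesh loss in plain Python. It assumes the total loss is a chamfer term plus `w_smooth` times a uniform Laplacian smoothness term; the report does not state the exact formulation, so the functions `chamfer_distance`, `laplacian_smoothness`, and `mesh_loss` below are hypothetical stand-ins, and the mesh vertices are used directly in place of points sampled from the surface.

```python
import math


def chamfer_distance(a, b):
    """Symmetric chamfer distance: mean squared nearest-neighbor distance, both ways."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def laplacian_smoothness(vertices, edges):
    """Mean distance from each vertex to the centroid of its neighbors."""
    neighbors = {i: [] for i in range(len(vertices))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    total = 0.0
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated vertices contribute nothing
        centroid = [sum(vertices[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        total += math.dist(vertices[i], centroid)
    return total / len(vertices)


def mesh_loss(pred_vertices, edges, target_points, w_smooth):
    """Assumed form of the loss: chamfer + w_smooth * smoothness."""
    fit = chamfer_distance(pred_vertices, target_points)
    smooth = laplacian_smoothness(pred_vertices, edges)
    return fit + w_smooth * smooth


# Toy data (hypothetical): a square whose last vertex is lifted off the plane.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.5)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

for w in (0.0, 0.1, 1.0):
    print(w, mesh_loss(vertices, edges, target, w))
```

In this sketch the smoothness term on its own is minimized by a featureless, collapsed shape, so a very large `w_smooth` lets that term dominate the fit to the target. That is consistent with the observation above that overly strong smoothing makes every chair look alike.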

2.6 Interpret your model

1. Show voxels colored by probability

Cells with a higher predicted probability are drawn in a darker color. As we can see, the main parts of the chair tend to be darker, while the minor parts tend to be lighter.

0_vox_src.gif
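The coloring described above can be sketched as a simple mapping from occupancy probability to a gray level. This is an illustrative sketch only, not the rendering code behind the GIF: the function names `probability_to_gray` and `color_occupied_cells`, the `min_prob` cutoff, and the toy 2x2x2 grid are all assumptions.

```python
def probability_to_gray(prob):
    """Map a probability in [0, 1] to an RGB gray; higher probability is darker."""
    p = min(max(prob, 0.0), 1.0)  # clamp to the valid range
    level = round(255 * (1.0 - p))
    return (level, level, level)


def color_occupied_cells(prob_grid, min_prob=0.1):
    """Return {(x, y, z): rgb} for cells with probability >= min_prob.

    `min_prob` is a hypothetical cutoff that hides near-empty cells.
    """
    colors = {}
    for x, plane in enumerate(prob_grid):
        for y, row in enumerate(plane):
            for z, prob in enumerate(row):
                if prob >= min_prob:
                    colors[(x, y, z)] = probability_to_gray(prob)
    return colors


# Toy 2x2x2 grid of predicted occupancy probabilities (hypothetical values).
grid = [
    [[0.95, 0.60], [0.05, 0.30]],
    [[0.80, 0.00], [0.10, 1.00]],
]
for cell, rgb in sorted(color_occupied_cells(grid).items()):
    print(cell, rgb)
```

A colormap with more than one hue would work the same way; only `probability_to_gray` would change.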

2. Transformation of meshes!

Insight into how the model gradually predicts a chair!