1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

1.2. Fitting a point cloud (10 points)

1.3. Fitting a mesh (5 points)

2. Reconstructing 3D from single view

2.1 - 2.3

2.4 Quantitative comparisons

Model	F1@0.05
Mesh	84.38
Voxel	68.37
Point Cloud	92.71

Intuitive explanation:

The F1 score is based on the L1 and L2 distances between points sampled from the predicted and ground-truth (GT) 3D representations, with 0.05 as the distance threshold.

Clearly, a network that predicts point clouds scores well on this metric: it is trained to place points close to the GT point cloud, without having to respect any mesh constraints. This explains the very high F1 score for point clouds.

For meshes, the network predicts not arbitrary points but vertex offsets, and the vertices are tied together by faces, which impose their own constraints. Points sampled from a predicted mesh therefore need not lie as close to the GT as points from a network optimized directly to predict point clouds, and we see a drop in the F1 score.

Lastly, the network trained to predict voxels performed the worst because of the inherent difficulty of predicting the occupancy of 32,768 (32×32×32) voxels.
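To make the metric discussed above concrete, here is a minimal sketch of an F1-at-threshold computation between two point sets. It is not the evaluation code used for the numbers in the table. The function name `f1_at_threshold`, the use of Euclidean nearest-neighbour distance, and the strict `<` comparison are all assumptions made for illustration.

```python
import math

def f1_at_threshold(pred_points, gt_points, threshold=0.05):
    """F1 score (in percent) between two lists of 3D points.

    Assumption: a point counts as matched when its nearest neighbour
    in the other set is closer than `threshold` (Euclidean distance).
    """
    def percent_matched(src, dst):
        # Share of points in `src` that have a point of `dst` within the threshold.
        hits = sum(
            1 for p in src
            if min(math.dist(p, q) for q in dst) < threshold
        )
        return 100.0 * hits / len(src)

    precision = percent_matched(pred_points, gt_points)  # predicted -> GT
    recall = percent_matched(gt_points, pred_points)     # GT -> predicted
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one predicted point lies on the surface, the other is far away.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.01), (5.0, 5.0, 5.0)]
print(f1_at_threshold(pred, gt))
```

In the toy example precision and recall are both 50%, so the score is 50. This brute-force loop is quadratic in the number of points; a real evaluation would use a vectorized or nearest-neighbour implementation, but the score is defined in the same way.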

2.5. Analyse effects of hyperparameter variations

ICO	F1@0.05
2	82.15
3	84.38
4	85.45

Figure (left to right): GT, ICO=2, ICO=3, ICO=4

Analysis and Conclusion:

We trained models to predict meshes with the same architecture and schedule, changing only the ICO parameter to understand the effect of starting from a detailed versus a simpler base mesh. As we can see, ICO=2 and ICO=4 have different kinds of issues. ICO=2 is under-representative and cannot represent the finer curves. With ICO=4, the predicted shape has a lot of jagged edges and jitter, implying that the model is not able to adequately learn the intricacies of the shape. ICO=3 looks the most natural.

The crucial insight for me from this experiment was that the metrics are imperfect. Even though ICO=4 results in a higher F1 score than ICO=3 (85.45 vs 84.38), the results of the latter are clearly smoother.
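To give a sense of how much detail each base mesh carries, the sketch below counts the vertices and faces of a subdivided icosahedron. It assumes that the ICO parameter is the subdivision level of an icosphere (as in PyTorch3D's `ico_sphere(level)`); the helper name `ico_sphere_counts` is hypothetical.

```python
def ico_sphere_counts(level):
    """Vertex, edge and face counts of an icosahedron subdivided `level` times.

    Assumption: each subdivision step splits every triangle into four.
    """
    faces = 20 * 4 ** level         # an icosahedron starts with 20 faces
    edges = 30 * 4 ** level         # and 30 edges, which scale the same way
    vertices = 10 * 4 ** level + 2  # follows from Euler's formula V - E + F = 2
    return vertices, edges, faces

for level in (2, 3, 4):
    v, e, f = ico_sphere_counts(level)
    print(f"ICO={level}: {v} vertices, {f} faces")
```

Under this assumption the network predicts offsets for 162, 642 and 2562 vertices at ICO=2, 3 and 4 respectively. The base mesh at ICO=4 therefore has roughly four times as many vertices as at ICO=3, which is consistent with it being harder to keep smooth under the same architecture and schedule.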

2.6. Interpret your model

I wanted to understand the kinds of samples on which my models were doing drastically worse or better than each other, so I computed the disagreement abs(F1{m1} - F1{m2}) for all samples and visualized the ones with the least agreement. The results follow:

Point Cloud Model Vs Mesh Model (Least Agreement in F1)

PTS: 42.033 MESH: 83.741
PTS: 94.058 MESH: 52.04
PTS: 36.978 MESH: 81.336
PTS: 24.827 MESH: 72.591
PTS: 15.337 MESH: 63.26

Voxel Model Vs Mesh Model (Least Agreement in F1)

VOX: 43.571 MESH: 89.343
VOX: 52.559 MESH: 98.35
VOX: 34.649 MESH: 82.924
VOX: 41.473 MESH: 90.745
VOX: 32.518 MESH: 83.347

Point Cloud Model Vs Voxel Model (Least Agreement in F1)

PTS: 95.617 VOXEL: 41.473
PTS: 14.511 VOXEL: 69.682
PTS: 92.932 VOXEL: 34.649
PTS: 92.033 VOXEL: 32.644
PTS: 93.516 VOXEL: 32.518
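The selection of the samples listed above can be sketched as follows. This is a minimal illustration of the ranking step, not the script used for this report. The function name `most_disagreeing` and the toy scores are hypothetical; each list is assumed to hold per-sample F1 scores in the same sample order.

```python
def most_disagreeing(f1_a, f1_b, k=5):
    """Indices of the k samples where two models' F1 scores differ the most."""
    # Disagreement per sample: abs(F1{m1} - F1{m2}).
    gaps = [abs(a - b) for a, b in zip(f1_a, f1_b)]
    # Sort sample indices by decreasing disagreement and keep the top k.
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return order[:k]

# Toy per-sample scores (illustrative values, not taken from the experiments).
f1_points = [90.0, 40.0, 85.0, 15.0]
f1_mesh = [88.0, 80.0, 50.0, 63.0]
print(most_disagreeing(f1_points, f1_mesh, k=2))
```

For the toy scores the gaps are 2, 40, 35 and 48, so the function returns samples 3 and 1. The returned indices can then be used to look up the corresponding inputs and predictions for visualization.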

Insights:

The F1 metric is very imperfect. 3D objects predicted by the voxel model are penalized heavily even when their shape is similar to the ground truth. The metric is very lenient towards point clouds, which may not even demonstrate shape similarity but still receive very high scores.
Shapes with lots of holes and thin parts are really hard for all the models. The mesh model performs best in this case, possibly because we model the relationship between vertices and faces, which leads to better modelling of such thin parts. Due to the low voxel resolution, the voxel model is especially bad at modelling such shapes.
The point cloud model appears to have the best “numerical” performance; however, it is not able to capture the shape and resorts to a template chair shape when it can’t model the details (which still gives it reasonably good “numbers”).