Learning for 3D Vision: Assignment 2
Deepti Upmaka
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
Target Image ______________________________________ Source Image
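Fitting the voxel grid can be framed as per-cell binary classification, so a binary cross-entropy loss on the predicted occupancies is a natural choice. Below is a minimal, hypothetical sketch of such a fitting loop in PyTorch; the grid size and variable names are illustrative, not taken from the assignment starter code.

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits, target_occupancy):
    # BCE between predicted occupancy logits and the 0/1 target grid.
    return F.binary_cross_entropy_with_logits(pred_logits, target_occupancy)

# Toy fitting loop: optimize a free voxel grid toward a random target.
torch.manual_seed(0)
target = (torch.rand(1, 32, 32, 32) > 0.5).float()
pred = torch.zeros(1, 32, 32, 32, requires_grad=True)
opt = torch.optim.Adam([pred], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = voxel_loss(pred, target)
    loss.backward()
    opt.step()
```

Because each cell is optimized independently, the logits simply saturate toward the correct sign of each target occupancy.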
1.2. Fitting a point cloud (10 points)
Target Image ______________________________________ Source Image
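Fitting the point cloud uses the chamfer distance, which symmetrically matches each point to its nearest neighbour in the other set. The brute-force pure-PyTorch sketch below is illustrative only; an actual implementation would typically use an accelerated nearest-neighbour query such as pytorch3d's `knn_points`.

```python
import torch

def chamfer_loss(src, tgt):
    # src: (N, 3), tgt: (M, 3). Symmetric sum of mean squared
    # nearest-neighbour distances between the two point sets.
    d = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(dim=-1)  # (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Toy fitting loop: deform a random source cloud onto a random target.
torch.manual_seed(0)
tgt = torch.rand(500, 3)
src = torch.rand(500, 3).requires_grad_()
opt = torch.optim.Adam([src], lr=0.01)
initial = chamfer_loss(src, tgt).item()
for _ in range(300):
    opt.zero_grad()
    loss = chamfer_loss(src, tgt)
    loss.backward()
    opt.step()
```

Note that gradients flow only through each point's current nearest neighbour, which is exactly why the points are free to move anywhere without any connectivity constraint.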
1.3. Fitting a mesh (5 points)
Target Image ______________________________________ Source Image
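The mesh is fit by offsetting the vertices of an initial mesh with a chamfer loss on sampled surface points, regularized by a smoothness term. A hypothetical uniform-weight Laplacian smoothing loss, written without pytorch3d for illustration (pytorch3d's `mesh_laplacian_smoothing` is the likely real choice):

```python
import torch

def laplacian_smoothing_loss(verts, edges):
    # verts: (V, 3) vertex positions; edges: (E, 2) undirected edge indices.
    # Penalizes each vertex's offset from the centroid of its neighbours.
    V = verts.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    nbr_sum = torch.zeros_like(verts)
    nbr_sum.index_add_(0, i, verts[j])
    nbr_sum.index_add_(0, j, verts[i])
    deg = torch.zeros(V, 1)
    ones = torch.ones(edges.shape[0], 1)
    deg.index_add_(0, i, ones)
    deg.index_add_(0, j, ones)
    return (verts - nbr_sum / deg.clamp(min=1)).norm(dim=1).mean()
```

A vertex sitting at the centroid of its neighbours contributes zero, while a spiky vertex far from its neighbours is penalized, which is what pulls the surface toward smoothness.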
2. Reconstructing 3D from a single view
2.1. Image to voxel grid (15 points)
RGB Image _____________________ Ground Truth _____________________ Predicted Voxel
RGB Image _____________________ Ground Truth _____________________ Predicted Voxel
RGB Image _____________________ Ground Truth _____________________ Predicted Voxel
2.2. Image to point cloud (15 points)
RGB Image _____________________ Ground Truth _____________________ Predicted Point Cloud
RGB Image _____________________ Ground Truth _____________________ Predicted Point Cloud
RGB Image _____________________ Ground Truth _____________________ Predicted Point Cloud
2.3. Image to mesh (15 points)
RGB Image _____________________ Ground Truth _____________________ Predicted Mesh
RGB Image _____________________ Ground Truth _____________________ Predicted Mesh
RGB Image _____________________ Ground Truth _____________________ Predicted Mesh
2.4. Quantitative comparisons (10 points)
The average F1@0.05 for the voxel grid is 64.011.
The average F1@0.05 for the point cloud is 91.574.
The average F1@0.05 for the mesh is 83.373.
Mesh and point cloud perform similarly because both effectively learn a displacement from the initialization: both use the chamfer loss to deform the mesh vertices and the point cloud, respectively. The F1 score combines precision and recall and is used here to compare performance. It makes sense for the point cloud to have the highest F1 score, because the chamfer loss only measures the distance to the closest point, which is not necessarily the most "correct" correspondence, and the points can move freely. The mesh, while also learning vertex offsets, must deform while maintaining the connectivity of its faces, which constrains it. The voxel grid, on the other hand, predicts the occupancy of cubes. We also smooth the voxel output when visualizing but compute the error on the original extracted mesh, which may have protrusions that are not reflected in the ground truth. For these reasons the voxel grid has the lowest F1 score.
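For reference, F1@0.05 can be computed from point samples of the prediction and the ground truth. The sketch below is a simplified stand-in for the evaluation code; the percentage scaling and the epsilon guard are assumptions.

```python
import torch

def f1_at_threshold(pred_pts, gt_pts, threshold=0.05):
    # Pairwise distances between predicted and ground-truth point samples.
    d = torch.cdist(pred_pts, gt_pts)
    # Precision: % of predicted points within `threshold` of some GT point.
    precision = 100.0 * (d.min(dim=1).values < threshold).float().mean()
    # Recall: % of GT points within `threshold` of some predicted point.
    recall = 100.0 * (d.min(dim=0).values < threshold).float().mean()
    # Harmonic mean of precision and recall.
    return (2 * precision * recall / (precision + recall + 1e-8)).item()
```

Because it is a harmonic mean, a prediction must both cover the ground truth (recall) and avoid spurious geometry (precision) to score well, which is exactly why voxel protrusions hurt the score.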
2.5. Analyze effects of hyperparameter variations (10 points)
The hyperparameter I chose to vary was w_smooth. The mesh loss is a weighted combination of the chamfer loss and the Laplacian smoothing loss. When w_smooth is small, e.g. 0.1, the optimization preserves many of the chair's protrusions, since it gives more weight to the chamfer loss and to matching each vertex to its closest target point. The Laplacian smoothing loss pulls each vertex toward the average of its neighbors, penalizing vertices that are inconsistent with their neighborhood. Increasing w_smooth to a larger value such as 7 weights the smoothness of the mesh more heavily, meaning the mesh loses some of its fine details.
Below you can see examples when w_smooth = 0.1 and w_smooth = 7.
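The trade-off can be expressed as a weighted sum. This is a hypothetical form of the total mesh objective; the exact weighting scheme in the starter code may differ.

```python
import torch

def total_mesh_loss(loss_chamfer, loss_laplacian, w_chamfer=1.0, w_smooth=0.1):
    # Weighted combination: w_smooth trades geometric fidelity (chamfer)
    # against surface smoothness (Laplacian regularization).
    return w_chamfer * loss_chamfer + w_smooth * loss_laplacian
```

With w_smooth = 0.1 the chamfer term dominates the gradient, so protrusions survive; at w_smooth = 7 the smoothness term dominates and fine details are averaged away.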
2.6. Interpret your model (15 points)
In order to interpret what the learned model is doing, it can be helpful to visualize the progress of training. This means we track one image throughout the training process and see how its prediction evolves from the initialization to the final output. Using the model at a given iteration, I predicted the point cloud, mesh, and voxel grid for that image. Visualizing this prediction every 100 iterations or so is helpful in seeing how the model learns to deform the points. Below you can see examples of the point cloud, mesh, and voxel predictions at different stages during the first 3000 iterations.
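The tracking loop can be sketched as follows; a toy linear model stands in for the reconstruction network, and `vis_every`, `fixed_image`, and the step counts are illustrative rather than taken from the assignment code.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)       # stand-in for the reconstruction network
fixed_image = torch.rand(1, 4)      # the one input tracked across training
target = torch.rand(1, 3)
opt = torch.optim.Adam(model.parameters(), lr=0.05)

vis_every = 100
snapshots = {}                      # iteration -> prediction on the fixed input
for step in range(301):
    if step % vis_every == 0:
        with torch.no_grad():
            # In the assignment, this is where the predicted point cloud /
            # mesh / voxel grid would be rendered and saved to disk.
            snapshots[step] = model(fixed_image).clone()
    opt.zero_grad()
    loss = ((model(fixed_image) - target) ** 2).mean()
    loss.backward()
    opt.step()
```

Comparing the saved snapshots side by side shows the prediction moving steadily from the initialization toward the target, which is the behaviour the visualizations below illustrate for the real models.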
