16-889 Assignment 2: Single View to 3D

Name: Neha Boloor
Andrew ID: nboloor


Goals: In this assignment, you will explore loss and decoder functions for regressing to voxel, point cloud, and mesh representations from single-view RGB input.

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Run: python -W ignore main.py --question 1 --mode train --type 'vox'
[Renders: optimized voxel grid (left) and ground truth voxel grid (right), three examples]
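For reference, a minimal sketch of the voxel-fitting objective, assuming the standard formulation of binary cross-entropy between predicted occupancy logits and the binary ground-truth grid (the function names and fitting loop are illustrative, not the exact starter-code API):

```python
import torch
import torch.nn.functional as F

def voxel_loss(voxel_pred, voxel_gt):
    # voxel_pred: (B, D, H, W) raw occupancy logits
    # voxel_gt:   (B, D, H, W) binary occupancy in {0, 1}
    return F.binary_cross_entropy_with_logits(voxel_pred, voxel_gt.float())

def fit_voxels(voxels_gt, n_iters=5000, lr=1e-2):
    # Directly optimize a free voxel tensor toward the ground-truth grid.
    voxels_src = torch.randn_like(voxels_gt, requires_grad=True)
    optimizer = torch.optim.Adam([voxels_src], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        voxel_loss(voxels_src, voxels_gt).backward()
        optimizer.step()
    return torch.sigmoid(voxels_src)  # occupancy probabilities for rendering
```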

1.2. Fitting a point cloud (10 points)

Run: python -W ignore main.py --question 1 --mode train --type 'point'
[Renders: optimized point cloud (left) and ground truth point cloud (right), three examples]
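The point-cloud fit uses the Chamfer distance. A minimal sketch, assuming the usual bidirectional nearest-neighbour formulation built on pytorch3d's knn_points (not necessarily identical to the code in losses.py):

```python
import torch
from pytorch3d.ops import knn_points

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # point_cloud_src: (B, N, 3), point_cloud_tgt: (B, M, 3)
    # knn_points returns squared distances to the single nearest neighbour (K=1)
    dists_src = knn_points(point_cloud_src, point_cloud_tgt, K=1).dists  # (B, N, 1)
    dists_tgt = knn_points(point_cloud_tgt, point_cloud_src, K=1).dists  # (B, M, 1)
    return dists_src.mean() + dists_tgt.mean()
```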

1.3. Fitting a mesh (5 points)

Run: python -W ignore main.py --question 1 --mode train --type 'mesh'
[Renders: optimized mesh (left) and ground truth mesh (right), three examples]
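The mesh fit deforms the vertices of a source mesh so that points sampled from it match points sampled from the target, with Laplacian smoothing as a regulariser. A sketch under those assumptions (starting from an ico-sphere; the exact source mesh and weights in the starter code may differ):

```python
import torch
from pytorch3d.loss import mesh_laplacian_smoothing
from pytorch3d.ops import knn_points, sample_points_from_meshes
from pytorch3d.utils import ico_sphere

def mesh_fit_loss(deform_verts, mesh_gt, w_smooth=0.15, n_points=5000):
    # deform_verts: learnable per-vertex offsets for the level-4 ico-sphere (2562, 3)
    mesh_src = ico_sphere(4, deform_verts.device).offset_verts(deform_verts)
    pts_src = sample_points_from_meshes(mesh_src, n_points)  # (1, n_points, 3)
    pts_gt = sample_points_from_meshes(mesh_gt, n_points)
    # Bidirectional Chamfer distance on the sampled point sets (squared distances)
    chamfer = (knn_points(pts_src, pts_gt, K=1).dists.mean()
               + knn_points(pts_gt, pts_src, K=1).dists.mean())
    return chamfer + w_smooth * mesh_laplacian_smoothing(mesh_src, method="uniform")
```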

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

To train, run: python -W ignore main.py --question 2 --mode train --type 'vox' --save_freq 50 --max_iter 10000 --batch_size 8 --lr 4e-5 --w_smooth 0.15

To evaluate and visualise, run: python -W ignore main.py --question 2 --mode eval --type 'vox' --vis 100 --load_checkpoint
[Renders: input RGB image, predicted voxel grid, and ground truth mesh for three test examples]

Shown above are three examples from the test set; for each, I show the input RGB image, a render of the predicted 3D voxel grid, and a render of the ground truth mesh.

2.2. Image to point cloud (15 points)

To train, run: python -W ignore main.py --question 2 --mode train --type 'point' --save_freq 50 --max_iter 10000 --batch_size 8 --lr 4e-5 --w_smooth 0.15

To evaluate and visualise, run: python -W ignore main.py --question 2 --mode eval --type 'point' --vis 100 --load_checkpoint
[Renders: input RGB image, predicted point cloud, and ground truth mesh for three test examples]

2.3. Image to mesh (15 points)

To train, run: python -W ignore main.py --question 2 --mode train --type 'mesh' --save_freq 50 --max_iter 10000 --batch_size 8 --lr 4e-5 --w_smooth 0.15

To evaluate and visualise, run: python -W ignore main.py --question 2 --mode eval --type 'mesh' --vis 100 --load_checkpoint
[Renders: input RGB image, predicted mesh, and ground truth mesh for three test examples]

2.4. Quantitative comparisons (10 points)

Metric         Voxel Grid   Point Cloud   Mesh
Avg F1@0.05    81.156       95.516        93.628
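For context, here is a sketch of how an F1@0.05 number like those above is typically computed (an assumed re-implementation of the metric, not the provided evaluation code): precision is the fraction of predicted points within 0.05 of some ground-truth point, recall is the fraction of ground-truth points within 0.05 of some predicted point, and F1 is their harmonic mean, reported in percent.

```python
import torch
from pytorch3d.ops import knn_points

def f1_at_threshold(pts_pred, pts_gt, threshold=0.05):
    # pts_pred: (1, N, 3), pts_gt: (1, M, 3); knn_points gives squared distances
    d_pred_to_gt = knn_points(pts_pred, pts_gt, K=1).dists.sqrt()  # (1, N, 1)
    d_gt_to_pred = knn_points(pts_gt, pts_pred, K=1).dists.sqrt()  # (1, M, 1)
    precision = 100.0 * (d_pred_to_gt < threshold).float().mean()
    recall = 100.0 * (d_gt_to_pred < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```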

2.5. Analyse effects of hyperparameter variations (10 points)

I experimented with a wide range of hyperparameters, including the learning rate, n_points, batch_size, w_smooth, and the maximum number of iterations. Two of them are discussed below (a sketch of the sweep setup follows this list):
  1. w_smooth: value = 0.1 (default), value = 0.15, value = 2. The higher the smoothness weight, the smoother the rendered mesh, to the point where almost all predictions look more or less the same. A low value, on the other hand, gives more variation across predictions but rather pointy meshes. Why is this? Mesh prediction is trained with both the Chamfer distance and Laplacian smoothing. A lower w_smooth gives more weight to the Chamfer loss, so matching the closest vertices matters more than keeping the surface smooth, which produces meshes with sharp, pointy protrusions. A value of 0.15 worked reasonably well for me when combined with good settings of the other hyperparameters.
    [Renders: predicted mesh with w_smooth = 0.1 and with w_smooth = 2]
  2. Batch size for training: value = 2 (default) and value = 8. The model trained with batch size 8 trained better, giving a higher F1 score and, more importantly, better visualisations. Why is this? A larger batch size likely makes each optimization step more effective (lower-variance gradients), so the model parameters converge faster and perform better for a given number of iterations. A value of 8 worked best for me when combined with good settings of the other hyperparameters.
    [Renders: predicted mesh trained with batch size 2 and with batch size 8]
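The comparisons above came from re-running training with individual flags changed. A hypothetical sweep script along these lines could automate that; the flags match the main.py invocations shown earlier, but the script itself and the value grids are illustrative:

```python
import subprocess

# Grid over w_smooth and batch_size, keeping the other training flags fixed.
for w_smooth in [0.1, 0.15, 2.0]:
    for batch_size in [2, 8]:
        subprocess.run([
            "python", "-W", "ignore", "main.py",
            "--question", "2", "--mode", "train", "--type", "mesh",
            "--save_freq", "50", "--max_iter", "10000",
            "--batch_size", str(batch_size), "--lr", "4e-5",
            "--w_smooth", str(w_smooth),
        ], check=True)
```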

2.6. Interpret your model (15 points)

I think visualising the ground truth and the prediction together in a common frame, alongside the input RGB image, gives a better sense of what the F1 score is really telling us and of what the model is actually predicting than visualising them individually. Here I render the model's point-cloud prediction in red and the ground truth in green (this can be extended to the other two representations as well), together with a combined overlap point cloud that shows how well the predictions line up.

I show two examples output by the model I trained: one with the best F1@0.05 score and one with the worst.

The superimposed overlap visualisation gives a better idea of how close the predictions actually are to the ground truth and of which parts of the chair the model predicted correctly, and thereby a sense of which chair patterns are "easy" for the model to predict accurately and which are "hard".
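A minimal sketch of how such an overlap render can be produced with pytorch3d's point-cloud renderer, colouring the prediction red and the ground truth green (the camera and rasterization settings here are illustrative assumptions, not the exact values used for the figures below):

```python
import torch
from pytorch3d.renderer import (
    AlphaCompositor, FoVPerspectiveCameras, PointsRasterizationSettings,
    PointsRasterizer, PointsRenderer, look_at_view_transform,
)
from pytorch3d.structures import Pointclouds

def render_overlap(pts_pred, pts_gt, device, image_size=256):
    # pts_pred: (N, 3) predicted points; pts_gt: (M, 3) GT points, same frame
    points = torch.cat([pts_pred, pts_gt], dim=0)
    colors = torch.cat([
        torch.tensor([1.0, 0.0, 0.0]).expand(pts_pred.shape[0], 3),  # prediction in red
        torch.tensor([0.0, 1.0, 0.0]).expand(pts_gt.shape[0], 3),    # ground truth in green
    ], dim=0)
    cloud = Pointclouds(points=[points.to(device)], features=[colors.to(device)])

    R, T = look_at_view_transform(dist=2.0, elev=10, azim=45)
    cameras = FoVPerspectiveCameras(R=R, T=T, device=device)
    renderer = PointsRenderer(
        rasterizer=PointsRasterizer(
            cameras=cameras,
            raster_settings=PointsRasterizationSettings(image_size=image_size, radius=0.01),
        ),
        compositor=AlphaCompositor(),
    )
    return renderer(cloud)[0, ..., :3].detach().cpu().numpy()  # (H, W, 3) image
```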

[Renders for the best (F1@0.05 = 99.950) and worst (F1@0.05 = 21.684) examples: input RGB image, predicted point cloud (red), ground truth point cloud (green), and combined overlap point cloud]

We could also use this visualisation to see how the mesh (points sampled from it) gradually deforms toward the final prediction, as shown here: early in training the predictions (red points) are far from the target, and they progressively deform toward the ground truth representation (green). Plotting this at various points during training (using different checkpoints) lets us visually appreciate the model's learning process in terms of its outputs.

[Render: intermediate predictions (red points) overlaid on the ground truth (green) during training]

Once you have a checkpoint and want to visualise, run: python -W ignore main.py --question 2 --mode eval --max_iter 100 --type 'point' --visual True --load_checkpoint