16-889: Learning for 3D Vision
Assignment 2
Aditya Ghuge aghuge
Late Days Used = 1
1. Exploring loss functions
[Figure panels: Ground Truth | Prediction (three rows)]
[Figure panels: Image | Ground Truth | Prediction (three rows)]
Representation | Avg. F1 Score
Voxels | 85.953
Point Cloud | 94.171
Mesh | 92.259
Batch size is 8
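For reference, the F1 score reported in the table can be sketched as below. This is only a minimal NumPy illustration, not the evaluation code used for the assignment: the function name `f1_score`, the distance threshold of 0.05, and the percent scaling are all assumptions made for the example.

```python
import numpy as np

def f1_score(pred, gt, threshold=0.05):
    """F1 (in percent) between point clouds of shape (N, 3) and (M, 3).

    The threshold value is an assumption for illustration only.
    """
    # Pairwise Euclidean distances between every predicted and GT point: (N, M).
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: share of predicted points lying close to some GT point.
    precision = 100.0 * np.mean(dists.min(axis=1) < threshold)
    # Recall: share of GT points lying close to some predicted point.
    recall = 100.0 * np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Tiny hand-checkable example: one of two predicted points matches the GT.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
half_match = f1_score(pred, gt)
```

Under this definition a prediction scores well only if it both avoids stray points (precision) and covers the whole shape (recall).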
Explanation
Based on the table above, the point cloud representation achieves the highest average F1 score (approx. 94%) of the three representations. With point clouds, the model is free to place each point independently, since there is no co-dependency between points that would constrain the prediction. With meshes, we initialize the prediction from a sphere, which prevents us from creating holes: the mesh vertices are tied together by their connections (faces) and cannot take arbitrary positions. With voxels, the main constraint is the resolution of the voxel grid we select, since it decides the granularity of the predictions. Also, because the point cloud has fewer output values than the voxel grid, it is learned better and more quickly. Training the voxel model was challenging because its output size is 32*32*32. I may have reached suboptimal parameters for it; with better training it could beat the mesh model, since voxels can represent holes.
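The output-size argument can be checked with a little arithmetic. The sketch below assumes one occupancy value per voxel cell and three coordinates per predicted point, and uses the n_points values explored later in this report; the variable names are hypothetical.

```python
# Assumption: the voxel decoder emits one occupancy value per grid cell,
# and the point cloud decoder emits (x, y, z) for every point.
vox_size = 32
voxel_outputs = vox_size ** 3

point_outputs = {n_points: n_points * 3 for n_points in (1000, 5000, 7000)}

for n_points, n_values in point_outputs.items():
    print(f"n_points={n_points}: {n_values} values vs. {voxel_outputs} for voxels")
```

Even at 7000 points the point cloud decoder predicts 21000 values, fewer than the 32768 values of the 32*32*32 voxel grid.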
Analyse the results by varying a hyperparameter of your choice, for example n_points, vox_size, w_chamfer, or the initial mesh (ico_sphere). Try to be unique and conclusive in your analysis.
First analysis: change in n_points (1000, 5000, 7000) with a batch size of 4
Num_Points | Avg. F1 Score
1000 | 86.628
5000 | 92.381
7000 | 93.146
Number of points: 1000
Number of points: 5000
Number of points: 7000
As the table shows, increasing the number of points increases the average F1 score: with more points, more of them are predicted correctly, and as a result we get a higher F1 score.
Batch size also has an effect: as we increase the batch size, the average F1 score increases, as is evident for num_points 5000.
Second analysis: change in the initial mesh ico_sphere (3, 5) with a batch size of 4
Ico_sphere | Avg. F1 Score
3 | 86.901
5 | 90.135
Ico_sphere = 3
Ico_sphere = 5
As the table shows, increasing the number of vertices in the initial mesh increases the average F1 score: with more vertices, the mesh can represent the target shape more effectively, and as a result we get a higher F1 score.
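To make the difference between the two settings concrete, the sketch below counts vertices and faces of an icosphere at a given subdivision level. It assumes that the ico_sphere value is the subdivision level of a regular icosahedron (level 0 has 12 vertices and 20 faces, and each subdivision splits every face into four); `ico_sphere_counts` is a hypothetical helper, not part of any library.

```python
def ico_sphere_counts(level):
    """Vertex and face counts of an icosphere after `level` subdivisions.

    Assumes level 0 is a regular icosahedron (12 vertices, 20 faces) and
    that every subdivision step splits each triangle into four.
    """
    faces = 20 * 4 ** level
    verts = 10 * 4 ** level + 2
    return verts, faces

for level in (3, 5):
    verts, faces = ico_sphere_counts(level)
    print(f"level {level}: {verts} vertices, {faces} faces")
```

Under that assumption, going from level 3 to level 5 raises the vertex count from 642 to 10242, which gives the decoder far more freedom to deform the surface.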
To interpret the model, I took the output of the second-to-last layer of my decoder (captured by adding a hook) as the feature vector for each image. I then computed nearest neighbours in this feature space to check whether the model produces similar features for similar-looking chairs; for example, a sofa should lie close to square-shaped chairs or objects. I computed the 4 nearest neighbours of each image to get an idea of the model's performance. Below are some images and their nearest neighbours.
Image Neighbours
As you can see, similar-looking objects are retrieved as nearest neighbours, which indicates good performance of the model.
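The nearest-neighbour retrieval described above can be sketched as follows. This is a minimal NumPy illustration rather than the code used for the report: it assumes the hooked decoder features have already been collected into an array with one row per image, and that neighbours are ranked by Euclidean distance (the report does not state the metric). The name `nearest_neighbours` and the toy features are hypothetical.

```python
import numpy as np

def nearest_neighbours(features, k=4):
    """Indices of the k nearest rows (Euclidean) for every row of `features`.

    `features` has shape (num_images, feature_dim); a row is never returned
    as its own neighbour.
    """
    sq_norms = np.sum(features ** 2, axis=1)
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(d2, np.inf)  # exclude each image from its own neighbours
    return np.argsort(d2, axis=1)[:, :k]

# Toy features: two well-separated groups of three "images" each.
toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
])
toy_neighbours = nearest_neighbours(toy, k=2)
```

In the toy example every image retrieves only members of its own group, which is the behaviour one hopes to see for visually similar chairs.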
Another way to interpret the model is simply to inspect the GIFs visually to judge its performance.
Below is a comparison between n_points 5000 and 7000 for the point cloud model.
number of Points: 5000
number of Points 7000
As is evident, the outputs of the model with the higher num_points look more visually correct.
Code uploaded
Code uploaded. Just the model is defined.