16-889: Learning for 3D Vision

Assignment 2

Aditya Ghuge aghuge

 

 

Late Days Used = 1

1. Exploring loss functions

1.1. Fitting a voxel grid

Ground Truth          Prediction

[Figure: ground-truth and predicted voxel grids]

 

1.2. Fitting a point cloud

Ground Truth          Prediction

[Figure: ground-truth and predicted point clouds]
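Point cloud fitting of this kind is commonly driven by a chamfer loss, which is also what the w_chamfer hyperparameter mentioned in Section 2.5 suggests. The following is a minimal NumPy sketch of a symmetric chamfer distance, not the code used for this report; the function name `chamfer_distance` and the use of squared distances averaged over each set are assumptions.

```python
import numpy as np


def chamfer_distance(src: np.ndarray, tgt: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets of shape (N, 3) and (M, 3).

    Assumption: squared Euclidean distances, averaged over each set.
    """
    # Pairwise squared distances, shape (N, M).
    diff = src[:, None, :] - tgt[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Nearest-neighbour distance from src to tgt, plus the reverse direction.
    return float(dist2.min(axis=1).mean() + dist2.min(axis=0).mean())


src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
tgt = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(chamfer_distance(src, tgt))
```

Because each point only looks for its nearest neighbour in the other set, the loss places no constraint on how the predicted points relate to one another.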

 

1.3. Fitting a mesh

Ground Truth                                                                      Prediction

 

 

2. Reconstructing 3D from single view

2.1. Image to voxel grid

Image          Ground Truth          Prediction

[Figure: input images with ground-truth and predicted voxel grids]

 

 

2.2. Image to point cloud

Image          Ground Truth          Prediction

[Figure: input images with ground-truth and predicted point clouds]

 

 

2.3. Image to mesh

Image          Ground Truth          Prediction

[Figure: input images with ground-truth and predicted meshes]

2.4. Quantitative comparison

Representation    Avg. F1 Score
Voxels            85.953
Point Cloud       94.171
Mesh              92.259

Batch size is 8

Explanation

Based on the table above, the point cloud representation achieves the highest average F1 score (approx. 94%) of the three representations. With point clouds, the model is free to place each point independently, since there is no connectivity between points to constrain the prediction. A mesh is initialized from a sphere, which prevents it from creating holes in the prediction: the vertices are tied together by faces and cannot take arbitrary positions. For voxels, the main constraint is the grid resolution we select, which determines the granularity of the predictions. Also, because the point cloud has fewer output values than the voxel grid, it is learned better and more quickly. Training the voxel model was challenging because the output size is 32*32*32. I may have reached suboptimal parameters; with better training, voxels could beat the mesh, since voxels can represent holes.
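For reference, an F1 score between two shapes can be computed from point-to-point distances. The sketch below is a hedged illustration, not the evaluation code behind the table: it assumes both shapes have already been converted to point sets, and the function name `f1_score` and the distance threshold of 0.05 are hypothetical.

```python
import numpy as np


def f1_score(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.05) -> float:
    """F1 score in percent between point sets of shape (N, 3) and (M, 3).

    Assumption: a point counts as correct when its nearest neighbour in the
    other set lies closer than `threshold`.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: share of predicted points close to some ground-truth point.
    precision = 100.0 * np.mean(dist.min(axis=1) < threshold)
    # Recall: share of ground-truth points close to some predicted point.
    recall = 100.0 * np.mean(dist.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.01], [5.0, 5.0, 5.0]])
print(f1_score(pred, gt))
```

In this toy case one of the two predicted points is within the threshold and one of the two ground-truth points is covered, so precision and recall are both 50%.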

2.5. Analyze effects of hyperparameter variation

Analyse the results by varying a hyperparameter of your choice, for example n_points, vox_size, w_chamfer, or the initial mesh (ico_sphere). Try to be unique and conclusive in your analysis.

First analysis: varying n_points (1000, 5000, 7000) with a batch size of 4

Num_Points    Avg. F1 Score
1000          86.628
5000          92.381
7000          93.146

 

Number of points: 1000
[Figure: point cloud results for n_points = 1000]

Number of points: 5000
[Figure: point cloud results for n_points = 5000]

Number of points: 7000
[Figure: point cloud results for n_points = 7000]

 

Increasing the number of points clearly increases the average F1 score: with more points, more points are predicted correctly, which yields a higher F1 score.

Batch size also has an effect: the average F1 score increases with batch size, as is evident for n_points = 5000 (compare 92.381 here at batch size 4 with 94.171 in Section 2.4 at batch size 8).

 

Second analysis: varying the initial mesh ico_sphere level (3, 5) with a batch size of 4

Ico_sphere    Avg. F1 Score
3             86.901
5             90.135

 

Ico_sphere = 3
[Figure: mesh results for ico_sphere = 3]

Ico_sphere = 5
[Figure: mesh results for ico_sphere = 5]

Increasing the number of vertices in the mesh clearly increases the average F1 score: with more vertices, the mesh can represent the shape more effectively, which yields a higher F1 score.
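To make the difference in mesh resolution concrete, the vertex and face counts can be worked out by hand. This sketch assumes the ico_sphere values above are subdivision levels of a regular icosahedron (as with PyTorch3D's `ico_sphere(level)`), where each subdivision adds one vertex per edge and splits every face into four; the helper name `ico_sphere_counts` is hypothetical.

```python
def ico_sphere_counts(level: int) -> tuple[int, int]:
    """Return (num_vertices, num_faces) of an icosahedron subdivided `level` times."""
    verts, edges, faces = 12, 30, 20  # regular icosahedron (level 0)
    for _ in range(level):
        # One new vertex per edge; each edge splits in two and each face
        # contributes three interior edges; each face becomes four faces.
        verts, edges, faces = verts + edges, 2 * edges + 3 * faces, 4 * faces
    return verts, faces


for level in (3, 5):
    print(level, ico_sphere_counts(level))
# 3 (642, 1280)
# 5 (10242, 20480)
```

Under this assumption, level 5 gives the network roughly 16 times as many vertices to position as level 3.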

 

 

2.6. Interpret your model 

To interpret the model, I took the output of the second-to-last layer of my decoder (captured by adding a hook) as the feature vector for each image. I then computed nearest neighbours in this feature space to check whether the model produces similar features for similar-looking chairs; for example, a sofa should lie close to square-shaped chairs and objects. I computed the 4 nearest neighbours of each image to get an idea of the model's performance. Below are some images and their nearest neighbours.

Image          Neighbours

[Figure: query images, each shown with its nearest-neighbour images]

 

As you can see, similar-looking objects are returned as nearest neighbours, which indicates that the model performs well.
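The neighbour lookup described in this section can be sketched as follows. This is an illustrative NumPy version, not the code used for the report: it assumes the hooked features have been stacked into an (N, D) array and that neighbours are ranked by Euclidean distance, which the report does not specify. The name `nearest_neighbours` and the toy feature values are hypothetical.

```python
import numpy as np


def nearest_neighbours(features: np.ndarray, k: int = 4) -> np.ndarray:
    """For each row of `features` (N, D), return the indices of its k nearest other rows."""
    # Pairwise Euclidean distances between all feature vectors, shape (N, N).
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Exclude each item from its own neighbour list.
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


# Toy features: rows 0, 1, 4 form one cluster and rows 2, 3 another.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]])
neighbours = nearest_neighbours(features, k=2)
print(neighbours[0])  # indices of the two rows closest to row 0
```

Mapping the returned indices back to their images gives a neighbour gallery like the one shown above.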

 

Another way to interpret the model is to inspect the GIFs visually.

Below is a comparison between n_points = 5000 and 7000 for the point cloud model.

Number of points: 5000
[Figure: point cloud results for n_points = 5000]

Number of points: 7000
[Figure: point cloud results for n_points = 7000]

 

As is evident, the outputs of the model with more points (7000) look more visually correct.

 

3. (Extra Credit) Exploring some recent architectures.

3.1 Implicit network

Code uploaded

 

3.2 Parametric network

Code uploaded; only the model is defined.