Assignment 2 of Learning for 3D Vision (16-889)

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Predicted voxel GT voxel

1.2 Fitting a point cloud (10 points)

Predicted Point cloud GT Point cloud

1.3 Fitting a mesh (5 points)

Predicted Mesh GT Mesh

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

GT RGB GT voxel Predicted voxel

2.2. Image to point cloud (15 points)

GT RGB GT point cloud Predicted point cloud

2.3. Image to mesh (15 points)

GT RGB GT mesh Predicted mesh

2.4. Quantitative comparisons (10 points)

Model Avg F1@0.05
Vox model 73.595
Point cloud model 87.143
Mesh model 77.475

The point cloud model performs best because its optimization is unconstrained: the model does not have to learn any connectivity between points. The voxel model performs worst because the grid resolution is too low to generalize well to other examples, and because the optimization is unbalanced, since the ratio of occupied to unoccupied voxels varies a lot across samples. The mesh model falls in between because it is constrained by the connectivity of the initial vertices: the model only learns the positions of the vertices, so the initial topology stays the same.
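For reference, here is a minimal sketch of how an F1 score at a distance threshold is commonly computed between a predicted and a ground-truth point set. It assumes the usual definition (precision and recall of points within 0.05 of the other set, reported as a percentage); the function name `f1_score` and the toy points are illustrative and not taken from the assignment's evaluation code.

```python
import math

def f1_score(pred_points, gt_points, threshold=0.05):
    """Return F1 (in percent) between two point sets at a distance threshold."""
    def nearest_dist(p, others):
        return min(math.dist(p, q) for q in others)

    # Precision: share of predicted points that lie close to the ground truth.
    precision = sum(
        nearest_dist(p, gt_points) < threshold for p in pred_points
    ) / len(pred_points)
    # Recall: share of ground-truth points that are covered by a prediction.
    recall = sum(
        nearest_dist(g, pred_points) < threshold for g in gt_points
    ) / len(gt_points)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)

# Toy example (hypothetical points): three of four predictions land near the GT.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
pred = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0), (0.5, 0.5, 0.5), (0.0, 1.0, 0.01)]
score = f1_score(pred, gt)
```

The brute-force nearest-neighbour search is quadratic in the number of points; it is only meant to make the metric concrete, not to be used on full-size samples.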

2.5. Analyse effects of hyperparameter variations (10 points)

For the voxel model, I changed the loss from balanced BCE to an unbalanced one in which occupied (positive) and unoccupied voxels are given the same weight. As suspected, this drastically degrades performance: without the balanced binary cross-entropy loss the model reaches F1@0.05: 17.247, whereas with it the model reaches Avg F1@0.05: 73.595. Hence, it makes sense to use a weighted loss for problems with skewed data.
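The difference between the two losses can be sketched as follows. This is one plausible form of a balanced BCE (averaging the occupied and unoccupied voxels separately and weighting the two means equally); the exact weighting used in the experiments is not stated here, so treat the function names and the toy grid as assumptions.

```python
import math

def bce_terms(probs, labels, eps=1e-7):
    """Per-voxel binary cross-entropy for predicted occupancy probabilities."""
    terms = []
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return terms

def unbalanced_bce(probs, labels):
    """Every voxel gets the same weight, so the majority class dominates."""
    terms = bce_terms(probs, labels)
    return sum(terms) / len(terms)

def balanced_bce(probs, labels):
    """Average occupied and empty voxels separately, then combine equally."""
    terms = bce_terms(probs, labels)
    pos = [t for t, y in zip(terms, labels) if y == 1]
    neg = [t for t, y in zip(terms, labels) if y == 0]
    pos_mean = sum(pos) / len(pos) if pos else 0.0
    neg_mean = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (pos_mean + neg_mean)

# Toy grid (hypothetical): 2 occupied voxels out of 20, and a model that
# predicts "empty" almost everywhere.
labels = [1, 1] + [0] * 18
all_empty = [0.05] * 20
plain = unbalanced_bce(all_empty, labels)
balanced = balanced_bce(all_empty, labels)
```

In this toy case the unbalanced loss stays small even though both occupied voxels are missed, while the balanced loss penalizes the miss much more strongly, which is consistent with the large gap in F1 reported above.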

For the mesh model, I tried changing the smoothness weight. The results are as follows:

Mesh model
Smoothness weight Avg F1@0.05
0 78.099
0.1 77.475
1 64.916

Hence, we observe that removing the smoothness term from the optimization improves performance slightly, but the mesh is no longer smooth. With a smoothness weight of 1.0, the optimization leans towards making the mesh smooth rather than fitting the given data.
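The trade-off can be illustrated with a small sketch of a weighted objective, assuming the mesh loss is a fitting term plus the smoothness term scaled by its weight. The two candidate meshes and their loss values below are hypothetical numbers chosen only to show how the preferred solution flips as the weight grows; they are not measurements from the experiments.

```python
def total_loss(fit_loss, smooth_loss, w_smooth):
    """Weighted objective: data-fitting term plus a scaled smoothness penalty."""
    return fit_loss + w_smooth * smooth_loss

# Hypothetical (fit_loss, smooth_loss) pairs for two candidate meshes.
candidates = {
    "detailed": (0.02, 0.30),  # fits the target closely but has a bumpy surface
    "smooth": (0.10, 0.05),    # smoother surface, looser fit
}

def preferred(w_smooth):
    """Name of the candidate with the lowest total loss at this weight."""
    return min(
        candidates,
        key=lambda name: total_loss(*candidates[name], w_smooth),
    )

# Same weights as in the experiment above.
choices = {w: preferred(w) for w in (0.0, 0.1, 1.0)}
```

With weights 0 and 0.1 the closely fitting candidate has the lower total loss, while at weight 1 the smoother candidate wins even though it fits the data less well.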

For the point cloud model, I changed the number of points sampled for training (the sampled points are supplied as ground truth for computing the loss). The results are as follows:

Point cloud model
Number of points Avg F1@0.05
1000 72.125
5000 87.143
10000 83.312

We observe that both reducing and increasing the number of sampled points relative to 5000 hurt the performance of the model, with the reduction to 1000 points hurting the most.

2.6. Interpret your model (15 points)

To interpret the model, we can use attribution methods to highlight the regions of the image that the model focuses on. The following results show attribution maps obtained using integrated gradients; the highlighted regions are usually the edges of the chair.

Model type GT RGB Integrated gradients attribution
Voxel Model
Point cloud Model
Mesh Model
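As a rough illustration of how integrated gradients attributions are computed, the sketch below accumulates gradients along a straight path from a baseline input to the actual input and scales them by the input difference. It uses a toy scalar function and finite-difference gradients in place of the trained networks and autograd, so `toy_model` and all values are assumptions made for the example.

```python
def integrated_gradients(f, x, baseline, steps=50, h=1e-5):
    """Approximate integrated gradients of a scalar function f at input x."""
    n = len(x)
    avg_grads = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the straight path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            # Central finite difference stands in for autograd here.
            up = point[:]
            up[i] += h
            down = point[:]
            down[i] -= h
            avg_grads[i] += (f(up) - f(down)) / (2 * h) / steps
    # Scale the path-averaged gradient by the distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grads)]

def toy_model(pixels):
    """Hypothetical stand-in for a network output: ignores the third input."""
    return 3.0 * pixels[0] + pixels[1] ** 2 + 0.0 * pixels[2]

x = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
```

The attributions sum (approximately) to the difference between the model output at the input and at the baseline, and the input that the toy model ignores receives zero attribution. For an image model, the same per-input values would be arranged into a map like the ones shown above.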

3. (Extra Credit) Exploring some recent architectures.

3.1 Implicit network (10 points)

GT RGB GT voxel Predicted voxel

The implicit network performance is Avg F1@0.05: 57.736

3.2 Parametric network (10 points)

GT RGB GT point cloud Predicted point cloud

The parametric network performance is Avg F1@0.05: 63.160

Number of late days: 3