16-889 Assignment 2: Single View to 3D

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)
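For reference, a minimal sketch of the per-voxel binary cross-entropy objective commonly used for this fitting (an illustrative implementation; the predicted grid is assumed to hold raw logits):

```python
import torch.nn.functional as F

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src: (B, D, H, W) predicted occupancy logits (assumed unnormalized)
    # voxel_tgt: (B, D, H, W) ground-truth occupancies in {0, 1}
    # Binary cross-entropy averaged over every voxel in the batch.
    return F.binary_cross_entropy_with_logits(voxel_src, voxel_tgt.float())
```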

I visualize the optimized voxel grid (left) alongside the ground truth voxel grid (right) below:

1.2. Fitting a point cloud (10 points)
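Fitting a point cloud typically means minimizing the (squared) chamfer distance between the predicted and target point sets. A minimal sketch using PyTorch3D's knn_points, as an illustration rather than the exact code I ran:

```python
from pytorch3d.ops import knn_points

def chamfer_loss(points_src, points_tgt):
    # points_src: (B, N, 3), points_tgt: (B, M, 3)
    # Squared distance from every source point to its nearest target point, and vice versa.
    d_src = knn_points(points_src, points_tgt, K=1).dists  # (B, N, 1)
    d_tgt = knn_points(points_tgt, points_src, K=1).dists  # (B, M, 1)
    return d_src.mean() + d_tgt.mean()
```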

I visualize the optimized point cloud (left) alongside the ground truth point cloud (right) below:

1.3. Fitting a mesh (5 points)
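Fitting a mesh typically combines a chamfer term on points sampled from both surfaces with a Laplacian smoothness term on the predicted mesh. A minimal PyTorch3D sketch; the loss weights and the 5000-point sample count are illustrative assumptions:

```python
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing

def mesh_fit_loss(mesh_src, mesh_tgt, w_chamfer=1.0, w_smooth=0.1):
    # Compare point clouds sampled from the two surfaces, and keep the
    # predicted mesh smooth with a Laplacian regularizer.
    pts_src = sample_points_from_meshes(mesh_src, num_samples=5000)
    pts_tgt = sample_points_from_meshes(mesh_tgt, num_samples=5000)
    loss_chamfer, _ = chamfer_distance(pts_src, pts_tgt)
    loss_smooth = mesh_laplacian_smoothing(mesh_src)
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```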

I visualize the optimized mesh (left) alongside the ground truth mesh (right) below:

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)
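As an illustrative sketch of the decoder side of this pipeline (assuming a 512-dimensional image feature from the encoder; the layer sizes are arbitrary and not necessarily the ones I used), a simple fully connected decoder that outputs a 32^3 grid of occupancy logits could look like:

```python
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Maps a 512-d image feature to a 32^3 grid of occupancy logits.
    def __init__(self, feat_dim=512, grid_size=32):
        super().__init__()
        self.grid_size = grid_size
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, grid_size ** 3),
        )

    def forward(self, feat):
        logits = self.fc(feat)  # (B, grid_size^3)
        return logits.view(-1, self.grid_size, self.grid_size, self.grid_size)
```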

The input RGB (left), a render of the predicted 3D voxel grid (middle), and a render of the ground truth mesh (right) are shown below:

2.2. Image to point cloud (15 points)
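As with the voxel branch, here is an illustrative sketch of a fully connected decoder that regresses n_points 3D coordinates from a 512-dimensional image feature (layer sizes and the Tanh output range are assumptions):

```python
import torch.nn as nn

class PointDecoder(nn.Module):
    # Maps a 512-d image feature to an (n_points, 3) point cloud.
    def __init__(self, feat_dim=512, n_points=5000):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, n_points * 3), nn.Tanh(),  # keep coordinates in [-1, 1]
        )

    def forward(self, feat):
        return self.fc(feat).view(-1, self.n_points, 3)
```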

The input RGB (left), a render of the predicted 3D point cloud (middle), and a render of the ground truth mesh (right) are shown below:

 

2.3. Image to mesh (15 points)
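A common design for this branch, sketched below under the assumption of a 512-dimensional image feature, is to predict per-vertex offsets that deform a fixed ico-sphere; the layer sizes and sphere level are illustrative:

```python
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    # Predicts per-vertex offsets that deform a fixed ico-sphere into the target shape.
    def __init__(self, feat_dim=512, level=4, device="cpu"):
        super().__init__()
        self.src_mesh = ico_sphere(level, device)
        n_verts = self.src_mesh.verts_packed().shape[0]
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, n_verts * 3), nn.Tanh(),
        )

    def forward(self, feat):
        offsets = self.fc(feat).view(feat.shape[0], -1, 3)  # (B, V, 3)
        mesh = self.src_mesh.extend(feat.shape[0])          # one sphere per batch element
        return mesh.offset_verts(offsets.reshape(-1, 3))    # packed per-vertex offsets
```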

The input RGB (left), a render of the predicted mesh (middle), and a render of the ground truth mesh (right) are shown below:

2.4. Quantitative comparisons (10 points)

The average test F1 scores at the 0.05 threshold for the voxel grid, point cloud, and mesh networks are 75.346, 96.764, and 95.746, respectively. The point cloud and mesh seem easier to learn: the point cloud network only has to predict the positions of the points, and the mesh network starts from a fixed initial mesh and only has to predict vertex positions. The voxel grid network, in contrast, has to predict an occupancy value for every voxel, which is likely a harder task and may explain its lower test F1 score.
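For reference, the F1 score at threshold 0.05 combines the fraction of predicted surface points within 0.05 of a ground-truth point (precision) and the fraction of ground-truth points within 0.05 of a predicted point (recall). A rough sketch of the metric (an illustrative reimplementation, not necessarily the exact evaluation code used here):

```python
from pytorch3d.ops import knn_points

def f1_score(points_pred, points_gt, threshold=0.05):
    # points_pred: (B, N, 3), points_gt: (B, M, 3) sampled surface points.
    d_pred = knn_points(points_pred, points_gt, K=1).dists.sqrt()  # (B, N, 1)
    d_gt = knn_points(points_gt, points_pred, K=1).dists.sqrt()    # (B, M, 1)
    precision = 100.0 * (d_pred < threshold).float().mean()
    recall = 100.0 * (d_gt < threshold).float().mean()
    return 2.0 * precision * recall / (precision + recall + 1e-8)
```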

2.5. Analyse effects of hyperparameter variations (10 points)

I ran two experiments, varying n_points and w_chamfer.

I set n_points (the number of predicted points) to 5000, 500, and 50; the average test F1 scores at the 0.05 threshold for the point cloud network are 96.764, 89.294, and 39.160, respectively.

The input RGB (left) and a render of the ground truth mesh (right) are shown below:

Renders of the predicted 3D point clouds with 5000 (left), 500 (middle), and 50 (right) points are shown below:

We can see that decreasing n_points lowers the test F1 score and makes the reconstruction sparser.

I also set w_chamfer (the weight on the chamfer term of the mesh training loss) to 5, 1, and 0.1; the average test F1 scores at the 0.05 threshold for the mesh network are 85.369, 95.746, and 85.348, respectively.

The input RGB (left) and a render of the ground truth mesh (right) are shown below:

Renders of the predicted meshes with w_chamfer set to 5 (left), 1 (middle), and 0.1 (right) are shown below:

We can see that w_chamfer affects the network's performance, and setting w_chamfer to 1 gives the best result.

2.6. Interpret your model (15 points)

To see whether the network has really learned what a 'chair' is, I crop out the central 40x40-pixel region of the input image and compare the predictions with those obtained from the original, uncropped images. (The input images are the same as in 2.1-2.3.)
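Concretely, "cropping out" here is implemented by blanking the central 40x40 patch before feeding the image to the network; a small sketch of the operation, assuming the inputs come as (B, H, W, 3) tensors in [0, 1] (the layout and zero fill value are assumptions):

```python
import torch

def mask_center(images, size=40):
    # images: (B, H, W, 3) RGB tensor; zero out a size x size square at the center.
    imgs = images.clone()
    h, w = imgs.shape[1], imgs.shape[2]
    top, left = (h - size) // 2, (w - size) // 2
    imgs[:, top:top + size, left:left + size, :] = 0.0
    return imgs
```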

For the voxel network, the results are shown below:

For the point cloud network, the results are shown below:

For the mesh network, the results are shown below:

We can see that the models still work well (only slightly worse than before) even when the center of the image is cropped out. This experiment suggests that the models did learn what a 'chair' looks like.

3. (Extra Credit) Exploring some recent architectures.

3.1 Implicit network (10 points)

I use the same network structure as Occupancy Networks. The result is shown below.
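For concreteness, a simplified sketch of an occupancy-style decoder that predicts occupancy for 3D query points conditioned on a global image feature; this omits the conditional batch normalization and ResNet blocks of the actual Occupancy Networks decoder, so it is only an illustration of the query-point-plus-feature design:

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    # Predicts occupancy logits for 3D query points given a global image feature.
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, feat):
        # points: (B, P, 3) query coordinates; feat: (B, feat_dim) image feature.
        feat = feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.mlp(torch.cat([points, feat], dim=-1)).squeeze(-1)  # (B, P) logits
```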

 

3.2 Parametric network (10 points)

I use the same network structure as AtlasNet. The result is shown below; the average test F1 score at the 0.05 threshold for the point cloud is 96.507.
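For concreteness, a simplified single-patch sketch of an AtlasNet-style decoder that lifts 2D UV samples, concatenated with the image feature, to 3D surface points; the real AtlasNet uses multiple learned patches, and the layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ParametricDecoder(nn.Module):
    # Maps 2D UV samples plus a global image feature to 3D surface points.
    def __init__(self, feat_dim=512, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Tanh(),
        )

    def forward(self, feat, n_points=5000):
        # Sample UV points uniformly in the unit square and lift each to 3D.
        uv = torch.rand(feat.shape[0], n_points, 2, device=feat.device)
        feat = feat.unsqueeze(1).expand(-1, n_points, -1)
        return self.mlp(torch.cat([uv, feat], dim=-1))  # (B, n_points, 3)
```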